Faster Raster Data Acquisition

[Image: femafhz map. Double click the map for a production example.]
To the point...
How to retrieve pixel values from 15 different rasters at a given coordinate in 1 line of code.

...and a bit more verbose
At some point you'll want your interactive map users to retrieve raster value(s) at a given map click. This is often done with a web service on a back-end server or hosted service. How you do this can be extremely complex or insanely simple. It all depends on the tools you use.

Will you do this with an ArcPy script, AGOL, or maybe a Python script and all the attendant environment? Think about this for a moment. What does your environment require to retrieve a single pixel value from a raster using ESRI or Python? What resources does the care and feeding of that environment require? I've been there and the answer makes me cringe.

The short answer to my rhetorical question.
I've used gdalbuildvrt, specifying paths to all 15 rasters, to create a virtual raster, femafhz.vrt.
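The exact build command isn't shown here, but it would look roughly like the following sketch. The raster paths are placeholders, and the -separate option (which stacks each input as its own band) is my assumption based on the 15-value output below:

   # Sketch only: placeholder paths, not the actual file names.
   # -separate gives each input raster its own band (15 inputs -> 15 bands),
   # so gdallocationinfo returns one value per raster at each point.
   gdalbuildvrt -separate /opt/gis/vrt/femafhz.vrt /opt/gis/rasters/*.tif

Next, use Bash and GDAL to get 15 pixel values from 15 different rasters at a single coordinate: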

   #!/bin/bash

   # -geoloc: interpret the two arguments as georeferenced (lon/lat) coordinates.
   # -valonly: print only the pixel values, one band (raster) per line.
   # awk then joins every 15 values into a single CSV line.
   gdallocationinfo -geoloc -valonly /opt/gis/vrt/femafhz.vrt -100.123456 40.987654 | \
   awk -v div=15 'NR%div { printf("%s,", $0); next} {print $0}'
The above will return a single CSV line with 15 values, one per raster:
   36,1,0,0,1,0,-8.22000026702881,29.8199996948242,589,0,255,255,255,255,255
Great. Wonderful. What about performance? What about performance at scale? Well, let's time it:
   time ./test.sh
   1,0,0,1,0,-8.22000026702881,29.8199996948242,589,0,255,255,255,255,255
   0.098u 0.009s 0:00.10 90.0%	0+0k 0+0io 0pf+0w
It took 1/10 of a second to return 15 raster pixel values. That's reasonable. BUT, what if we have many coordinates? Surely we'll see scale issues. Let's find out. The VRT file I'm using covers CONUS. I've generated 100,000 random CONUS coordinates sorted by longitude and placed them in 'coords.txt'.
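The post doesn't show how coords.txt was built; one hypothetical way to generate a similar file, assuming a crude CONUS bounding box, is:

   # Assumption: a rough CONUS box, lon -125 to -66.5, lat 24.5 to 49.5.
   # Prints 100K random "lon lat" pairs and sorts them by longitude.
   awk 'BEGIN { srand();
       for (i = 0; i < 100000; i++)
           printf("%.6f %.6f\n", -125 + rand() * 58.5, 24.5 + rand() * 25)
   }' | sort -n -k1,1 > coords.txt

Let's modify our script to accommodate these values: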
   #!/bin/bash

   # With no coordinate on the command line, gdallocationinfo reads lon/lat
   # pairs from stdin, in this case the 100K pairs in coords.txt.
   gdallocationinfo -geoloc -valonly /opt/gis/vrt/femafhz.vrt < coords.txt | \
   awk -v div=15 'NR%div { printf("%s,", $0); next} {print $0}'
With 100K random coordinates it took 31.78 seconds or 3,146 coords per second. Put another way, at 15 rasters per coordinate, that's 47,199 raster pixel values per second!
   time ./test.sh
   28.953u 2.583s 0:31.78 99.2%	0+0k 0+12936io 0pf+0w
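To connect this back to the map-click use case from the intro, here's one way the one-liner could sit behind a web endpoint as a plain CGI script. This is a minimal sketch under assumed conventions (a CGI-capable web server and lon/lat query parameters); it is not the production service behind the map above:

   #!/bin/bash
   # Hypothetical CGI wrapper. Expects a query string like
   # ?lon=-100.123456&lat=40.987654 and returns 15 pixel values as one CSV line.

   lon=$(printf '%s\n' "$QUERY_STRING" | tr '&' '\n' | sed -n 's/^lon=//p')
   lat=$(printf '%s\n' "$QUERY_STRING" | tr '&' '\n' | sed -n 's/^lat=//p')

   printf 'Content-Type: text/plain\r\n\r\n'

   # paste -sd, joins the 15 values (one per line) into a single CSV line.
   gdallocationinfo -geoloc -valonly /opt/gis/vrt/femafhz.vrt "$lon" "$lat" | paste -sd, -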
Notes
These tests were performed on a production AWS EC2 small instance with 2 GB of RAM and a 2.5 GHz processor. On a quiet server you'll get even better times. Our VRT file contains 15 rasters with pixel resolutions between 10 and 30 meters. The VRT file was created using the gdalbuildvrt command. All rasters are local to the server. Trying this with rasters on the network, say in an S3 bucket, will slow things down dramatically.
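To put that last point in concrete terms, "rasters on the network" would mean building the VRT from GDAL's /vsis3/ virtual paths instead of local files, for example (hypothetical bucket and object names, and AWS credentials must be configured for GDAL):

   # Hypothetical: pixel reads against this VRT become ranged HTTP requests
   # to S3 instead of local disk reads, hence the dramatic slowdown.
   gdalbuildvrt -separate /opt/gis/vrt/femafhz_s3.vrt \
       /vsis3/my-bucket/rasters/raster01.tif \
       /vsis3/my-bucket/rasters/raster02.tif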

Part of the irony here is that ESRI uses GDAL under the hood for format conversions, and any geospatial work in Python will require you to install the GDAL libraries anyway. Why not just use GDAL directly where you can and cut out the middleman?
