Vector Footprints

getVectorFootprint.sh is a Bash script that writes one bounding-box geometry for each layer of every vector file it finds. The output is CSV with the columns path, layer, epsg and wkt_geometry, which you can then convert to any vector spatial format. It requires Bash, jq and a GDAL build that ships the gdal command-line program (GDAL 3.11 or later).

Script usage is as follows:

./getVectorFootprint.sh --help
Usage:
./getVectorFootprint.sh
   --dir       : directory to search, default current directory
   --formats   : extensions to search for, default .gpkg, .shp and .fgb 
                 any vector format extension GDAL understands is valid.
   --mindepth  : min directory depth, default current directory
   --maxdepth  : max directory depth, default no limit 
   --help      : this output
Example: ./getVectorFootprint.sh --dir="/export/data" --formats="gpkg"
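
The --formats value is a comma-separated list of extensions, which the script turns into a regular expression and hands to a case-insensitive grep. As a rough sketch of that idea (formats_to_regex is a made-up helper name, and the pattern is a stricter variant with an escaped dot, not the script's exact expression):

```shell
# Hypothetical helper: turn "gpkg,shp" into an extension regex such as \.(gpkg|shp)$
formats_to_regex() {
  local list
  # Keep only letters, digits and commas, then turn commas into regex alternation.
  list=$(printf '%s' "$1" | tr -cd '0-9A-Za-z,' | tr ',' '|')
  # An empty list would match nothing useful, so report failure instead.
  [ -n "$list" ] || return 1
  printf '\\.(%s)$\n' "$list"
}

# Filter a few sample names the way the script filters the output of find.
printf '%s\n' a.gpkg b.SHP c.txt | grep -Ei "$(formats_to_regex 'gpkg,shp')"
```

With grep -i the pattern also accepts upper-case extensions such as .SHP, so the sample prints a.gpkg and b.SHP but not c.txt.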

An example that searches for every GeoPackage in and under the specified directory might look like:

./getVectorFootprint.sh --dir="/export/gis/data/geonames" --formats=gpkg > res.csv

The resulting CSV, res.csv, looks like this:

path,layer,epsg,wkt_geometry
"/export/gis/data/geonames/Gazetteer_National_GPKG.gpkg","CellGrid_7_5Minute",4269,"POLYGON((-179.166667 -77.583333,-179.166667 71.500000,180.000000 71.500000,180.000000 -77.583333,-179.166667 -77.583333))"
"/export/gis/data/geonames/Gazetteer_National_GPKG.gpkg","DomesticNames",4269,"POLYGON((-179.468056 -85.374233,-179.468056 71.417452,179.983333 71.417452,179.983333 -85.374233,-179.468056 -85.374233))"
"/export/gis/data/geonames/Gazetteer_National_GPKG.gpkg","FeatureDescriptionHistory",4269,"POLYGON((-179.468056 -85.374233,-179.468056 71.417452,179.983333 71.417452,179.983333 -85.374233,-179.468056 -85.374233))"
"/export/gis/data/geonames/Gazetteer_National_GPKG.gpkg","FederalCodes",4269,"POLYGON((-179.094159 -14.541614,-179.094159 71.403334,179.833925 71.403334,179.833925 -14.541614,-179.094159 -14.541614))"
"/export/gis/data/geonames/Gazetteer_National_GPKG.gpkg","Gaz_Features",4269,"POLYGON((-179.468056 -85.374233,-179.468056 71.417452,179.983333 71.417452,179.983333 -85.374233,-179.468056 -85.374233))"
"/export/gis/data/geonames/Gazetteer_National_GPKG.gpkg","GovernmentUnits",4269,"POLYGON((-177.361389 -85.374233,-177.361389 69.720001,166.650278 69.720001,166.650278 -85.374233,-177.361389 -85.374233))"
"/export/gis/data/geonames/Gazetteer_National_GPKG.gpkg","HistoricalFeatures",4269,"POLYGON((-178.036667 -14.300278,-178.036667 71.359722,173.600000 71.359722,173.600000 -14.300278,-178.036667 -14.300278))"
"/export/gis/data/geonames/Gazetteer_National_GPKG.gpkg","PopulatedPlaces",4269,"POLYGON((-178.036667 -14.361111,-178.036667 71.385556,178.877500 71.385556,178.877500 -14.361111,-178.036667 -14.361111))"

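getVectorFootprint.sh has done its job: res.csv now holds the file path, layer name, EPSG code and a well-known text (WKT) bounding box for each layer. We can convert that to any format we wish. The CRS matters, because the CSV itself carries no CRS and the output file should be tagged with the one the footprints were measured in. Every row above is EPSG:4269, so we filter on that code and use the same CRS as both source and destination of the reproject step, which leaves the coordinates unchanged and gives us a nice clean .gpkg of our data in CRS 4269: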

epsg=4269
gdal vector pipeline \
   ! read -i res.csv --oo QUOTED_FIELDS_AS_STRING=YES --oo GEOM_POSSIBLE_NAMES=wkt_geometry --oo KEEP_GEOM_COLUMNS=NO \
   ! sql --sql "select * from res where epsg = '${epsg}'" \
   ! reproject --src-crs EPSG:${epsg} --dst-crs EPSG:${epsg} \
   ! write -o res.gpkg --overwrite --overwrite-layer --output-layer res
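
If res.csv mixes layers in several coordinate systems, it helps to see which EPSG codes are present before picking one. A small sketch, assuming the CSV layout shown above, where the EPSG code sits between the quoted layer name and the quoted POLYGON (list_epsg is a hypothetical helper, not part of getVectorFootprint.sh):

```shell
# Hypothetical helper: print each distinct EPSG code in a footprint CSV, one per line.
list_epsg() {
  # Capture the unquoted number between the layer field and the WKT field.
  # The header line does not match the pattern, so it is skipped.
  sed -nE 's/^.*",([0-9]+),"POLYGON\(\(.*$/\1/p' "$1" | sort -un
}
```

Running list_epsg res.csv on the sample above prints 4269. Each code it reports can then be assigned to the epsg variable and passed through the pipeline in turn, writing to a separate output file per CRS.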

Here's the full script:

#!/bin/bash

baseDir="."
mindepth=1
maxdepth=100000
help=0

# Comma-separated extensions without dots; converted to a regex after option parsing.
formats='gpkg,shp,fgb'

while [ "$#" -gt 0 ]; do
  case "$1" in
    # Each value is stripped down to a whitelist of safe characters.
    --dir=*) baseDir="$(printf '%s' "${1#*=}" |sed 's|[^/_0-9a-z.-]||gi')"; shift 1;;
    --formats=*) formats="$(printf '%s' "${1#*=}" |sed 's/[^0-9a-z,]//gi')"; shift 1;;
    --mindepth=*) mindepth="$(printf '%s' "${1#*=}" |sed 's/[^0-9]//g')"; shift 1;;
    --maxdepth=*) maxdepth="$(printf '%s' "${1#*=}" |sed 's/[^0-9]//g')"; shift 1;;
    --help) help=1; shift 1;;
    *) shift 1;;
  esac
done
err="Usage:"
# Turn "gpkg,shp" into the regex "\.gpkg$|\.shp$": escaped dot, anchored at the end of the name.
[[ -n $formats ]] && formats="\\.$(printf '%s' "$formats" |sed 's/,/$|\\./g')\$"
[[ -z $formats ]] && err="--formats is empty" && help=1

if [ $help -eq 1 ] ; then
echo "$err"
cat<<EOF
$0
   --dir       : directory to search, default current directory
   --formats   : extensions to search for, default .gpkg, .shp and .fgb
                 any vector format extension GDAL understands is valid.
   --mindepth  : min directory depth, default current directory
   --maxdepth  : max directory depth, default no limit
   --help      : this output
Example: $0 --dir="/export/data" --formats="gpkg"

EOF
   exit
fi

echo "path,layer,epsg,wkt_geometry"
# Read one path per line so file names containing spaces are not split.
find "$baseDir" -mindepth "$mindepth" -maxdepth "$maxdepth" | grep -Ei "$formats" | while IFS= read -r path ; do
   # One token per layer: name,epsg,xmin,ymin,xmax,ymax. Layers lacking a CRS code or extent are skipped.
   cols="$(gdal vector info "$path" --of=json < /dev/null 2> /dev/null| jq '.layers[]|[.name,.geometryFields[].coordinateSystem.projjson.id.code,.geometryFields[].extent]|tostring' |sed 's/[^a-z0-9\._,-]//gi'|tr -d '\\'|grep ','|grep -v ',null' |tr "\n" " ")"
   for layer in $cols ; do
      IFS=',' read -r lay epsg xmin ymin xmax ymax <<< "$layer"
      # Build a closed ring from the extent corners; skip the layer if a value is not numeric.
      printf -v csv '"%s","%s",%s,"POLYGON((%.6f %.6f,%.6f %.6f,%.6f %.6f,%.6f %.6f,%.6f %.6f))"' "$path" "$lay" "$epsg" "$xmin" "$ymin" "$xmin" "$ymax" "$xmax" "$ymax" "$xmax" "$ymin" "$xmin" "$ymin" || continue
      echo "$csv"
   done
done
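
The printf near the end of the script is where an extent becomes a geometry: the four numbers xmin, ymin, xmax, ymax are written out as a five-point ring that starts and ends at the lower-left corner. Pulled out on its own, that step might look like this (bbox_to_wkt is an illustrative name, not something the script defines):

```shell
# Hypothetical helper: print a WKT rectangle for the extent xmin ymin xmax ymax.
bbox_to_wkt() {
  local xmin=$1 ymin=$2 xmax=$3 ymax=$4
  # Ring order: lower-left, upper-left, upper-right, lower-right, back to lower-left.
  # LC_ALL=C keeps the decimal separator a dot regardless of the user's locale.
  LC_ALL=C printf 'POLYGON((%.6f %.6f,%.6f %.6f,%.6f %.6f,%.6f %.6f,%.6f %.6f))\n' \
    "$xmin" "$ymin" "$xmin" "$ymax" "$xmax" "$ymax" "$xmax" "$ymin" "$xmin" "$ymin"
}

# Extent of the CellGrid_7_5Minute layer from the sample output.
bbox_to_wkt -179.166667 -77.583333 180 71.5
```

Closing the ring on the starting point is what makes the text a valid WKT polygon, and the fixed six decimals keep the CSV rows uniform.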