Make a written subset cheap to query and re-subset. It does three things
(listed under Details), borrowed from the verify_indices() / R-tree handling
in CIROH's NGIAB_data_preprocess.
Usage
optimize_gpkg(gpkg, extra_cols = character(), verbose = FALSE)
Arguments
- gpkg
Path to a GeoPackage written by hfsubset().
- extra_cols
Additional column names to index when present.
- verbose
Logical; report indices created and R-trees rebuilt.
Details
- Attribute indices on the id / foreign-key columns of every table (e.g.
flowpath_id, divide_id, vpuid). sf writes the spatial R-tree but leaves attribute tables (network, *-attributes) and non-geometry id columns unindexed, so without these indices a WHERE flowpath_id IN (...) re-subset scans the whole table.
- Spatial R-tree verification. GDAL builds rtree_<layer>_<geom> at write time (SPATIAL_INDEX=YES). We confirm each feature layer has one and rebuild any that is missing, so the index is present and freshly built (optimal) before anything copies it downstream.
- ANALYZE + PRAGMA optimize, so SQLite's planner actually uses the new indices and the R-tree statistics are current.
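The attribute-index and ANALYZE steps above can be sketched in a few lines of R. This is a minimal illustration, not the body of optimize_gpkg(): it assumes the DBI and RSQLite packages are installed, uses an in-memory table with made-up rows in place of a real GeoPackage, and the idx_<table>_<column> index naming is a hypothetical choice. The R-tree verification step is not shown.

```r
library(DBI)

# In-memory stand-in for a GeoPackage attribute table (hypothetical data).
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "network", data.frame(
  flowpath_id = sprintf("wb-%d", 1:1000),
  divide_id   = sprintf("cat-%d", 1:1000),
  vpuid       = "01"
))

# Index each id column that is actually present in the table.
# The idx_<table>_<column> naming is an assumption made for this sketch.
id_cols <- intersect(
  c("flowpath_id", "divide_id", "vpuid"),
  dbListFields(con, "network")
)
for (col in id_cols) {
  dbExecute(con, sprintf(
    'CREATE INDEX IF NOT EXISTS "idx_network_%s" ON "network" ("%s")',
    col, col
  ))
}

# Refresh planner statistics so the new indices are considered.
dbExecute(con, "ANALYZE")
dbExecute(con, "PRAGMA optimize")

# Inspect the plan for a typical re-subsetting query.
plan <- dbGetQuery(con, "
  EXPLAIN QUERY PLAN
  SELECT * FROM network WHERE flowpath_id IN ('wb-1', 'wb-2')
")
print(plan$detail)

dbDisconnect(con)
```

With the index in place, the detail column of the plan should name idx_network_flowpath_id rather than report a scan of network; dropping the index and re-running the EXPLAIN QUERY PLAN statement is a quick way to see the difference.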
