Index and optimize a hydrofabric GeoPackage — optimize

Makes a written subset cheap to query and re-subset. It does three things (listed under Details), borrowed from the verify_indices() / R-tree handling in CIROH's NGIAB_data_preprocess.

Usage

optimize_gpkg(gpkg, extra_cols = character(), verbose = FALSE)

Arguments

gpkg: Path to a GeoPackage written by hfsubset().
extra_cols: Additional column names to index when present.
verbose: Logical; report indices created and R-trees rebuilt.

Value

The gpkg path, invisibly.

Details

Attribute indices on the id / foreign-key columns of every table (e.g. flowpath_id, divide_id, vpuid). sf writes the spatial R-tree but leaves attribute tables (network, *-attributes) and non-geometry id columns unindexed, so WHERE flowpath_id IN (...) re-subsetting scans the whole table without these.
Spatial R-tree verification. GDAL builds rtree_<layer>_<geom> at write time (SPATIAL_INDEX=YES). We confirm each feature layer has one and rebuild any that is missing, so the index is present and freshly built (optimal) before anything copies it downstream.
ANALYZE + PRAGMA optimize so SQLite's planner actually uses the new indices and the R-tree statistics are current.
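
The three steps above can be sketched at the SQLite level as follows. This is an illustration only, not the body of optimize_gpkg(). It assumes the DBI and RSQLite packages are installed and that RSQLite's bundled SQLite has the R-tree module enabled. The toy tables, the hand-made gpkg_geometry_columns table, and the idx_<table>_<column> index naming are assumptions made for the example. Rebuilding a missing R-tree needs geometry functions that GDAL provides, so the sketch only detects the missing index.

```r
library(DBI)

# Toy stand-in for a GeoPackage: plain SQLite tables, no real geometries.
path <- tempfile(fileext = ".gpkg")
con <- dbConnect(RSQLite::SQLite(), path)

dbWriteTable(con, "network", data.frame(
  flowpath_id = paste0("wb-", 1:1000),
  divide_id   = paste0("cat-", 1:1000),
  vpuid       = "01"
))
dbWriteTable(con, "flowpaths", data.frame(flowpath_id = paste0("wb-", 1:1000)))
dbWriteTable(con, "divides", data.frame(divide_id = paste0("cat-", 1:1000)))

# Hand-made copy of the GeoPackage layer registry (only the two columns used here).
dbExecute(con, "CREATE TABLE gpkg_geometry_columns (table_name TEXT, column_name TEXT)")
dbExecute(con, "INSERT INTO gpkg_geometry_columns VALUES ('flowpaths', 'geom'), ('divides', 'geom')")

# Only 'flowpaths' gets an R-tree, so 'divides' should be reported as missing.
dbExecute(con, "CREATE VIRTUAL TABLE rtree_flowpaths_geom USING rtree(id, minx, maxx, miny, maxy)")

# 1. Attribute indices on id / foreign-key columns, where present.
id_cols <- c("flowpath_id", "divide_id", "vpuid")
tables <- dbGetQuery(con, "
  SELECT name FROM sqlite_master
  WHERE type = 'table'
    AND name NOT LIKE 'gpkg_%'
    AND name NOT LIKE 'rtree_%'
    AND name NOT LIKE 'sqlite_%'")$name
for (tbl in tables) {
  for (col in intersect(id_cols, dbListFields(con, tbl))) {
    dbExecute(con, sprintf(
      'CREATE INDEX IF NOT EXISTS "idx_%s_%s" ON "%s" ("%s")', tbl, col, tbl, col
    ))
  }
}

# 2. R-tree verification: every registered layer should have rtree_<layer>_<geom>.
layers   <- dbGetQuery(con, "SELECT table_name, column_name FROM gpkg_geometry_columns")
expected <- sprintf("rtree_%s_%s", layers$table_name, layers$column_name)
present  <- dbGetQuery(con, "SELECT name FROM sqlite_master WHERE type = 'table'")$name
missing  <- setdiff(expected, present)  # layers whose R-tree would need rebuilding

# 3. Refresh planner statistics so the new indices are considered.
dbExecute(con, "ANALYZE")
dbExecute(con, "PRAGMA optimize")

# Inspect how SQLite now plans a re-subsetting query.
plan <- dbGetQuery(con, "
  EXPLAIN QUERY PLAN
  SELECT * FROM network WHERE flowpath_id IN ('wb-1', 'wb-2')")
indices <- dbGetQuery(con, "SELECT name FROM sqlite_master WHERE type = 'index'")$name

dbDisconnect(con)
```

After running this, plan$detail shows whether the lookup searches via the flowpath_id index instead of scanning the table, and missing names the layers that lack an R-tree.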