Runs a set of invariant assertions at a named pipeline stage. Designed to be called between pipeline steps so silent failures (orphan divides, unsplit parents, raster artifacts) halt the run at the step that created the problem – not 200 lines later during visual inspection.

Usage

hf_check_invariants(
  stage,
  ...,
  strict = TRUE,
  coverage = FALSE,
  coverage_min = 0.9,
  attr_bounds = FALSE,
  domain = NULL,
  attr_trust_caveated = FALSE
)

Arguments

stage

Character. One of `"refactored"`, `"reconciled"`, `"aggregated"`, `"ngen"`.

...

Stage-specific named arguments (see details).

strict

Logical. If `TRUE` (default) any failed check throws an error. If `FALSE`, failures become warnings.
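
The effect of `strict` can be pictured with a small stand-alone sketch. This is not the package's implementation: `report_failures()` is a hypothetical helper that only mimics the documented error-versus-warning behaviour.

```r
# Hypothetical helper, not part of the package: illustrates the documented
# strict = TRUE / strict = FALSE behaviour only.
report_failures <- function(failed, strict = TRUE) {
  if (length(failed) == 0L) return(invisible(TRUE))
  msg <- paste0("invariant check(s) failed: ", paste(failed, collapse = ", "))
  if (strict) stop(msg, call. = FALSE)   # halt the pipeline at this step
  warning(msg, call. = FALSE)            # keep going, but surface the problem
  invisible(FALSE)
}

# strict = FALSE: the run continues and the failure surfaces as a warning
res_warn <- tryCatch(
  report_failures("So_valid", strict = FALSE),
  warning = function(w) conditionMessage(w)
)

# strict = TRUE: the run halts with an error
res_err <- tryCatch(
  report_failures("So_valid", strict = TRUE),
  error = function(e) conditionMessage(e)
)
```

In a pipeline, `strict = TRUE` suits unattended runs where a bad intermediate product should stop everything, while `strict = FALSE` suits exploratory runs where all failures should be collected in one pass.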

coverage

Logical. If `TRUE`, run the expensive flowpath-in-catchment coverage check (aggregated and ngen stages only). Default `FALSE`.

coverage_min

Minimum fraction of a flowpath that must lie inside its assigned catchment. Default `0.90`.
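
To make the `coverage_min` threshold concrete, the fraction of a flowpath inside a catchment can be computed roughly as below. This is a minimal sketch on toy geometries using `sf`; how the function itself measures coverage, and whether it compares with `>=` or `>`, is an assumption here.

```r
library(sf)

# Toy geometries (no CRS): a 10-unit flowpath and a catchment covering
# x in [-1, 9.5], so 9.5 of the 10 units fall inside the polygon.
flowpath  <- st_sfc(st_linestring(rbind(c(0, 0), c(10, 0))))
catchment <- st_sfc(st_polygon(list(rbind(
  c(-1, -1), c(9.5, -1), c(9.5, 1), c(-1, 1), c(-1, -1)
))))

# Length of the part of the line that lies inside the polygon,
# divided by the full line length.
inside   <- st_intersection(flowpath, catchment)
fraction <- as.numeric(st_length(inside)) / as.numeric(st_length(flowpath))

coverage_min <- 0.9                  # the documented default
passes <- fraction >= coverage_min   # assumed comparison; see note above
```

With these toy shapes the flowpath clears the default threshold; shrinking the polygon so that less than 90% of the line is covered would flip `passes` to `FALSE`.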

attr_bounds

Logical. If `TRUE`, append a per-attribute physical-range plausibility pass over the `divides` and `flowpaths` layers (see `hf_check_attr_bounds()`). Soft/warn-only and off by default, so it never changes the returned `ok` for existing callers until enabled. Absent attribute columns are skipped.

domain

Domain code (`"CONUS"`, `"AK"`, `"HI"`, `"PRVI"`) selecting the `lat`/`lon` bounds for the attribute pass; `NULL` skips lat/lon.

attr_trust_caveated

Logical. Include attribute bounds flagged in the `caveat` column of the bounds table (default `FALSE`).

Value

Invisibly a list with `ok` (logical), `stage`, and `checks` (named list of per-check results).
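
Because the result is returned invisibly, it has to be assigned before it can be inspected. The sketch below works on a mock list with the documented top-level shape; the content of each `checks` element is not specified on this page, so a bare logical per check is assumed purely for illustration.

```r
# Mock of the documented return shape. In real use this would come from
#   res <- hf_check_invariants("ngen", ..., strict = FALSE)
# ASSUMPTION: each element of `checks` is shown as a single logical; the
# actual per-check result structure may be richer.
res <- list(
  ok     = FALSE,
  stage  = "ngen",
  checks = list(slope_valid = TRUE, So_valid = FALSE)
)

if (!res$ok) {
  # Names of the checks that did not pass
  failed <- names(res$checks)[!vapply(res$checks, isTRUE, logical(1))]
  message("Stage '", res$stage, "' failed: ", paste(failed, collapse = ", "))
}
```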

Details

Expected arguments per stage:

- `stage = "refactored"`: `refactored` (sf, split NHD flowlines), `reconciled` (sf, collapsed-and-reconciled lines with integer `ID`).
- `stage = "reconciled"`: `reconciled` (sf), `divides` (sf of reconciled divides).
- `stage = "aggregated"`: `flowpaths` (sf), `divides` (sf), `network` (data.frame, optional).
- `stage = "ngen"`: `flowpaths` (sf with `fp-` IDs), `divides` (sf with `cat-` IDs), and optionally `nexus` (data.frame with `nexus_id` / `nexus_toid`, enabling the `fp -> nexus -> fp` DAG check), `flowlines` and/or `network` (a data.frame carrying the `So` channel-slope routing attribute), and `lakes` (sf with a `lake_id` column and waterbody polygons).

When the relevant columns are present, the ngen stage adds: `slope_valid` and `So_valid` (channel `slope` / `So` must be strictly positive everywhere they appear, since routing requires it; an all-`NA` column reports as info rather than failing), and `lake_spatial_consistent` (every flowpath stamped with a `lake_id` must lie within ~one lake-extent, 1 km floor, of that lake – catching `wbareacomi` VAA mis-indexing without demanding exact geometry overlap).
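
The `slope_valid` / `So_valid` rule described above can be sketched as a small classifier. `check_positive_slope()` is a hypothetical helper, not a package function, and the treatment of columns that mix `NA` with valid values (here: `NA`s are ignored) is an assumption.

```r
# Hypothetical helper illustrating the documented slope rule:
#   - column absent        -> check is skipped
#   - column entirely NA   -> reported as info, not a failure
#   - any value <= 0       -> fail (routing needs strictly positive slope)
# ASSUMPTION: NAs mixed in with real values are ignored.
check_positive_slope <- function(x) {
  if (is.null(x)) return("skipped")
  if (all(is.na(x))) return("info")
  if (all(x > 0, na.rm = TRUE)) "pass" else "fail"
}

# Toy table standing in for a `network` data.frame with an `So` column
network <- data.frame(id = c("fp-1", "fp-2", "fp-3"), So = c(0.01, 0.002, 0))

status_So    <- check_positive_slope(network$So)      # zero slope on fp-3
status_slope <- check_positive_slope(network$slope)   # column not present
```

A zero slope fails just as a negative one does, which is the point of "strictly positive": a flat channel is as unusable for routing as an uphill one.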

Examples

if (FALSE) { # \dontrun{
flowpaths <- sf::read_sf("hydrofabric.gpkg", "flowpaths")
divides   <- sf::read_sf("hydrofabric.gpkg", "divides")
hf_check_invariants("ngen", flowpaths = flowpaths, divides = divides,
  strict = FALSE)
} # }