Site Selection Workflow

site_selection prepares a reviewed catalog of candidate catchments before regional-lab or simulation work. It selects or rejects basins, writes auditable criteria, and produces a static HTML review report. It neither expands sites into model recipes nor runs groundwater solvers.

Use it when the question is still upstream of modeling:

  • which gauged catchments should enter a regional campaign;

  • whether candidate outlets delineate plausible basins from the DEM;

  • which candidates fail blocking criteria such as station distance, record length, known influence, or area rules;

  • which selected sites should be exported into regional_lab_sites.csv.

Minimal structure

[workflow]
mode = "site_selection"

[site_selection]
selection_id = "bretagne_hydrometry_50_500_small_v1"
output_root = "../outputs/bretagne_hydrometry_50_500_small_v1"

[site_selection.input]
mode = "hydrometry"
region_id = "Bretagne"

[site_selection.strategy]
principle = "observation_led"
profile = "gauged_downstream_station"
primary_observation_type = "flow_station"
candidate_mode = "station_outlets"

[site_selection.territory]
mode = "admin_regions"
country = "FR"
regions = ["Bretagne"]

[site_selection.dem]
source = "data"
request_extent = "outlets"
map_background_extent = "territory"

[hydrometry]
date_start = "2015-01-01"
date_end = "2025-01-01"

[[hydrometry.sources]]
source = "hubeau"
product = "QmnJ"
extent = "study_area"
require_observations = true
max_stations = 7

[data]
types = ["dem"]

[[data.dem.sources]]
source = "ign_geoplateforme_dem"
dataset = "bd-alti"
resolution_m = 25.0
file_format = "ASC"
regions = ["Bretagne"]
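You can sanity-check a configuration before running it by loading the TOML into a dict (for example with tomllib in Python 3.11+) and confirming that the minimal tables above are present. The sketch below works on an already-parsed dict. check_minimal_structure is a hypothetical helper whose rules only mirror what this page states. It is not HydroModPy's own validator.

```python
def check_minimal_structure(config: dict) -> list:
    """Return human-readable problems; an empty list means the minimal shape is present."""
    problems = []
    if config.get("workflow", {}).get("mode") != "site_selection":
        problems.append('[workflow] mode must be "site_selection"')
    selection = config.get("site_selection", {})
    for key in ("selection_id", "output_root"):
        if key not in selection:
            problems.append(f"[site_selection] is missing {key}")
    for table in ("input", "strategy"):
        if table not in selection:
            problems.append(f"[site_selection.{table}] is missing")
    # Hydrometry mode needs stations and a DEM declared in the data layer.
    if selection.get("input", {}).get("mode") == "hydrometry":
        if "hydrometry" not in config:
            problems.append("hydrometry input mode needs a [hydrometry] table")
        if not config.get("data", {}).get("dem", {}).get("sources"):
            problems.append("hydrometry input mode needs [[data.dem.sources]]")
    return problems


# The minimal configuration from this page, as a parsed dict (trimmed).
minimal = {
    "workflow": {"mode": "site_selection"},
    "site_selection": {
        "selection_id": "bretagne_hydrometry_50_500_small_v1",
        "output_root": "../outputs/bretagne_hydrometry_50_500_small_v1",
        "input": {"mode": "hydrometry", "region_id": "Bretagne"},
        "strategy": {"principle": "observation_led"},
    },
    "hydrometry": {"date_start": "2015-01-01", "date_end": "2025-01-01"},
    "data": {"types": ["dem"], "dem": {"sources": [{"source": "ign_geoplateforme_dem"}]}},
}
print(check_minimal_structure(minimal))  # []
```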

The DEM is deliberately declared under [data.dem] rather than under [site_selection], because provider access stays in the data layer. In hydrometry mode, the workflow loads the stations first. With site_selection.dem.request_extent = "outlets", it uses the projected station outlets to bound the DEM request. It then builds flow products, delineates catchments, and hands the spatial artifacts to the selection and reporting layer.
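Think of the outlet-bounded DEM request as the envelope of the projected outlets, grown by a margin. This is a minimal sketch. It assumes Lambert-93 coordinates in metres and a margin_km value applied uniformly on every side. outlet_request_bounds and Bounds are illustrative names, not HydroModPy API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def outlet_request_bounds(outlets_xy, margin_km):
    """Envelope of projected outlets (metres) grown by margin_km on every side."""
    if not outlets_xy:
        raise ValueError("no station outlets to bound the DEM request")
    xs = [x for x, _ in outlets_xy]
    ys = [y for _, y in outlets_xy]
    margin_m = margin_km * 1000.0
    return Bounds(min(xs) - margin_m, min(ys) - margin_m,
                  max(xs) + margin_m, max(ys) + margin_m)


# Two hypothetical Lambert-93 outlets and a 5 km margin.
print(outlet_request_bounds([(300_000, 6_800_000), (320_000, 6_790_000)], 5))
# Bounds(xmin=295000.0, ymin=6785000.0, xmax=325000.0, ymax=6805000.0)
```

The map background is not constrained by this envelope: map_background_extent = "territory" can still request a broader DEM for display.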

Short-term profiles

Two profiles are considered operational for the current site-selection work. They keep the configuration readable while leaving room for later multi-criteria extensions.

  • area_only
    Use when: the goal is to find DEM-delineated catchments around a target drainage area, without using observations as selection evidence.
    Required shape: principle = "criteria_crossing", primary_axes = ["area"], and a DEM-driven input such as site_selection.input.mode = "dem_area_light".

  • gauged_downstream_station
    Use when: the goal is to select gauged catchments whose outlet is represented by a downstream flow station.
    Required shape: principle = "observation_led", primary_observation_type = "flow_station", and candidate_mode = "station_outlets".
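The two required shapes can be written as a small resolver that maps a strategy table to a profile name. This is a sketch that assumes the strategy keys shown above. resolve_profile is a hypothetical helper, and HydroModPy's own profile resolution may accept more variants.

```python
# Assumption: only dem_area_light counts as a DEM-driven input here; the page
# says "such as", so the real list may be longer.
DEM_DRIVEN_INPUTS = {"dem_area_light"}


def resolve_profile(strategy: dict, input_mode: str) -> str:
    """Map a [site_selection.strategy] table to one of the two short-term profiles."""
    if (strategy.get("principle") == "criteria_crossing"
            and strategy.get("primary_axes") == ["area"]
            and input_mode in DEM_DRIVEN_INPUTS):
        return "area_only"
    if (strategy.get("principle") == "observation_led"
            and strategy.get("primary_observation_type") == "flow_station"
            and strategy.get("candidate_mode") == "station_outlets"):
        return "gauged_downstream_station"
    raise ValueError("strategy does not match an operational short-term profile")


gauged = {"principle": "observation_led",
          "primary_observation_type": "flow_station",
          "candidate_mode": "station_outlets"}
print(resolve_profile(gauged, "hydrometry"))  # gauged_downstream_station
```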

Provider status

The workflow separates provider loading from site-selection criteria. This is intentional: provider code lives under hydromodpy.data; the site_selection package consumes normalized layers, stations, outlets, and basins.

  • Dams and human influences
    Available in site_selection now: No direct ROE or BNPE provider is wired into site_selection yet. The current criterion consumes user-declared vector layers through [[site_selection.criteria.influence.layers]] and can mark major_dam_upstream, major_withdrawal_upstream, or major_regulated_reach. For gauged catchments, Hub’Eau hydrometry station metadata can also be used through the station_influence observation criterion; this is a station-quality filter, not a spatial proof that no upstream dam exists.
    Natural next providers: ROE for obstacles such as dams and weirs; BNPE or Hub’Eau withdrawals for water-abstraction pressure; local DREAL, DDT, SAGE, EPTB, or operator inventories when they are more complete locally.

  • Geology
    Available in site_selection now: site_selection can intersect configured geology polygon layers and report geology evidence. It does not yet automatically transform [data.geology] provider outputs into a site-selection criterion.
    Natural next providers: BRGM brgm_1m and brgm_50k already exist in the data package; BDLISA is a relevant later source when the question is hydrogeological units rather than geological formations.

  • DEM candidate generation
    Available in site_selection now: The dem_area_light mode generates area-driven candidate outlets and caps the number of DEM delineations before final selection.
    Natural next providers: A future network-aware generator could favor hydrologically meaningful outlets, reduce nested or near-duplicate basins, improve coastal behavior, and make large territories faster to review.

For hydrometry-led selections, station influence metadata can be made active without adding a separate dam layer:

[site_selection.criteria.observations.station_influence]
mode = "warning"
source = "hubeau_station_metadata"
unknown_policy = "neutral"

Use mode = "hard_reject" only when the campaign should exclude stations whose hydrometric metadata explicitly reports a local or general hydrologic influence through influence_locale_station or influence_generale_site. Unknown influence metadata is not rejected by default. Comment keyword matches are shown as warnings for review; they are not treated as hard-reject evidence.
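The station_influence decision rules can be sketched as follows. The sketch assumes the two Hub’Eau fields have already been normalized to True (influence reported), False (none reported), or None (unknown). Several details are hypothetical: the raw provider codes, the comment field name, the keyword list, and the behavior of non-neutral unknown_policy values.

```python
from dataclasses import dataclass, field

# Hypothetical keyword list; the real criterion's keywords are not documented here.
COMMENT_KEYWORDS = ("barrage", "retenue", "prélèvement")


@dataclass
class InfluenceResult:
    decision: str = "pass"          # "pass", "warn" or "reject"
    reasons: list = field(default_factory=list)


def evaluate_station_influence(station, mode="warning", unknown_policy="neutral"):
    """station: dict with influence flags normalized to True, False or None (unknown)."""
    flags = (station.get("influence_locale_station"),
             station.get("influence_generale_site"))
    result = InfluenceResult()
    if any(flag is True for flag in flags):
        result.reasons.append("metadata reports hydrologic influence")
        result.decision = "reject" if mode == "hard_reject" else "warn"
    elif all(flag is None for flag in flags) and unknown_policy != "neutral":
        # Non-neutral policies are a guess: flag unknowns for review, never reject.
        result.reasons.append("influence metadata unknown")
        result.decision = "warn"
    # Comment keyword matches only ever raise a warning, even in hard_reject mode.
    comment = (station.get("comment") or "").lower()
    hits = [word for word in COMMENT_KEYWORDS if word in comment]
    if hits:
        result.reasons.append("comment mentions " + ", ".join(hits))
        if result.decision == "pass":
            result.decision = "warn"
    return result
```

Keeping comment matches at warning level mirrors the rule above: free-text hints support review but are not treated as hard-reject evidence.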

Input modes

site_selection has five explicit input modes. The first three are the stable day-to-day modes. The two DEM-generation modes, dem_area_light and generated_candidates, support area-driven or network-driven candidate discovery.

  • plan_only
    Use when: the team needs to review strategy, territory, data needs, and outputs before loading observations or delineating basins.
    Main inputs: [site_selection] only.

  • delineated_catchments
    Use when: candidate outlets or pre-normalized catchments already exist as a CSV, often from a fixture, frozen inventory, or previous provider query.
    Main inputs: catchments_csv and optionally delineate_from_outlets = true.

  • hydrometry
    Use when: stations should be loaded directly through HydroModPy data managers, usually Hub’Eau hydrometry over a territory or explicit station list.
    Main inputs: [hydrometry] plus [data.dem].

  • dem_area_light
    Use when: the campaign should discover DEM-derived outlets around a target basin area, with a bounded number of delineations for fast review.
    Main inputs: strategy.profile = "area_only", primary_axes = ["area"], [site_selection.dem_area_light] and [data.dem].

  • generated_candidates
    Use when: the campaign should sample high-accumulation DEM/network cells before normal delineation and selection. This mode is experimental compared with the two short-term supported profiles.
    Main inputs: outlets.candidate_mode = "network_sampling" plus [data.dem] or a custom DEM.
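The mode list can be turned into a pre-flight check that reports what each mode still needs. The table locations in this sketch are assumptions: catchments_csv is assumed to live under [site_selection.input], and the "custom DEM" alternative of generated_candidates is not modelled. REQUIRED_INPUTS and missing_inputs are hypothetical names.

```python
# Hypothetical table paths derived from the "Main inputs" entries above.
REQUIRED_INPUTS = {
    "plan_only": [],
    "delineated_catchments": [("site_selection", "input", "catchments_csv")],
    "hydrometry": [("hydrometry",), ("data", "dem")],
    "dem_area_light": [("site_selection", "dem_area_light"), ("data", "dem")],
    "generated_candidates": [("data", "dem")],  # a custom DEM is not modelled here
}


def has_path(config, path):
    """True if the nested tables named by path all exist."""
    node = config
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return True


def missing_inputs(config):
    """Dotted names of inputs the selected mode needs but the config lacks."""
    mode = config.get("site_selection", {}).get("input", {}).get("mode")
    if mode not in REQUIRED_INPUTS:
        raise ValueError(f"unknown site_selection input mode: {mode!r}")
    return [".".join(path) for path in REQUIRED_INPUTS[mode] if not has_path(config, path)]


print(missing_inputs({"site_selection": {"input": {"mode": "hydrometry"}}, "hydrometry": {}}))
# ['data.dem']
```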

DEM and observations

The workflow keeps data-provider access outside the spatial selection package:

  • DEM loading goes through hydromodpy.data.variables.dem and should be configured in [data.dem].

  • Hub’Eau station loading goes through the hydrometry data manager.

  • In hydrometry mode, request_extent = "outlets" limits the calculation DEM to the station envelope plus margin_km; the map background can still use a broader territory DEM.

  • French administrative regions are resolved to departments by the data layer.

  • Hub’Eau station coordinates are requested in WGS84 for provider queries, then projected to Lambert-93 for DEM delineation. When Hub’Eau exposes official Lambert-93 station coordinates, those are preferred.
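The coordinate step can be sketched with the Lambert Conformal Conic formulas behind Lambert-93 (EPSG:2154): GRS80 ellipsoid, standard parallels 44° and 49°, origin 46.5°N 3°E, false easting 700 000 m, and false northing 6 600 000 m. The sketch treats WGS84 and RGF93 latitude/longitude as interchangeable, which is an approximation at the metre level. The station field names (x_l93, y_l93, longitude, latitude) are hypothetical. In practice a library such as pyproj would normally perform the transformation.

```python
import math

# Lambert-93 (EPSG:2154): GRS80 ellipsoid, conformal conic with two standard parallels.
A = 6378137.0
F = 1 / 298.257222101
E = math.sqrt(2 * F - F * F)
LAT1, LAT2, LAT0, LON0 = map(math.radians, (44.0, 49.0, 46.5, 3.0))
X0, Y0 = 700000.0, 6600000.0


def _m(phi):
    return math.cos(phi) / math.sqrt(1 - (E * math.sin(phi)) ** 2)


def _t(phi):
    es = E * math.sin(phi)
    return math.tan(math.pi / 4 - phi / 2) / ((1 - es) / (1 + es)) ** (E / 2)


N = (math.log(_m(LAT1)) - math.log(_m(LAT2))) / (math.log(_t(LAT1)) - math.log(_t(LAT2)))
FF = _m(LAT1) / (N * _t(LAT1) ** N)
RHO0 = A * FF * _t(LAT0) ** N


def wgs84_to_lambert93(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Lambert-93 metres."""
    rho = A * FF * _t(math.radians(lat_deg)) ** N
    theta = N * (math.radians(lon_deg) - LON0)
    return X0 + rho * math.sin(theta), Y0 + RHO0 - rho * math.cos(theta)


def station_xy(station):
    """Prefer provider Lambert-93 coordinates; otherwise project WGS84 lon/lat."""
    if station.get("x_l93") is not None and station.get("y_l93") is not None:
        return station["x_l93"], station["y_l93"]
    return wgs84_to_lambert93(station["longitude"], station["latitude"])
```

The projection origin maps to the false easting and northing, which gives a quick self-check: wgs84_to_lambert93(3.0, 46.5) returns approximately (700000.0, 6600000.0).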

For French regional examples, source = "ign_geoplateforme_dem" with dataset = "bd-alti" and resolution_m = 25 is the current operational default.

Outlet snapping

Two snapping strategies are available:

  • dem_accumulation: snap the candidate outlet directly to the DEM-derived accumulation raster within snap_dist_m.

  • bdtopage_then_dem: first project the outlet to BD Topage or a custom reference network, reject it if that reference line is too far, then apply the local DEM snap.
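The two strategies can be sketched on a toy raster. The sketch makes these assumptions:

- The accumulation grid is north-up, stored as nested lists, with a known top-left origin and square cells.
- The reference network is a list of Lambert-93 polylines.
- A hypothetical max_ref_dist_m threshold defines "too far".

dem_snap and reference_then_dem_snap are illustrative names. The real implementation may break ties or handle raster edges differently.

```python
import math


def dem_snap(x, y, acc, origin_x, origin_y, cell, snap_dist_m):
    """Centre of the highest-accumulation cell within snap_dist_m of (x, y), or None."""
    col0 = int((x - origin_x) // cell)
    row0 = int((origin_y - y) // cell)
    radius = int(math.ceil(snap_dist_m / cell))
    best = None
    for row in range(max(0, row0 - radius), min(len(acc), row0 + radius + 1)):
        for col in range(max(0, col0 - radius), min(len(acc[0]), col0 + radius + 1)):
            cx = origin_x + (col + 0.5) * cell
            cy = origin_y - (row + 0.5) * cell
            dist = math.hypot(cx - x, cy - y)
            if dist > snap_dist_m:
                continue
            key = (acc[row][col], -dist)  # highest accumulation, then nearest
            if best is None or key > best[0]:
                best = (key, (cx, cy))
    return None if best is None else best[1]


def project_to_segment(px, py, ax, ay, bx, by):
    """Closest point on segment AB to P, and its distance."""
    dx, dy = bx - ax, by - ay
    length2 = dx * dx + dy * dy
    t = 0.0 if length2 == 0 else ((px - ax) * dx + (py - ay) * dy) / length2
    t = max(0.0, min(1.0, t))
    qx, qy = ax + t * dx, ay + t * dy
    return qx, qy, math.hypot(px - qx, py - qy)


def reference_then_dem_snap(x, y, network, max_ref_dist_m, **dem_args):
    """bdtopage_then_dem-style snap: reference network first, then local DEM snap."""
    best = None
    for line in network:
        for (ax, ay), (bx, by) in zip(line, line[1:]):
            candidate = project_to_segment(x, y, ax, ay, bx, by)
            if best is None or candidate[2] < best[2]:
                best = candidate
    if best is None or best[2] > max_ref_dist_m:
        return None  # reference line too far: reject the outlet
    return dem_snap(best[0], best[1], **dem_args)


# 5 x 5 toy raster, 10 m cells, top-left corner at (0, 50); channel in column 2.
acc = [[1, 1, 5, 1, 1], [1, 1, 10, 1, 1], [1, 1, 20, 1, 1],
       [1, 1, 40, 1, 1], [1, 1, 80, 1, 1]]
grid = dict(acc=acc, origin_x=0.0, origin_y=50.0, cell=10.0, snap_dist_m=15.0)
print(dem_snap(12.0, 15.0, **grid))  # (25.0, 15.0)
print(reference_then_dem_snap(12.0, 15.0, [[(25, 50), (25, 0)]], 20.0, **grid))  # (25.0, 5.0)
```

The two results differ because the reference projection moves the search centre onto the channel first, so the DEM snap then reaches a cell further downstream. This is why the BD Topage variant is useful for inspecting outlet-location sensitivity.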

BD Topage is a technical reference for constraining outlet locations. It should not be displayed by default in the site-selection map: on regional DEM backgrounds it can be mistaken for the validated hydrographic network of the selected basins. The report should show the DEM, selected/rejected basins, station points, final outlets, and station-to-outlet displacement links.

Outputs

Every executed selection run writes the audit core:

  • criteria_components.jsonl;

  • site_selection_decisions.csv;

  • site_selection_decisions.jsonl;

  • site_selection_evidence.jsonl when normalized evidence exists;

  • site_selection_manifest.json.

Additional outputs follow the configured switches:

  • selected_sites.csv when write_csv and write_selected are true;

  • rejected_sites.csv when write_csv and write_rejected are true;

  • regional_lab_sites.csv when write_regional_lab_csv is true;

  • selected/rejected outlet and basin GeoJSON files when write_geojson is true;

  • GeoPackage and GeoParquet layers when their corresponding switches are true.

Generated DEM modes also write candidate_generation.jsonl. When write_geojson is true, they write candidate_outlets.geojson and generated_dem_network.geojson as review artifacts.
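The switch logic above can be written as a small function that predicts which files a run should leave behind, which helps when checking an output directory. The switch names come from this page. The function itself and its has_evidence and generated_dem_mode arguments are assumptions of the sketch. It also omits the selected/rejected GeoJSON, GeoPackage, and GeoParquet files, whose names this page does not list.

```python
# Audit core written by every executed selection run (evidence file is conditional).
AUDIT_CORE = [
    "criteria_components.jsonl",
    "site_selection_decisions.csv",
    "site_selection_decisions.jsonl",
    "site_selection_manifest.json",
]


def expected_outputs(switches, has_evidence=False, generated_dem_mode=False):
    """File names a run should write, given its output switches (partial list)."""
    files = list(AUDIT_CORE)
    if has_evidence:
        files.append("site_selection_evidence.jsonl")
    if switches.get("write_csv") and switches.get("write_selected"):
        files.append("selected_sites.csv")
    if switches.get("write_csv") and switches.get("write_rejected"):
        files.append("rejected_sites.csv")
    if switches.get("write_regional_lab_csv"):
        files.append("regional_lab_sites.csv")
    if generated_dem_mode:
        files.append("candidate_generation.jsonl")
        if switches.get("write_geojson"):
            files += ["candidate_outlets.geojson", "generated_dem_network.geojson"]
    return files
```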

When HTML reporting is enabled, the run also writes:

  • review/index.html as the main review entry point, with a per-block detail selector;

  • review/compact/index.html;

  • review/standard/index.html;

  • review/audit/index.html;

  • review/site_selection_map.png.

The manifest is the hand-off contract. The HTML report is derived from the manifest and its declared artifacts, so validation should target the manifest first. compact is a fast plausibility read, standard is the default scientific review, and audit exposes provenance, detailed criteria tables, and artifact links. The main review/index.html page lets each block choose its own level.
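Because the manifest is the hand-off contract, a first validation step can check that every artifact it declares actually exists. This page does not document the manifest schema. The sketch therefore assumes a hypothetical top-level "artifacts" list of objects, each with a relative "path" key. Adapt it to the real site_selection_manifest.json layout.

```python
import json
from pathlib import Path


def missing_manifest_artifacts(output_root):
    """Declared artifact paths (relative to output_root) that are absent on disk."""
    root = Path(output_root)
    manifest_text = (root / "site_selection_manifest.json").read_text(encoding="utf-8")
    manifest = json.loads(manifest_text)
    # Hypothetical schema: {"artifacts": [{"path": "review/index.html"}, ...]}
    declared = [entry["path"] for entry in manifest.get("artifacts", [])]
    return [path for path in declared if not (root / path).exists()]
```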

Examples

The example project contains short cases for the two supported short-term profiles, plus broader regional previews:

hmp run examples/projects/17_site_selection_workflow/configs/calvados_dem_area_light_100km2_fast.toml
hmp run examples/projects/17_site_selection_workflow/configs/bretagne_hydrometry_50_500_small.toml
hmp run examples/projects/17_site_selection_workflow/configs/bretagne_hydrometry_50_500_small_bdtopage.toml
hmp run examples/projects/17_site_selection_workflow/configs/auvergne_rhone_alpes_hydrometry_preview.toml
hmp run examples/projects/17_site_selection_workflow/configs/corse_hydrometry_preview.toml

Use the direct DEM snap example to check the normal map layout. Use the BD Topage variant only to inspect outlet-location sensitivity; the reference network remains an internal snapping support.

The two closure examples for the stabilized contract are:

  • calvados_dem_area_light_100km2_fast.toml
    Effective profile: area_only
    Expected result: 26 candidates, 10 selected, 16 rejected
    Review HTML: outputs/calvados_dem_area_light_100km2_fast_v1/review/index.html

  • bretagne_hydrometry_50_500_small_bdtopage.toml
    Effective profile: gauged_downstream_station
    Expected result: 6 candidates, 6 selected, 0 rejected
    Review HTML: outputs/bretagne_hydrometry_50_500_small_bdtopage_v1/review/index.html

Both paths above are relative to examples/projects/17_site_selection_workflow/. The associated map file is review/site_selection_map.png in each output directory.
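To compare a run against the expected results above, you can count the decisions in site_selection_decisions.csv. This sketch assumes a hypothetical decision column whose values are "selected" or "rejected". The real column name and labels may differ.

```python
import csv
import io
from collections import Counter


def decision_counts(lines, column="decision"):
    """Return (candidates, selected, rejected) from decisions CSV rows."""
    counts = Counter(row[column] for row in csv.DictReader(lines))
    return sum(counts.values()), counts["selected"], counts["rejected"]


# Inline stand-in for site_selection_decisions.csv (hypothetical columns).
sample = io.StringIO("site_id,decision\nA,selected\nB,rejected\nC,selected\n")
print(decision_counts(sample))  # (3, 2, 1)
```

For the Calvados example, you would open outputs/calvados_dem_area_light_100km2_fast_v1/site_selection_decisions.csv with newline="" and expect (26, 10, 16) under the same column assumption.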

Troubleshooting

  • If selected sites appear on a regular grid, check whether the input is a synthetic area_only fixture rather than real hydrometry stations.

  • If the DEM is absent from the map, verify [data.dem] and site_selection.dem.map_background_extent.

  • If station points and basins are offset, inspect CRS metadata and prefer provider Lambert-93 coordinates when available.

  • If hmp run fails while opening an old cache.duckdb, open the cache once with a recent HydroModPy build. That build adopts the old V1 data-cache tables into the current schema_migrations ledger.