Data Managers And External Dependencies#

Scope#

The data-manager root layer is HydroModPy’s orchestration boundary for data loading. It answers three questions before solver work begins:

  • which data families are active for this run,

  • which manager loads each family,

  • which external providers or local inputs those managers depend on.

This architecture is intentionally centralized under hydromodpy.data so the project facade and the simulation runner do not need to duplicate activation rules or provider-specific loading logic.

Code map#

  • hydromodpy/data/data_managers_config.py: typed validation of [data] sections.

  • hydromodpy/data/planner.py and plan.py: activation inference and immutable DataLoadPlan creation.

  • hydromodpy/data/runtime_loader.py: runtime dispatch from activated data types to concrete managers.

  • hydromodpy/data/data_managers.py: lightweight loaded-data container consumed by the project facade and the structure binders.

  • hydromodpy/data/variables/*: provider-specific packages that own typed config, manager logic, and IO.

Root-Layer Responsibilities#

The root files under hydromodpy/data/ split responsibilities as follows:

  • data_managers_config.py validates [data] and normalizes typed sections,

  • planner.py merges explicit types with inference rules,

  • plan.py stores the immutable DataLoadPlan,

  • runtime_loader.py dispatches each activated type to its concrete manager,

  • data_managers.py exposes the lightweight runtime container consumed by orchestration layers.

This means the project facade can stay focused on execution order while the data layer owns activation, validation, and loading dispatch.

Activation And Inference#

The activation contract is deterministic:

  • an explicit data.types list always wins,

  • additional types can be inferred from domain or flow configuration,

  • the final decision is stored in DataLoadPlan with reasons per type.

Current inference rules include:

  • domain.supports using provider geology -> activate geology,

  • domain.zone_ids containing geology -> activate geology,

  • flow.active_bc containing stream -> activate hydrography,

  • flow.active_bc containing ocean -> activate oceanic.

With data.inference_mode = "warn", these inferences are accepted and recorded. With data.inference_mode = "strict", every inferred family must also have an explicit typed section, except geology, whose defaulting path is already handled by the data layer.
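The activation contract above can be sketched in a few lines. This is a minimal illustration, not the code in planner.py: PlanSketch, build_plan, the dictionary shape of the configuration, and the "provider" key under domain.supports are assumptions made for the example, and the strict-mode check reflects only one possible reading of the rule stated above.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class PlanSketch:
    """Hypothetical stand-in for DataLoadPlan: activated types with reasons."""

    reasons: Mapping[str, str]

    @property
    def types(self) -> tuple:
        return tuple(self.reasons)


def build_plan(config: dict) -> PlanSketch:
    """Merge explicit data.types with the inference rules listed above."""
    data = config.get("data", {})
    domain = config.get("domain", {})
    flow = config.get("flow", {})
    mode = data.get("inference_mode", "warn")

    # Explicit types are recorded first, so inference never overrides them.
    reasons = {name: "explicit data.types" for name in data.get("types", [])}

    # Assumed config shape: domain.supports is a mapping with a "provider" key.
    supports = domain.get("supports", {})
    active_bc = flow.get("active_bc", [])
    rules = [
        (supports.get("provider") == "geology", "geology",
         "domain.supports uses provider geology"),
        ("geology" in domain.get("zone_ids", []), "geology",
         "domain.zone_ids contains geology"),
        ("stream" in active_bc, "hydrography", "flow.active_bc contains stream"),
        ("ocean" in active_bc, "oceanic", "flow.active_bc contains ocean"),
    ]

    for matched, name, why in rules:
        if not matched or name in reasons:
            continue
        # Assumed strict-mode reading: an inferred family needs its own
        # typed section under [data], except geology.
        if mode == "strict" and name != "geology" and name not in data:
            raise ValueError(f"strict mode: missing [data.{name}] section ({why})")
        reasons[name] = f"inferred: {why}"

    # Copy into a read-only mapping so the plan cannot be mutated later.
    return PlanSketch(MappingProxyType(dict(reasons)))


plan = build_plan({
    "data": {"types": ["dem"]},
    "flow": {"active_bc": ["stream", "ocean"]},
})
print(plan.types)  # ('dem', 'hydrography', 'oceanic')
```

Keeping a reason string per type is what lets a later layer explain why a family was loaded without re-running the rules.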

Runtime Dispatch#

DataManagersRuntimeLoader is the concrete dispatcher used after planning. Its current dispatch table covers:

  • terrain and support context: dem, geology, hydrography, oceanic,

  • observed stations: hydrometry, piezometry, intermittency, water_quality,

  • climatic or forcing-like fields: recharge, runoff, precipitation, etp, temperature, wind, humidity, radiation, soil_moisture.

Each family still owns its own typed config and manager package. The root layer only decides activation and calls the right loader.
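A dispatch table of this kind might look like the sketch below. It only illustrates the pattern: DISPATCH, load_activated, and the placeholder loaders are hypothetical names, and the real DataManagersRuntimeLoader calls the concrete manager packages rather than returning dictionaries.

```python
from typing import Any, Callable, Dict, Iterable

Loader = Callable[[dict], Any]

# Family names taken from the dispatch surface listed above.
FAMILIES = (
    "dem", "geology", "hydrography", "oceanic",
    "hydrometry", "piezometry", "intermittency", "water_quality",
    "recharge", "runoff", "precipitation", "etp", "temperature",
    "wind", "humidity", "radiation", "soil_moisture",
)


def _placeholder_loader(family: str) -> Loader:
    """Build a stand-in loader; a real one would call the family's manager."""
    def load(section: dict) -> dict:
        return {"family": family, "section": section}
    return load


# Hypothetical dispatch table: one loader callable per data family.
DISPATCH: Dict[str, Loader] = {name: _placeholder_loader(name) for name in FAMILIES}


def load_activated(types: Iterable[str], data_config: dict) -> Dict[str, Any]:
    """Call the registered loader for each activated type, in plan order."""
    loaded: Dict[str, Any] = {}
    for name in types:
        if name not in DISPATCH:
            raise KeyError(f"no manager registered for data type {name!r}")
        # Each loader receives only its own typed section.
        loaded[name] = DISPATCH[name](data_config.get(name, {}))
    return loaded


result = load_activated(["dem", "oceanic"], {"dem": {"resolution": 25}})
print(sorted(result))  # ['dem', 'oceanic']
```

With this shape, adding a family means registering one more entry in the table, while activation logic stays in the planner.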

Provider Families#

The current provider inventory documented in hydromodpy/data/structure.md is summarized below.

| Provider or source family | Main HydroModPy data families | Geographic scope |
| --- | --- | --- |
| Hub’Eau Hydrometrie | hydrometry | Metropolitan France |
| Hub’Eau Piezometrie | piezometry | Metropolitan France |
| Hub’Eau ONDE | intermittency | Metropolitan France |
| Hub’Eau Qualite | water_quality | Metropolitan France |
| SIM2 EDR | precipitation, etp, temperature, wind, humidity, radiation, soil_moisture, recharge, runoff | Metropolitan France |
| SHOM | oceanic | French coasts |
| IGN GeoPlateforme BD ALTI | dem | Metropolitan France |
| BRGM 1:1M / 1:50K | geology | Metropolitan France |
| Sandre WFS / BD Topage | hydrography | Metropolitan France |
| EU-Hydro | hydrography | Europe |
| OpenStreetMap / Overpass | hydrography | Global |
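Because one family can be served by several providers (hydrography has three, with different geographic scopes), the inventory is easier to query as data. The sketch below transcribes the inventory into a lookup; PROVIDERS and providers_for are hypothetical names used for illustration and do not exist in hydromodpy as far as this page states.

```python
from typing import List, Tuple

# (provider, data families, geographic scope), transcribed from the inventory.
PROVIDERS: List[Tuple[str, Tuple[str, ...], str]] = [
    ("Hub'Eau Hydrometrie", ("hydrometry",), "Metropolitan France"),
    ("Hub'Eau Piezometrie", ("piezometry",), "Metropolitan France"),
    ("Hub'Eau ONDE", ("intermittency",), "Metropolitan France"),
    ("Hub'Eau Qualite", ("water_quality",), "Metropolitan France"),
    ("SIM2 EDR", ("precipitation", "etp", "temperature", "wind", "humidity",
                  "radiation", "soil_moisture", "recharge", "runoff"),
     "Metropolitan France"),
    ("SHOM", ("oceanic",), "French coasts"),
    ("IGN GeoPlateforme BD ALTI", ("dem",), "Metropolitan France"),
    ("BRGM 1:1M / 1:50K", ("geology",), "Metropolitan France"),
    ("Sandre WFS / BD Topage", ("hydrography",), "Metropolitan France"),
    ("EU-Hydro", ("hydrography",), "Europe"),
    ("OpenStreetMap / Overpass", ("hydrography",), "Global"),
]


def providers_for(family: str) -> List[str]:
    """Return every provider that can serve the given data family."""
    return [name for name, families, _scope in PROVIDERS if family in families]


print(providers_for("hydrography"))
# ['Sandre WFS / BD Topage', 'EU-Hydro', 'OpenStreetMap / Overpass']
```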

External Runtime Constraints#

The important architectural point is not only “which API is called”, but also “what must be available for that call to succeed”.

Typical constraints today are:

  • SIM2-backed variables require a bounding box and a project time window,

  • SHOM loading requires a geographic context and a resolved date range,

  • geology and DEM loading often rely on geographic masks or raster support,

  • hydrography and watershed preprocessing depend on the Whitebox backend for some derived products,

  • local custom sources remain first-class inputs and must not be silently overwritten by cache subsumption logic.
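The first two constraints can be checked before any provider call is made. The sketch below is a hedged illustration: RunContext and missing_requirements are hypothetical names, and modelling the "geographic context" required by SHOM as a bounding box is an assumption made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

# Families served by SIM2, as listed in the provider inventory.
SIM2_FAMILIES = frozenset({
    "precipitation", "etp", "temperature", "wind", "humidity",
    "radiation", "soil_moisture", "recharge", "runoff",
})


@dataclass(frozen=True)
class RunContext:
    """Hypothetical run context: what is known before loading starts."""

    bbox: Optional[Tuple[float, float, float, float]] = None
    time_window: Optional[Tuple[date, date]] = None


def missing_requirements(family: str, ctx: RunContext) -> List[str]:
    """List the context fields a family needs that are still unset."""
    missing: List[str] = []
    # Assumption: SHOM's geographic context is represented by the bbox.
    if family in SIM2_FAMILIES or family == "oceanic":
        if ctx.bbox is None:
            missing.append("bbox")
        if ctx.time_window is None:
            missing.append("time_window")
    return missing


ctx = RunContext(bbox=(-4.8, 47.9, -4.2, 48.4))
print(missing_requirements("precipitation", ctx))  # ['time_window']
```

Reporting every missing field at once, rather than failing on the first, gives the user a complete list to fix in one pass.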

The data catalog and cache layer then add further operational constraints:

  • empty remote results may be cached as sentinels to avoid repeated failed API calls,

  • stored paths stay relative for workspace portability,

  • force_refresh bypasses cache reuse when a provider call must be repeated.
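These three cache behaviours can be illustrated with a small file-backed sketch. It is not the HydroModPy cache implementation: cached_fetch, EMPTY_SENTINEL, and the JSON storage format are assumptions chosen to keep the example self-contained.

```python
import json
from pathlib import Path
from typing import Callable, Optional

# Hypothetical marker written when a provider returned nothing.
EMPTY_SENTINEL = {"empty": True}


def cached_fetch(
    workspace: Path,
    rel_path: str,
    fetch: Callable[[], Optional[dict]],
    force_refresh: bool = False,
) -> Optional[dict]:
    """Return cached data when available, otherwise call the provider once."""
    # Only the relative path is stored; it is resolved against the
    # workspace at read time, so the workspace can be moved.
    target = workspace / rel_path

    if target.exists() and not force_refresh:
        cached = json.loads(target.read_text(encoding="utf-8"))
        # A sentinel means "already asked, provider had nothing".
        return None if cached == EMPTY_SENTINEL else cached

    result = fetch()
    target.parent.mkdir(parents=True, exist_ok=True)
    payload = result if result else EMPTY_SENTINEL
    target.write_text(json.dumps(payload), encoding="utf-8")
    return result or None
```

A second call with the same relative path does not reach the provider, even when the first result was empty; passing force_refresh=True repeats the call and overwrites the cached entry.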

What To Read When Touching This Layer#

Start with:

  • hydromodpy/data/README.md for the root orchestration contract,

  • hydromodpy/data/planner.py for inference rules,

  • hydromodpy/data/runtime_loader.py for the active dispatch surface,

  • hydromodpy/data/structure.md for the broader provider and cache model.

Then inspect one typed family such as:

  • hydromodpy/data/variables/oceanic/,

  • hydromodpy/data/variables/geology/,

  • hydromodpy/data/variables/hydrometry/,

  • hydromodpy/data/variables/precipitation/.

Those packages own provider-specific config, IO, and manager behavior.

Current Boundary With Future Work#

The architecture roadmap still mentions future consolidation work, especially:

  • deeper planner simplification,

  • clearer convergence between some observed-station families,

  • integrating PyHELP as a standard data-manager family rather than a more isolated coupling path.

Until that happens, this page should be read as the current root-layer contract, not as the final long-term provider map.