Artifact Policy

HydroModPy V1 uses DuckDB, Zarr and Parquet as the canonical stores. Other formats are allowed only when their role is explicit.

Canonical scientific stores

Category	Formats	Contract
Project catalog	DuckDB	Stable source of truth for simulations, metrics, parameters, provenance, audit events and workflow ledger.
Input cache	DuckDB	Stable cache index for downloaded and custom datasets.
Field arrays	Zarr	Stable persisted field store for gridded and mesh fields.
Tabular outputs	Parquet, GeoParquet	Stable persisted tables and vector layers.

Allowed non-canonical artifacts

Category	Formats	Contract
User exports	CSV, NetCDF, GeoTIFF, VTU, Shapefile, STAC, PROV-O, RO-Crate, `.hmp`	Documented outputs for humans and external tools. They are not the source of truth.
Input derivatives	Parquet, NetCDF, JSON sidecars	Cache artifacts tied to data variables and provenance.
Solver diagnostics	NPY, NPZ, JSON, CSV	Sidecars for debugging solver state. They are not stable user APIs unless a page names the schema.
Validation and calibration truth packages	NPZ, CSV, JSON	Named benchmark artifacts with local schema expectations.
Legacy hydrology helpers	HDF5	Localized PyHELP output. Stable only for the owning helper.
Observability	JSONL, DuckDB	Experimental `validity_frame` sidecars.
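
The two tables above can be read as a lookup from an artifact's role to the formats that role permits. The sketch below encodes them as plain data, assuming nothing about HydroModPy's internals: the role keys, the lowercase format labels, and the helper names (`is_allowed`, `is_canonical`, `roles_for`) are all hypothetical and chosen only for illustration. It also shows why a format alone does not identify a role: CSV and JSON appear under several categories, so the role has to be stated explicitly.

```python
# Hypothetical encoding of the policy tables; these names are not HydroModPy APIs.
ALLOWED_FORMATS = {
    # Canonical scientific stores
    "project_catalog": {"duckdb"},
    "input_cache": {"duckdb"},
    "field_arrays": {"zarr"},
    "tabular_outputs": {"parquet", "geoparquet"},
    # Allowed non-canonical artifacts
    "user_exports": {
        "csv", "netcdf", "geotiff", "vtu", "shapefile",
        "stac", "prov-o", "ro-crate", "hmp",
    },
    "input_derivatives": {"parquet", "netcdf", "json"},
    "solver_diagnostics": {"npy", "npz", "json", "csv"},
    "validation_truth": {"npz", "csv", "json"},
    "legacy_hydrology": {"hdf5"},
    "observability": {"jsonl", "duckdb"},
}

CANONICAL_ROLES = frozenset(
    {"project_catalog", "input_cache", "field_arrays", "tabular_outputs"}
)


def is_allowed(role: str, fmt: str) -> bool:
    """Return True if `fmt` is listed for `role` in the policy tables."""
    return fmt.lower() in ALLOWED_FORMATS.get(role, set())


def is_canonical(role: str) -> bool:
    """Return True if `role` is one of the canonical scientific stores."""
    return role in CANONICAL_ROLES


def roles_for(fmt: str) -> list[str]:
    """Return every role that lists `fmt`, sorted alphabetically."""
    key = fmt.lower()
    return sorted(role for role, fmts in ALLOWED_FORMATS.items() if key in fmts)


csv_roles = roles_for("csv")
zarr_ok = is_allowed("field_arrays", "Zarr")
hdf5_as_field_store = is_allowed("field_arrays", "hdf5")
```

Here `csv_roles` contains three non-canonical roles, while `hdf5_as_field_store` is false because HDF5 is listed only for the legacy hydrology helpers.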

Direct DuckDB access

The normal path is a catalog or cache adapter. Direct `duckdb.connect` calls are allowed only in these narrow cases:

migration runners;
backend adapters and concrete catalog/cache constructors;
read-only diagnostics;
portable `.hmp` snapshot export/import;
tests and performance benchmarks;
experimental `validity_frame` ingestion;
developer-only CLI inspection commands.

New direct DuckDB calls in user-facing CLI commands or `hydromodpy._api` should be treated as a regression unless they are added to this list with a test rationale.
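
One way to catch such regressions is a small static check over the source. The following is a minimal sketch that uses the standard-library `ast` module, not a description of how HydroModPy actually enforces the rule. The `find_direct_connects` helper and the sample module are hypothetical. The check matches only the literal `duckdb.connect(...)` spelling, so aliased imports such as `from duckdb import connect` or `import duckdb as ddb` would need extra handling.

```python
import ast


def find_direct_connects(source: str) -> list[int]:
    """Return the line numbers of `duckdb.connect(...)` calls in `source`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "connect"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "duckdb"
        ):
            hits.append(node.lineno)
    return sorted(hits)


# Hypothetical user-facing module that bypasses the catalog adapter.
sample = '''import duckdb

def list_runs(path):
    con = duckdb.connect(path)
    return con.execute("SELECT 1").fetchall()
'''

hits = find_direct_connects(sample)
clean = find_direct_connects("x = 1\n")
```

A test could run this helper over the files of the user-facing packages and fail when a hit appears in a module that is not covered by the list above.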