The catalog door#
HydroModPy stores its tabular state in three DuckDB files:
- <workspace>/data/cache.duckdb – shared input cache.
- <project>/catalog.duckdb – simulation results for one project.
- <state_dir>/index.duckdb – machine-wide federation of every registered workspace.
End-user code never needs to know which file holds a given row.
hmp.open is the single door onto the simulation catalog. The input
cache and the machine-wide federation each have their own entry point:
hydromodpy.catalog.InputsNamespace or the hmp data CLI for the cache,
and hmp.index() for the federation.
Opening a catalog#
import hydromodpy as hmp
cat = hmp.open("~/proj/naizin")
sims = cat.find(solver="modflow6")  # SimulationGroup
workspaces = hmp.index()  # machine-wide federation
hmp.open returns a SimulationCatalog
(the engine itself, not a wrapper). With the default create=False it
raises FileNotFoundError when no catalog.duckdb exists; pass
create=True to initialise an empty catalog instead.
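The create flag described above can be pictured with a small stand-in. This is a sketch, not HydroModPy code: `open_or_fail` is a hypothetical helper that mirrors only the documented contract (raise FileNotFoundError unless create=True), and touching an empty file stands in for real catalog initialisation.

```python
from pathlib import Path


def open_or_fail(project, create: bool = False) -> Path:
    """Hypothetical stand-in for the documented create=False contract."""
    db_path = Path(project).expanduser() / "catalog.duckdb"
    if not db_path.exists():
        if not create:
            # Default behaviour: refuse to invent a catalog silently.
            raise FileNotFoundError(
                f"no catalog at {db_path}; pass create=True to initialise one"
            )
        db_path.parent.mkdir(parents=True, exist_ok=True)
        db_path.touch()  # placeholder for real schema initialisation
    return db_path
```

Failing by default means a mistyped project path surfaces as an error rather than as a fresh, empty catalog in the wrong place.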
The three databases#
The simulation catalog – hmp.open#
Backed by <project>/catalog.duckdb.
cat = hmp.open("~/proj/naizin")
# All simulations for this project as a DataFrame.
df = cat.frame
# Equality filters against ``v_simulation_summary`` columns.
# An unknown filter raises ValueError listing the valid keys.
group = cat.find(solver="modflow6", status="completed")
# Schema discovery.
cat.describe()
cat.tables()
cat.columns()
cat.variables()
cat.metrics()
cat.stations()
# Ranking and resolution.
cat.latest()
cat.best()
cat.worst()
cat.rank()
cat.resolve(ref)
# One sim by reference.
row = cat["ab12cd34-...-...-...-..."]
data = cat.read(ref, "head")
# Raw SQL.
cat.sql("SELECT * FROM v_simulation_summary LIMIT 5")
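The equality-filter behaviour noted in the comments above (unknown keys raise ValueError and list the valid ones) can be sketched in plain Python. The column set, the `find_rows` helper and the sample rows are assumptions made for illustration; the real keys come from v_simulation_summary and the real lookup runs inside the catalog.

```python
from typing import Any

# Hypothetical column set; the real one is defined by v_simulation_summary.
VALID_KEYS = {"solver", "status", "project"}


def find_rows(rows: list, **filters: Any) -> list:
    """Keep the rows whose columns equal every filter value."""
    unknown = sorted(set(filters) - VALID_KEYS)
    if unknown:
        raise ValueError(
            f"unknown filter(s) {unknown}; valid keys: {sorted(VALID_KEYS)}"
        )
    return [r for r in rows if all(r.get(k) == v for k, v in filters.items())]


rows = [
    {"solver": "modflow6", "status": "completed", "project": "naizin"},
    {"solver": "modflow6", "status": "failed", "project": "naizin"},
]
completed = find_rows(rows, solver="modflow6", status="completed")
```

Validating the keys before filtering turns a typo such as `sovler=` into an immediate error instead of an empty result.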
The input cache#
Backed by <workspace>/data/cache.duckdb. Reached through
hydromodpy.catalog.InputsNamespace or the hmp data CLI, not
through hmp.open.
from hydromodpy.catalog import InputsNamespace
inputs = InputsNamespace("~/proj/naizin")
inputs.has_cache()
inputs.db_path # -> ``<workspace>/data/cache.duckdb``
# List entries, optionally filtered.
inputs.list(variable="recharge")
inputs.list(variable="head", source="brgm")
# Locate a single cached entry covering a given extent.
entry = inputs.find(
    variable="recharge",
    source="meteofrance",
    station_id=None,
    bbox=(2.0, 48.0, 3.0, 49.0),
)
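One plausible reading of "covering a given extent" is that the cached entry's bounding box must fully contain the requested one. The sketch below illustrates that test only. The `BBox` type, the `covers` helper and the (min_x, min_y, max_x, max_y) axis order are assumptions, not part of the InputsNamespace API.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Assumed axis order: (min_x, min_y, max_x, max_y)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def covers(cached: BBox, requested: BBox) -> bool:
    """True when `cached` fully contains `requested` (edges may touch)."""
    return (
        cached.min_x <= requested.min_x
        and cached.min_y <= requested.min_y
        and cached.max_x >= requested.max_x
        and cached.max_y >= requested.max_y
    )


requested = BBox(2.0, 48.0, 3.0, 49.0)
wide_entry = BBox(1.5, 47.5, 3.5, 49.5)     # contains the request
partial_entry = BBox(2.5, 48.0, 3.5, 49.0)  # overlaps but does not contain it
```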
The machine global index – hmp.index#
Backed by <state_dir>/index.duckdb. hmp.index() opens it in read-only
mode, so concurrent hmp run writers keep their write lock.
# Every registered workspace plus federated search.
hmp.index()
The federation features, federated search across every workspace and
full-text search across descriptions and scientific objectives, live on
the index returned by hmp.index().
Underlying objects#
Callers that need a finer surface (custom SQL, transaction control, register/unregister) reach the underlying objects directly:
- hydromodpy.results.catalog.SimulationCatalog
- hydromodpy.data.registry.DataCatalogDuckDB
- hydromodpy.core.state.global_index.GlobalIndex
These are the V1 implementations and remain the canonical entry points for low-level work.
Migrations runner#
Each of the three DuckDB files owns a flat migrations/ directory
holding one 0001_initial.sql. They share a single runner under
hydromodpy.core.migrations.runner:
from hydromodpy.core.migrations import apply_migrations
apply_migrations(
    db_path="path/to/some.duckdb",
    migrations_dir="path/to/migrations/",
    component="catalog",  # or "data_cache", "index"
)
apply_migrations acquires a file lock at <db_path>.lock so that
concurrent callers run one at a time. Migrations that have already been
applied are skipped, based on a checksum recorded in schema_migrations.
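The checksum-based skip can be illustrated with a minimal sketch. It is not the HydroModPy runner: it uses the standard-library sqlite3 module in place of DuckDB, omits the file lock, and assumes a schema_migrations layout (component, name, checksum) chosen here for illustration. The function name `apply_migrations_sketch` is likewise hypothetical.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def apply_migrations_sketch(db_path: Path, migrations_dir: Path, component: str) -> list:
    """Apply pending *.sql files in name order; return the names applied now."""
    applied_now = []
    con = sqlite3.connect(str(db_path))
    try:
        con.execute(
            "CREATE TABLE IF NOT EXISTS schema_migrations "
            "(component TEXT, name TEXT, checksum TEXT)"
        )
        # Checksums already recorded for this component.
        seen = {
            row[0]
            for row in con.execute(
                "SELECT checksum FROM schema_migrations WHERE component = ?",
                (component,),
            )
        }
        for sql_file in sorted(migrations_dir.glob("*.sql")):
            script = sql_file.read_text()
            checksum = hashlib.sha256(script.encode("utf-8")).hexdigest()
            if checksum in seen:
                continue  # already applied: skip
            con.executescript(script)
            con.execute(
                "INSERT INTO schema_migrations VALUES (?, ?, ?)",
                (component, sql_file.name, checksum),
            )
            con.commit()
            applied_now.append(sql_file.name)
    finally:
        con.close()
    return applied_now


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    migrations = root / "migrations"
    migrations.mkdir()
    (migrations / "0001_initial.sql").write_text(
        "CREATE TABLE simulation (id TEXT PRIMARY KEY);"
    )
    first = apply_migrations_sketch(root / "some.db", migrations, "catalog")
    second = apply_migrations_sketch(root / "some.db", migrations, "catalog")
```

The second call applies nothing because the checksum of 0001_initial.sql is already recorded, which is what makes re-running the runner safe.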
Authentication#
The catalog calls hydromodpy.core.auth to resolve the current
operator. V1 ships a permissive default, LocalAuthBackend, which
returns the OS user and allows every operation. To switch backends,
set the HMP_AUTH_BACKEND environment variable; no code change is
needed in the catalog layer.
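Selecting a backend from an environment variable might look like the sketch below. Only the names LocalAuthBackend and HMP_AUTH_BACKEND come from the text above. The method names, the registry, the "local" value and `resolve_backend` are assumptions, and the class shown is a stand-in rather than the real hydromodpy.core.auth implementation.

```python
import getpass
import os


class LocalAuthBackend:
    """Permissive stand-in: reports the OS user and allows everything."""

    def current_operator(self) -> str:
        return getpass.getuser()

    def is_allowed(self, operator: str, operation: str) -> bool:
        return True


# Hypothetical registry; the accepted HMP_AUTH_BACKEND values are assumed.
BACKENDS = {"local": LocalAuthBackend}


def resolve_backend(environ=os.environ):
    """Instantiate the backend named by HMP_AUTH_BACKEND (default: local)."""
    name = environ.get("HMP_AUTH_BACKEND", "local")
    if name not in BACKENDS:
        raise ValueError(
            f"unknown auth backend {name!r}; known: {sorted(BACKENDS)}"
        )
    return BACKENDS[name]()
```

Because callers only ever ask `resolve_backend()` for an object, swapping the backend touches the environment and the registry, not the calling code.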
Path types#
Every workspace / cache / state path argument is typed
pathlib.Path | upath.UPath. The runtime accepts local paths and
file:// URIs; any other scheme raises NotImplementedError with
a clear message. The type widening lets callers pass a raw URI today
even though only local URIs are honoured.
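The scheme check described above can be sketched with the standard library alone. `to_local_path` is a hypothetical helper, not a HydroModPy function; it accepts anything whose string form is a plain path or a file:// URI and rejects every other scheme with NotImplementedError.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def to_local_path(location) -> Path:
    """Accept a local path or file:// URI; reject any other scheme."""
    text = str(location)
    if "://" not in text:
        return Path(text).expanduser()  # plain local path
    parsed = urlparse(text)
    if parsed.scheme != "file":
        raise NotImplementedError(
            f"scheme {parsed.scheme!r} is not supported; "
            "use a local path or a file:// URI"
        )
    # Convert the URI path component into a native filesystem path.
    return Path(url2pathname(parsed.path))
```

For example, `to_local_path("s3://bucket/proj")` raises NotImplementedError, while a file:// URI resolves to the path it names.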