Testbed Workflow Architecture
[workflow].mode = "testbed" is an orchestration workflow for method robustness
studies. It deliberately sits above domain workflows such as mesh generation or
flow simulation.
The key design rule is:
- testbed owns cases and evidence
- child runners own domain execution
This keeps method experiments reproducible without turning the testbed package into another simulation engine.
Fig. 427 The testbed layer expands cases into generated child configs, delegates execution, then collects evidence artifacts.
Package Boundary
The implementation is intentionally narrow:
| Module | Responsibility |
|---|---|
| | Validate the … |
| | Load the base TOML, materialize generated child TOMLs, delegate child execution, extract metrics, and persist evidence files. |
| | Expose … |
| Child runner packages | Keep ownership of mesh generation, simulation execution, solver persistence, and future transport execution. |
Supported Runtime Pairs
The current contract accepts only explicit subject/runner pairs:
| Subject | Runner | Generated workflow | Purpose |
|---|---|---|---|
| | | | Resolution ladders, constraint sensitivity, conformity checks. |
| | | | Parameter sensitivity, boundary-condition cases, solver-option robustness. |
| | | | Pairwise comparison campaigns and method-comparison subsets. |
| | | | Transport parameter sensitivity and method robustness. |
| | | | Pairwise transport-method comparison campaigns. |
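Because the contract is an allow-list rather than a naming convention, an unknown pair can be rejected before any child config is written. The following is a minimal sketch, assuming the registries are a set of subject names plus a subject-to-runners mapping. SUPPORTED_SUBJECTS and SUPPORTED_SUBJECT_RUNNERS are the names used in the extension checklist of this page, but the runner names and contents shown here are hypothetical placeholders, not the package's real values.

```python
"""Sketch: reject subject/runner pairs that are not explicitly allowed.

All registry contents below are hypothetical placeholders.
"""

# Hypothetical registry contents; the real package defines its own values.
SUPPORTED_SUBJECTS = frozenset({"flow", "transport"})
SUPPORTED_SUBJECT_RUNNERS = {
    "flow": frozenset({"simulation", "comparison"}),
    "transport": frozenset({"transport_simulation", "transport_comparison"}),
}


def validate_pair(subject: str, runner: str) -> None:
    """Raise ValueError unless (subject, runner) is an explicitly allowed pair."""
    if subject not in SUPPORTED_SUBJECTS:
        raise ValueError(f"unsupported testbed subject: {subject!r}")
    allowed = SUPPORTED_SUBJECT_RUNNERS.get(subject, frozenset())
    if runner not in allowed:
        raise ValueError(
            f"runner {runner!r} is not allowed for subject {subject!r}; "
            f"expected one of {sorted(allowed)}"
        )


validate_pair("flow", "simulation")  # an allowed pair passes silently
try:
    validate_pair("flow", "transport_simulation")  # a cross-subject pair is refused
except ValueError as exc:
    print(exc)
```

Failing at validation time keeps a mistyped runner from producing a plan full of children that no runner can execute.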
Generated children never contain [testbed]. They are ordinary child
workflow TOMLs that can be opened, inspected, and in many cases run directly.
Data Flow
One testbed run follows this sequence:
1. Load the testbed TOML.
2. Load testbed.base_config when present.
3. Remove [testbed] from the child payload.
4. Resolve path-like values to stable paths.
5. Merge each case overlay.
6. Write one child TOML under <output_root>/_generated_configs/.
7. Persist a dry evidence set.
8. If execute = true, run each child sequentially.
9. Rewrite cases, metrics, manifest, and report after each child outcome.
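The payload steps (removing [testbed], resolving path-like values, merging each case overlay) can be sketched as pure dictionary operations. This is an illustration, not the launcher's code. It assumes a hypothetical [[testbed.case]] array with name and overlay keys, treats keys ending in _path or _dir as path-like (a made-up rule), and sets the child [workflow].mode from a caller-supplied value standing in for the runner-to-workflow mapping. Loading testbed.base_config and serializing each child to TOML (for example with the third-party tomli_w package) are left out.

```python
"""Sketch: derive child payloads from a parsed testbed document.

Hypothetical assumptions: cases live under [[testbed.case]] with "name" and
"overlay" keys, and keys ending in "_path" or "_dir" count as path-like.
"""
import copy
from pathlib import Path
from typing import Any, Iterator

PATH_SUFFIXES = ("_path", "_dir")  # hypothetical path-like key rule


def deep_merge(base: dict[str, Any], overlay: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict; overlay values win and nested tables merge recursively."""
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_paths(node: Any, root: Path) -> Any:
    """Replace path-like string values with absolute paths anchored at root."""
    if isinstance(node, dict):
        return {
            key: str((root / value).resolve())
            if key.endswith(PATH_SUFFIXES) and isinstance(value, str)
            else resolve_paths(value, root)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [resolve_paths(item, root) for item in node]
    return node


def plan_children(
    doc: dict[str, Any], root: Path, output_root: Path, child_mode: str
) -> Iterator[tuple[Path, dict[str, Any]]]:
    """Yield (child_config_path, child_payload) pairs without touching the disk."""
    payload = copy.deepcopy(doc)  # never mutate the caller's parsed document
    testbed = payload.pop("testbed")  # generated children never contain [testbed]
    payload = resolve_paths(payload, root)
    for case in testbed.get("case", []):
        child = deep_merge(payload, case.get("overlay", {}))
        child.setdefault("workflow", {})["mode"] = child_mode
        yield output_root / "_generated_configs" / f"{case['name']}.toml", child
```

Overlays are merged after path resolution, following the order listed above, so in this sketch an overlay that introduces a new relative path would pass through unresolved.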
This means execute = false is not a no-op. It is a planning mode that
materializes the experiment and lets a user audit generated children before
spending solver time.
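The difference between planning and execution comes down to where the loop stops. In this hedged sketch, runner and persist are hypothetical callables standing in for child delegation and evidence writing. Failure handling is deliberately omitted so the control flow stays visible.

```python
"""Sketch: a dry plan is always persisted; children run only when execute=True."""
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class CaseRecord:
    name: str
    config_path: str
    status: str = "planned"
    summary: dict[str, Any] = field(default_factory=dict)


def run_testbed(
    cases: list[tuple[str, str]],
    runner: Callable[[str], dict[str, Any]],  # hypothetical child-delegation hook
    persist: Callable[[list[CaseRecord]], None],  # hypothetical evidence writer
    execute: bool = False,
) -> list[CaseRecord]:
    records = [CaseRecord(name, path) for name, path in cases]
    persist(records)  # the dry evidence set exists even if nothing runs
    if not execute:
        return records  # planning mode: generated children can be audited first
    for record in records:  # children run sequentially, one at a time
        record.summary = runner(record.config_path)
        record.status = "completed"
        persist(records)  # rewrite evidence after every child outcome
    return records
```

With execute=False the runner is never called, yet persist has already received every planned case, which is what makes the planning mode auditable.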
Evidence Model
Fig. 428 The output directory is designed for auditability: child configs first, status and metrics next, manifest and report last.
The evidence files have stable roles:
- testbed_plan.json: planned cases and generated config paths;
- testbed_cases.csv: status and artifacts per case;
- testbed_metrics.csv: configured metrics, or flattened numeric summaries;
- testbed_manifest.json: machine-readable whole-run contract;
- testbed_report.md: compact human summary.
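Because every child outcome triggers a full rewrite, a writer can regenerate each file from in-memory records instead of appending to it. This is a minimal sketch using only the standard library csv and json modules. The column names and manifest keys are assumptions, and testbed_plan.json and testbed_report.md are omitted for brevity.

```python
"""Sketch: regenerate tabular and manifest evidence from in-memory records."""
import csv
import json
import tempfile
from pathlib import Path
from typing import Any


def write_evidence(
    output_root: Path, cases: list[dict[str, Any]], metrics: list[dict[str, Any]]
) -> None:
    """Overwrite testbed_cases.csv, testbed_metrics.csv, and testbed_manifest.json.

    Column names and manifest keys are hypothetical.
    """
    output_root.mkdir(parents=True, exist_ok=True)
    with open(output_root / "testbed_cases.csv", "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["case", "status", "config_path"])
        writer.writeheader()
        writer.writerows(cases)
    # Union of metric names so cases with missing metrics still line up (blank cells).
    metric_names = sorted({key for row in metrics for key in row} - {"case"})
    with open(output_root / "testbed_metrics.csv", "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["case", *metric_names])
        writer.writeheader()
        writer.writerows(metrics)
    manifest = {
        "n_cases": len(cases),
        "status_by_case": {row["case"]: row["status"] for row in cases},
    }
    (output_root / "testbed_manifest.json").write_text(json.dumps(manifest, indent=2))


with tempfile.TemporaryDirectory() as tmp:
    write_evidence(
        Path(tmp),
        cases=[{"case": "k_low", "status": "completed", "config_path": "k_low.toml"}],
        metrics=[{"case": "k_low", "flow_metrics.n_cells": 100}],
    )
    print((Path(tmp) / "testbed_metrics.csv").read_text())
```

Rewriting whole files keeps every evidence file internally consistent after each child, at the cost of redundant writes that are negligible next to solver time.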
Flow Metric Extraction
For subject = "flow", the testbed does not parse solver files directly.
It delegates to [workflow].mode = "simulation", then reopens the completed run
through the result catalog. The catalog summary is flattened into
flow_metrics keys that can be referenced by [[testbed.metric]].
Confirmed metric examples from the repository starter are:
- flow_metrics.duration_s
- flow_metrics.n_cells
- flow_metrics.param_K
- flow_metrics.max_abs_mass_balance_percent_error
- flow_metrics.head_range_m
- flow_metrics.budget_chd_total_out for prescribed-head exchanges
- flow_metrics.budget_rcha_total_in for recharge
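One way to picture the flattening step is to join nested summary keys with underscores under a flow_metrics prefix, which is consistent with names such as flow_metrics.budget_chd_total_out and flow_metrics.param_K. The nested summary shape used here is a guess at what the result catalog returns, not its documented structure, and the values are illustrative only.

```python
"""Sketch: flatten a nested run summary into flow_metrics.* keys."""
from numbers import Real
from typing import Any


def flatten_summary(summary: dict[str, Any], prefix: str = "flow_metrics") -> dict[str, Any]:
    """Join nested keys with underscores under prefix; keep only numeric leaves."""
    flat: dict[str, Any] = {}

    def walk(node: Any, parts: list[str]) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, [*parts, str(key)])
        elif isinstance(node, Real) and not isinstance(node, bool):
            # bool is a subclass of int, so it is excluded explicitly.
            flat[f"{prefix}." + "_".join(parts)] = node

    walk(summary, [])
    return flat


# Illustrative values only; the real catalog summary shape may differ.
example = {
    "duration_s": 2.0,
    "n_cells": 100,
    "param": {"K": 5.0},
    "budget": {"chd": {"total_out": 1.5}, "rcha": {"total_in": 1.5}},
    "solver": "mf6",  # non-numeric, dropped
}
print(flatten_summary(example))
```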
The full flow_k_sensitivity matrix was executed locally with three MODFLOW
6 children. All three completed successfully with 547 cells and zero
mass-balance percent error in the extracted catalog metrics. The observed
head_range_m decreased from the low-K case to the high-K case, which is
the expected direction for this controlled hydraulic-conductivity sensitivity
test.
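For intuition on why the head range should shrink as K grows, consider an idealized case (an assumption, not the repository model): one-dimensional steady flow in a confined aquifer of thickness b and hydraulic conductivity K, with uniform recharge R, a prescribed head h_0 at x = 0, and a no-flow boundary at x = L.

$$
K b \,\frac{d^{2}h}{dx^{2}} = -R, \qquad h(0) = h_0, \qquad \left.\frac{dh}{dx}\right|_{x=L} = 0
$$

$$
h(x) = h_0 + \frac{R}{K b}\left(L x - \frac{x^{2}}{2}\right), \qquad \max_x h - \min_x h = h(L) - h(0) = \frac{R L^{2}}{2 K b}
$$

With R, L, and b held fixed, the range scales as 1/K, so a higher-K case needs a smaller head gradient to carry the same recharge to the prescribed-head boundary.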
Comparison children are deliberately thinner: the testbed consumes the summary
returned by the comparison runner and can expose those fields through
[[testbed.metric]]. The comparison workflow keeps ownership of its HTML output,
metrics, figures, and child simulation details.
Extension Point
Adding a new subject such as transport should follow the existing contract instead of introducing a one-off runner convention.
The minimum changes are:
1. Add the subject name to SUPPORTED_SUBJECTS.
2. Add allowed runner pairs to SUPPORTED_SUBJECT_RUNNERS.
3. Add the runner-to-child-workflow mapping in RUNNER_WORKFLOWS.
4. Add one launcher branch in TestbedLauncher._run_case.
5. Add metric extraction only through runner summaries or persisted result stores.
6. Add dry-plan tests, execution tests with fake runners, and one documented example.
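A checklist like this is easy to apply halfway. A dry test along the following lines can catch a subject without runners, or a runner without a child-workflow mapping, before any case runs. The registry names come from the list above; their contents, including every runner and workflow name, are hypothetical.

```python
"""Sketch of a registry-consistency test for the extension checklist.

Registry contents are hypothetical placeholders.
"""
SUPPORTED_SUBJECTS = frozenset({"flow", "transport"})
SUPPORTED_SUBJECT_RUNNERS = {
    "flow": frozenset({"simulation", "comparison"}),
    "transport": frozenset({"transport_simulation", "transport_comparison"}),
}
RUNNER_WORKFLOWS = {
    "simulation": "simulation",
    "comparison": "comparison",
    "transport_simulation": "transport",
    "transport_comparison": "transport_comparison",
}


def registry_problems(subjects, subject_runners, runner_workflows) -> list[str]:
    """Return human-readable gaps left by a partially applied extension."""
    problems = []
    for subject in sorted(set(subjects) - set(subject_runners)):
        problems.append(f"subject {subject!r} has no allowed runners")
    for subject in sorted(set(subject_runners) - set(subjects)):
        problems.append(f"runner pairs declared for unknown subject {subject!r}")
    for subject, runners in sorted(subject_runners.items()):
        for runner in sorted(set(runners) - set(runner_workflows)):
            problems.append(f"runner {runner!r} ({subject}) has no child workflow")
    return problems


# A complete registry reports no gaps.
assert registry_problems(SUPPORTED_SUBJECTS, SUPPORTED_SUBJECT_RUNNERS, RUNNER_WORKFLOWS) == []
```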
The main invariant should stay intact: testbed remains an evidence layer, not a physics layer.
Failure Semantics
Metric declarations can set required = true. A missing required metric
turns the child outcome into an explicit failure and writes that error into
the manifest. With continue_on_error = false, the launcher re-raises after
persisting the failed case.
This is useful for robustness studies because silent metric loss is worse than a failed case: a matrix is only comparable if each declared evidence column has the intended meaning across cases.
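The required-metric rule and the persist-then-re-raise ordering can be sketched as follows. The declaration shape (a key plus an optional required flag), the MissingMetricError class, and recording absent optional metrics as None are all assumptions for illustration; only required = true and continue_on_error come from the text above.

```python
"""Sketch: required metrics turn a missing value into an explicit case failure."""
from typing import Any, Callable


class MissingMetricError(RuntimeError):
    """Hypothetical error type for a required metric that is absent."""


def extract_metrics(summary: dict[str, Any], declarations: list[dict[str, Any]]) -> dict[str, Any]:
    """Select declared metrics; declarations use a hypothetical {"key", "required"} shape."""
    values: dict[str, Any] = {}
    for decl in declarations:
        key = decl["key"]
        if key in summary:
            values[key] = summary[key]
        elif decl.get("required", False):
            raise MissingMetricError(f"required metric {key!r} is missing")
        else:
            values[key] = None  # optional metric: keep the column, leave it empty
    return values


def finish_case(
    case: dict[str, Any],
    summary: dict[str, Any],
    declarations: list[dict[str, Any]],
    persist: Callable[[dict[str, Any]], None],
    continue_on_error: bool,
) -> None:
    """Record metrics or an explicit failure; any re-raise happens after persisting."""
    try:
        case["metrics"] = extract_metrics(summary, declarations)
        case["status"] = "completed"
    except MissingMetricError as exc:
        case["status"] = "failed"
        case["error"] = str(exc)  # the error text travels into the manifest
        persist(case)
        if not continue_on_error:
            raise
    else:
        persist(case)
```

Persisting before re-raising means a stopped run still leaves evidence that names the case and the missing metric.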