ADR-0013: Layering and dependency-inversion rework¶
Status¶
Accepted. Implemented in issue #2343 (the foundational child of the #2341 audit cluster).
Context¶
The #2341 audit found the dependency graph and several boundaries had drifted in ways that the existing gates did not catch:
- Upward leaks. Lower layers (
tools,engine,meta,security,integrations) reached up intoapi.*for shared validators and utilities (parse_typed, the auth-token-size helpers, the sliding-window rate limiter, retry-after parsing). Importing a leaf therefore dragged anapisubtree in. - A stringly-typed MCP boundary. Roughly 45 MCP handlers re-validated their
already-validated
argumentsdict by hand (arguments.get(...),require_non_blank(...), ad-hoc_coerce_statushelpers), even though the invoker had already parsed the payload against the tool'sargs_model. The by-hand re-reads defeated mypy-strict narrowing and let the wire shape and the model drift apart silently. - Controllers crossing the persistence boundary. Five controllers imported
persistence._shared/sqlite/ constraint-token strings / a persistence DTO directly, so persistence internals leaked into the HTTP layer. - Dead config and scattered env reads. Two live coordination caps were never
registered as settings, and config env vars were read directly outside the
bootstrap edge -- including a NATS default split-brain between the worker
argparsedefault and the registry default. - Two bloated utility files mixing unrelated concerns.
Decision¶
1. MCP typed-args contract + enforcement gate¶
Every handler whose tool wires an args_model= narrows the raw arguments
dict exactly once, at the top of its body, via the shared
typed_args(arguments, XxxArgs) helper (promoted out of handlers/research.py
into handlers/_mcp_handler_common.py), then reads typed fields. The invoker
(meta/mcp/invoker.py) has already validated and dumped the dict, so the
re-build is a no-op that restores narrowing. The five cockpit tools that lacked
a model got one, so coverage is 100%.
scripts/check_handler_arguments_get.py (pre-push) is the enforcement gate: it
resolves each args_model-wired tool to its handler def across the
*_HANDLERS maps and fails if the body still reaches into arguments (or the
deleted common_args coercers) anywhere other than the single typed_args
call. The per-line opt-out # lint-allow: handler-arguments-get -- <reason> is
reserved for handlers cataloged as genuine model-versus-handler contract
mismatches; those mismatches are tracked in a separate follow-up issue rather
than papered over here.
2. Core-leaf relocations + tightened import-linter contracts¶
Cross-layer-shared validators and utilities live in light core.* leaves (and
the core.resilience package), never in api, applying the ADR-0012
leaf-placement rule. parse_typed (core/boundary.py), the auth-token-size
trio (core/auth/token_size.py), the sliding-window event limiter
(core/resilience/sliding_window.py), the token-estimation and slug helpers
(core/text_estimation.py, core/slug.py), and the retry-after validator
(core/resilience/retry_after.py) all moved as whole renames -- every importer
repointed in the same change, no re-export shim. core may import
observability (the blessed logger seam), so structured logging survives the
move.
The declarative .importlinter contracts gained targeted forbidden edges (not
a blanket *->api): business-logic packages may no longer import the moved
api modules, and api.controllers may not import persistence._shared /
sqlite / postgres / constraint-token / jsonb-capability internals
(protocol-level persistence imports stay legal). The new leaves are pinned in
the test_cold_import.py smoke test.
3. Controller-to-service boundary + DTOs¶
The five boundary-crossing controllers now call a service or registry method, and persistence types stay behind the boundary:
subworkflowsdrains via a newSubworkflowRegistry.search_all/find_parents_all(the engine layer owns thecollect_allcall).budget_forecastgoes through a newBudgetForecastService(budget/forecast_service.py) wired ontoBudgetStateSlice.cockpitreturns aFlightRecorderFrameResponseDTO (api/dto.py) instead of leaking the persistenceFlightRecorderFrame.auditcalls capability-awarequery_jsonb_*methods on theAuditRepositoryprotocol; the SQLite impl raises the typedJsonbQueryUnsupportedError(mapped to 422 via an explicit_HANDLER_ENTRIESentry, sincePersistenceErrorwould otherwise map it to 500 by MRO).UserServicecatchesConstraintViolationErrorand raises typedDuplicate*/SingleCeo/LastCeo/LastOwnerconstraint errors, so the controllers never see raw constraint-token strings andconstraint_tokensstays private to persistence and services.
4. Live coordination caps + configuration-precedence reaffirmation¶
max_stall_count and max_reset_count are registered settings
(settings/definitions/coordination.py), resolved in
get_coordination_config(), and forwarded through section_config, so they
fall under DB > env > code default and the no-hardcoded-values gate. (Building
the MagenticReplanHook from these caps is part of the pluggable-seam wiring
deferred to the follow-up issue; the caps themselves are now live config.)
Config env reads are confined to the bootstrap edge. The worker NATS default
now resolves through resolve_init_value (closing the split-brain with the
registry default), the worker and app_helpers env double-reads collapse to a
single resolver call (the resolver gained an env_present flag so a strict
caller can tell "env set but invalid" from "env unset" without re-reading), and
the OAuth PKCE cipher became an OAuthPKCECipher class that validates its
Fernet key eagerly at integration wiring instead of lazily mid-flight.
scripts/check_no_os_environ_outside_bootstrap.py (pre-push) is the
enforcement gate: it flags single-key os.environ reads
(get/pop/subscript/getenv) outside an explicit bootstrap / entry-point /
dynamic-secret-backend / Cat-3 construction allowlist, and deliberately ignores
whole-environment snapshots (os.environ.copy(), {**os.environ}) used to
build a child-process environment.
5. Utility-file splits and _sync renames¶
api/controllers/setup/agent_helpers.py (over the code tier cap) split into
_status_checks.py, _runtime_wiring.py, and _embedder_setup.py by concern.
web/src/utils/constants.ts split its WS, task, ceremony, settings, and
workflow clusters into co-located owners (the WS wire-protocol contract moved to
ws-constants.ts, with the check_ws_protocol_version_in_sync gate path
updated to match). The two *_sync method names (record_sync,
set_coordinator_factory_sync) were renamed to describe what they do
(append, register_coordinator_factory).
Consequences¶
- The remaining lower-to-
apiedges are only the intentional dependency-injection / feature-registration seams; the targeted contracts keep new business-logic upward leaks out. - The MCP boundary is mypy-strict-narrowed end to end, and the gate stops a new
args_model-wired handler from re-introducing by-hand re-reads. - Persistence internals no longer surface in controllers, enforced by the
controllers-no-persistence-internalscontract. - Two new pre-push gates ship with the conventions they enforce, per the Convention Rollout rule.
Out of scope (deferred to the follow-up issue under #2341)¶
The audit also found a set of complete-but-unwired pluggable seams (REWORK #12).
Wiring them turned out to be materially larger and riskier than a layering
change: the coordination middleware chain is not built in production at all
(so wiring replan_strategy means standing the whole pipeline up live and
sourcing a BudgetEnforcer), and the embedding ConflictDetector needs a new
TextEmbedder backend. To keep #2343 a clean, low-risk layering foundation the
sibling reworks build on, all seam wiring -- orchestrator_strategy,
replan_strategy + the live middleware chain, SignalsService,
ConflictDetector EMBEDDING / TextEmbedder, CompositeSelector,
PassthroughMemoryFilter, ClientReviewStage, EscalationChain,
max_delegation_rounds, QualityErosionDetector, and the scaling triggers --
moves to a dedicated pluggable-completion issue where the engine behaviour
changes get focused review and test-driven coverage. The MCP args-model
contract mismatches surfaced in section 1 are tracked there as well.