ADR-0012: Break the cold-import cycle¶
Status¶
Accepted. Implemented in issue #2207.
Context¶
synthorg could not be imported starting from an arbitrary leaf module on a
cold interpreter. Both python -c "import synthorg.providers.enums" and
python -c "import synthorg.core.agent" raised a circular-import ImportError
from a fresh process. Every real entry point hid the defect by importing a
heavy package first, which happened to resolve the graph in a working order:
- the app boots through
api.app; - the test suite's root
tests/conftest.pyimportedsynthorg.persistencefirst; evals/__main__.pyimportedsynthorg.persistencebefore its runner.
Any new standalone entry point had to rediscover that magic prime, and the
.importlinter contracts passed only because the working order was never
violated.
Root cause: eager re-export side effects in package inits¶
The cycle was not a set of simple module-to-module edges. It was driven by
eager re-export side effects in package __init__ hubs. Importing any
persistence.* submodule runs persistence/__init__.py; that init eagerly
imported factory -> protocol -> (everything), so a cold
import synthorg.persistence._shared (a leaf datetime helper) dragged in over
a thousand synthorg modules, including config.schema, core.agent,
engine.agent_engine, and providers.management. The same was true of
config, providers, providers.management, budget, and the
conflict_resolution / escalation sub-packages.
A consequence that shaped the regression guard: grimp (and therefore
import-linter) builds its graph from explicit import statements per module.
It cannot see a cycle that exists only because importing a leaf triggers a
parent __init__ whose re-exports pull a subgraph. grimp's chain-finder
returned None for communication.config -> config.schema even though the
runtime cycle ran through it, so a "forbidden indirect" contract reported
KEPT while the cycle was live.
core/__init__.py is the one deliberately light package init (empty by
design). That is why relocating shared types into a core.* (or a new
execution.*) leaf genuinely breaks edges, while relocating into any other
package does not.
The three named keystone cycles¶
config.schema <-> communication.configcore.agent <-> persistence.protocolbudget.coordination_collector <-> engine(still dormant; the #2183 hoist programme applies this same relocation strategy where a module-level forward-ref would otherwise close a cycle -- for exampleworkers/distributed_protocols.py, a cold-safe handle leaf for the separateworkers.config -> communication.configcycle -- but thebudget.coordination_collectoredge itself is not hoisted)
Because #2183 removes the TYPE_CHECKING guards, guarding these edges with
if TYPE_CHECKING: is not a durable fix. The cycles had to be broken
structurally, by relocation.
Decision¶
Cut A: relocate ToolSubConstraints into a core leaf¶
core.agent reached up into the heavy tools hub for ToolSubConstraints.
ToolSubConstraints, its five dimension enums, _LEVEL_SUB_CONSTRAINTS, and
get_sub_constraints moved verbatim from tools/sub_constraints.py to the new
core/tool_constraints.py (its only dependencies are core.* and
observability, all light). The old module was deleted and all importers
updated -- a full rename, no re-export shim.
Cut B: trim the escalation / conflict_resolution package inits, and the linchpin hub inits¶
The config.schema <-> communication.config cycle closed through a back-path
communication.config -> conflict_resolution.config -> escalation/__init__ ->
factory -> in_memory_store -> persistence, which then reached
persistence.protocol -> providers.management -> service -> config.schema. The
fix was to stop the package inits on that path from eagerly importing their
implementation:
conflict_resolution/escalation/__init__.pynow re-exports only the light abstractions (config,models,protocol), notfactory/in_memory_store/notify/processors/registry/sweeper.- The linchpin
persistence.protocol -> providers.management -> config.schemaedge was severed by emptying the eager re-exports ofproviders/management/__init__.py,config/__init__.py,persistence/__init__.py, andbudget/__init__.py, and repointingproviders.management.capability_dtosatconfig.provider_schema(a leaf) instead ofconfig.schema.
Callers now import implementation symbols from their defining submodule. After
this, config.schema imports cold, which is what proves the named cycle is
broken from the config side.
Cut C: a new light leaf package synthorg/execution/¶
ExecutionResult is welded to engine by its context: AgentContext field
and cannot move. So instead of moving it, the types
budget.coordination_collector actually needs were relocated into a new light
leaf package, and the collector was given a structural view to depend on:
execution/turn.py--NodeType,BehaviorTag,TurnRecord(moved out ofengine/loop_protocol.py).TerminationReasonstays inengine/loop_protocol.pyalongsideExecutionResult.execution/efficiency.py--EfficiencyRatios,IdealTrajectoryBaseline(moved out ofengine/trajectory/efficiency_ratios.py).execution/view.py-- a@runtime_checkable ExecutionResultViewprotocol exposing onlyturns: tuple[TurnRecord, ...], the minimal surface the collector reads.execution/parked_context.py--ParkedContext, the serialised parked-agent snapshot (moved out ofsecurity/timeout/). It applies the same leaf-placement rule: the persistence, worker, and API layers name the type without dragging the heavyenginepackage in, breaking an engine<->security edge that theParkServicerelocation toengine/surfaced.
engine.loop_protocol and engine.trajectory now import the moved types from
the leaf (a legitimate downward dependency); ExecutionResult structurally
satisfies ExecutionResultView. budget.coordination_collector imports the
view and TurnRecord from execution.* at module level and never imports
engine, so it stays cold-safe even after #2183 hoists its annotations.
Leaf-placement rule¶
Shared types that would otherwise force a cross-hub edge belong in a light leaf
package -- core.* for foundation types, execution.* for execution-trace
types -- never reached for by importing up into a heavy hub. Hub __init__
files stay light (config / models / protocols) or empty; implementation is
imported from its defining submodule.
Applying this rule, the dependency-free leaves pinned in COLD_IMPORT_LEAVES
now include, beyond the three original cuts: core.completion_enums
(FinishReason, out of the providers hub so budget.cost_record no longer
re-enters it), core.effective_autonomy and core.redteam_review_input
(resolved-autonomy and red-team gate-input value objects out of the heavy
security package), and core.approval. New dependency-free core.* /
execution.* types are added to the same tuple as they appear.
Regression guard¶
The primary guard is a runtime cold-import smoke test,
tests/unit/test_cold_import.py: each leaf in COLD_IMPORT_LEAVES is imported
in its own freshly spawned interpreter via subprocess, the only faithful way
to assert a cold import (within pytest the graph is already primed). A
secondary, documentation-grade backstop extends .importlinter:
core-is-foundation now also forbids core reaching up into tools,
providers, communication, budget, and config; a new
execution-is-a-leaf contract forbids execution reaching up into engine,
api, workers, or meta. The smoke test exists precisely because these
contracts cannot see init-side-effect cycles.
Consequences¶
- The two acceptance leaves and
evalsimport cold; theimport synthorg.persistenceprime inevals/__main__.pywas removed. - The forward chain
providers -> budget -> security -> engine -> communication -> persistencestays heavy but is now acyclic. Heaviness is not the bug; the cycle was. We did not lazify the hubs (rejected: it fights mypy-strict, the explicit__all__contracts, andengine's registry side effects). - Emptying the hub inits means importing an arbitrary leaf no longer drags the
rest of the graph in a known-good order. The test process therefore keeps an
explicit graph prime, repointed from the now-empty
persistenceinit toimport synthorg.api.appintests/conftest.py(the real entry point, which loads every hub in a proven order). This primes the test process only; production leaf entry points are genuinely cold-safe, which the subprocess-isolated smoke test verifies. Only theevalsprime was removed; the test-conftest prime is preserved by design (the issue scopes its removal toevals).
Scope notes¶
This change is broader than the three cuts in isolation, and intentionally so:
- Hub inits leaned:
providers/management,config,persistence,budget, and theconflict_resolution/escalationinit, in addition to the three relocations. Each was load-bearing for the acceptance cold imports. engine/__init__deliberately left heavy.enginebuilds dispatch registries at import of the modules it eagerly pulls (execution_loop,conflict_detector, etc.). Making that lazy risks "registry empty" runtime bugs that an import smoke test would not catch. The leaf relocation (Cut C) removes the need to touchengine's init.
Out of scope¶
- The
communication <-> enginecycle (viacommunication.meeting._prompts -> engine.prompt_safety) is a separate, unnamed cycle.communication.configconsequently does not yet import cold and is intentionally absent fromCOLD_IMPORT_LEAVES. Breaking it would require leaningengine's init (see above).config.schemaimporting cold already discharges the namedconfig.schema <-> communication.configacceptance cycle. - Broad hub lazification (PEP 562
__getattr__across the heavy package inits) remains a larger refactor out of scope here. Targeted single-module lazification is the documented exception:synthorg.approval.__init__lazily exportsApprovalStoreProtocol(mirroringsynthorg.ontology) so importing theapproval.enums/approval.modelsleaves does not eagerly pullapproval.protocol -> core.approvaland close that package-init cycle.