S1: Multi-Agent Architecture Decision

Issue: #1254 (CRITICAL, blocks #1250 / #1251 / #1253)

Sources: arXiv:2603.27771 (paper 1, Multi-Agent Risks), arXiv:2603.26993 (paper 2, Reliability Limits), arXiv:2604.02460 (paper 3, Single-Agent Outperforms).

Prior baseline: Kim et al. 2025 (arXiv:2512.08296), Multi-Agent Failure Audit (#690), Task & Workflow Engine §Task Decomposability, Communication Coordination §Multi-Agent Failure Pattern Guardrails.

Bottom line

SynthOrg keeps multi-agent as a foundational capability but treats it as topology-per-task, not topology-per-company, with single-agent as the default for all task types where multi-agent cannot demonstrate a per-task justification. The three S1 papers confirm this direction; they do not overturn it. Kim et al. 2025 (already integrated in the engine design) set up the heuristic; papers 2 and 3 formalize the math and empirics behind it; paper 1 supplies the emergent-risk catalog the existing guardrails do not yet fully cover.

The critical new work S1 surfaces is encoding the 15 emergent-risk mitigations, authority deference above all. SynthOrg's default conflict resolver is authority + dissent_log, which is the exact structural shape that paper 1 shows producing 10/10 deterministic errors under an authority cue.

Section 1. Decision Matrix: Centralized vs Distributed

The existing CoordinationTopology selector in src/synthorg/engine/routing/topology_selector.py already implements most of this matrix. The papers support refining it:

| Task property | Topology | Justification |
| --- | --- | --- |
| `sequential` + any size | SAS (single-agent) | Kim 2025: -39% to -70% multi-agent effect. Paper 3: Data Processing Inequality. Coordination tokens displace reasoning tokens under equal budget. No change needed. |
| `parallel` + structured + common-evidence regime | Centralized (orchestrator + sub-agents, orchestrator synthesizes) | Paper 2 formal theorem: delegated networks are decision-theoretically dominated by a centralized Bayes decision maker under common-evidence. Lowest error amplification (4.4x, Kim 2025). |
| `parallel` + exploratory / high-entropy / novel per-agent information sources | Decentralized (peer debate) | Paper 2 boundary case: distributed CAN outperform when agents access non-shared information. Paper 3 boundary case: diverse specialized knowledge, error-checking via independent reasoning paths, asymmetric agent expertise. |
| `mixed` (sequential backbone + parallel sub-phases) | Context-dependent | Kim 2025; per-phase selection already implemented as `ContextDependentDispatcher`. |
| Low-stakes / single-file / simple-complexity | SAS regardless of structure | Paper 3 + existing `AutoLoopConfig` rule (`simple → ReAct`). |
| High-stakes / production-consequence / adversarial-input | Centralized + verification stages | Not a topology decision; a gate decision. Ties to R2 (verification stages). See Section 5. |
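The matrix reads as a small decision function. The sketch below is illustrative only: `TaskProfile`, `Selection`, and `select_topology` are hypothetical names invented for this example and are not the contents of `topology_selector.py`. It assumes the simple/low-stakes rule takes precedence over task structure and that the high-stakes row is a gate flag layered on top of whatever topology is chosen.

```python
from dataclasses import dataclass
from enum import Enum


class Topology(Enum):
    SAS = "sas"  # single-agent
    CENTRALIZED = "centralized"
    DECENTRALIZED = "decentralized"
    CONTEXT_DEPENDENT = "context_dependent"


@dataclass(frozen=True)
class TaskProfile:
    # Field names are assumptions made for this sketch.
    structure: str                # "sequential" | "parallel" | "mixed"
    simple: bool = False          # low-stakes / single-file / simple-complexity
    shared_evidence: bool = True  # common-evidence regime
    high_stakes: bool = False     # production-consequence / adversarial-input


@dataclass(frozen=True)
class Selection:
    topology: Topology
    verification_gate: bool


def select_topology(task: TaskProfile) -> Selection:
    if task.structure not in {"sequential", "parallel", "mixed"}:
        raise ValueError(f"unknown task structure: {task.structure!r}")
    if task.simple or task.structure == "sequential":
        topology = Topology.SAS
    elif task.structure == "mixed":
        topology = Topology.CONTEXT_DEPENDENT
    elif task.shared_evidence:
        topology = Topology.CENTRALIZED
    else:
        # Agents hold non-shared information: the boundary case in papers 2 and 3.
        topology = Topology.DECENTRALIZED
    # High stakes is a gate decision, not a topology decision.
    return Selection(topology, verification_gate=task.high_stakes)
```

Keeping the gate separate from the topology mirrors the last row of the matrix: a decentralized exploratory task can still be forced through a verification stage.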

New constraint from Section 3 (risks): any topology that routes through a chain of agents where one carries an authority marker MUST activate the AuthorityDeferenceGuard mitigation path before downstream agents synthesize.
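A minimal sketch of two guard behaviours named in the risk register (blind aggregation and cascade telemetry), under stated assumptions. `Contribution`, `blind`, and `cascade_alerts` are hypothetical names, and the `deferred_to` field is assumed to be populated upstream; how deference is actually detected is out of scope here.

```python
from dataclasses import dataclass, replace
from typing import Optional

CASCADE_THRESHOLD = 2  # register entry 2.2: alert when a cascade exceeds 2 downstream agents


@dataclass(frozen=True)
class Contribution:
    agent_id: str
    content: str
    authority_marker: Optional[str] = None  # e.g. a seniority cue (hypothetical field)
    deferred_to: Optional[str] = None       # agent whose conclusion was adopted as-is


def blind(contributions):
    """Blind-aggregation mode: return copies with authority markers stripped."""
    return [replace(c, authority_marker=None) for c in contributions]


def cascade_sizes(contributions):
    """Count how many downstream agents deferred to each upstream agent."""
    sizes = {}
    for c in contributions:
        if c.deferred_to is not None:
            sizes[c.deferred_to] = sizes.get(c.deferred_to, 0) + 1
    return sizes


def cascade_alerts(contributions, threshold=CASCADE_THRESHOLD):
    """Agents whose deference cascade is larger than the threshold."""
    return sorted(a for a, n in cascade_sizes(contributions).items() if n > threshold)
```

Because `blind` returns copies, the original transcript keeps its authority markers for audit while the synthesizer sees only the stripped view.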

Section 2. Team-Size Bounds

Existing company templates span 1→50+ agents. The empirical literature (Kim 2025, 3-4 agent cap per coordination group) applies to per-task team size, not company size. These must be kept distinct.

| Scope | Bound | Source | Constraint |
| --- | --- | --- | --- |
| Per-coordination-group (agents working on a single `coordination_topology` wave) | 3-4 active agents (recommended) | Kim 2025 180-experiment cap; per-agent reasoning degrades sharply beyond this | Soft cap: `CoordinationConfig.max_concurrency_per_wave`, current settings-registry default 5 (range 1-50, `None` in the Pydantic model = unlimited). Adopting 3-4 as the recommended default is a follow-up change tracked on R1 (#1250). Legitimate 5-6 sub-agent decompositions exist in the +57% to +81% parallel regime. |
| Per-task total team (including orchestrator + verifiers) | ~7 agents | Kim 2025 hybrid overhead; paper 1 coalition-formation risk rises with team size | Soft cap: logged warning above threshold. |
| Per-company / org size | No hard bound | Organizational simulation value (Enterprise Org 20-50+ template) is NOT the same as per-task reasoning efficiency | No constraint: templates must make clear that a 50-agent Enterprise Org does NOT run 50-agent coordination waves. |
| Per-meeting participants | 3-5 ideal, 8 hard cap | Existing `round_robin` "small groups (3-5 agents)" note + token cost quadratic growth warning | Confirmed; no change. |
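The difference between the soft caps (warn and continue) and the one hard cap (reject) can be sketched as follows. The constants restate the table; `check_team_sizes` and the logger name are hypothetical and do not describe how `CoordinationConfig` enforces anything today.

```python
import logging

logger = logging.getLogger("synthorg.sketch.team_size")  # hypothetical logger name

RECOMMENDED_WAVE_SIZE = 4  # upper end of the 3-4 per-wave recommendation
TASK_TEAM_SOFT_CAP = 7     # orchestrator and verifiers included
MEETING_HARD_CAP = 8


def check_team_sizes(wave_agents, task_team, meeting_participants=0):
    """Return soft-cap warnings; raise only for the meeting hard cap."""
    if meeting_participants > MEETING_HARD_CAP:
        raise ValueError(
            f"meeting has {meeting_participants} participants; hard cap is {MEETING_HARD_CAP}"
        )
    warnings = []
    if wave_agents > RECOMMENDED_WAVE_SIZE:
        warnings.append(
            f"wave runs {wave_agents} agents; recommended maximum is {RECOMMENDED_WAVE_SIZE}"
        )
    if task_team > TASK_TEAM_SOFT_CAP:
        warnings.append(
            f"task team has {task_team} agents; soft cap is {TASK_TEAM_SOFT_CAP}"
        )
    for message in warnings:
        logger.warning(message)
    return warnings
```

Company size is deliberately absent from the check: a 50-agent org is valid as long as no single wave or task team exceeds its own bound.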

Section 3. Risk Mitigation Register (15 emergent risks from paper 1)

For each risk: coverage status, SynthOrg design location, and action.

| # | Risk | Coverage | Location | Action |
| --- | --- | --- | --- | --- |
| 1.1 | Tacit collusion | Gap (low priority) | - | LATER: mechanism-level anti-collusion only relevant for negotiation/client-simulation templates (v0.8+). |
| 1.2 | Priority monopolization | Partial | `budget/coordination_config.py`, task priority field | Current priority is manual/role-based; no fee/rotation mechanism. LATER: relevant only when multiple clients compete for shared agent pool. |
| 1.3 | Competitive task avoidance | Partial | `TaskAssignmentStrategy` (6 strategies) | Manual / hierarchical hard-bind is a mitigation by design. `AuctionAssignmentStrategy` is vulnerable; document risk in docstring. |
| 1.4 | Strategic information withholding | Gap (low priority) | `Message` parts have no integrity proofs | LATER: only material for adversarial A2A federation. Ties to R4. |
| 1.5 | Information asymmetry exploitation | Gap (low priority) | - | LATER: only material for negotiation templates. |
| 2.1 | Majority sway bias | Gap (HIGH PRIORITY) | `PositionPapersProtocol` synthesizer, `StructuredPhasesProtocol` | NEW WORK: `EvidenceWeightedSynthesizer`, weighting by evidence density, capping correlated-source clusters, preserving minority-report positions. 6/10 fake-news misclassification rate in the paper. Ties to R2. |
| 2.2 | Authority deference | STRUCTURAL (HIGHEST PRIORITY) | `AuthorityResolver` is the DEFAULT resolver. `HybridResolver` uses it as fallback. | NEW WORK (load-bearing): `AuthorityDeferenceGuard` with blind-aggregation mode (strip authority markers from transcripts), mandatory-justification mode (downstream agents must justify overriding evidence), telemetry events when cascades exceed 2 downstream agents. Ties to R1 + R2 + R4. |
| 3.1 | Non-convergence without arbitrator | Covered | `ConflictResolver` protocol with arbitrator escalation | No change. |
| 3.2 | Over-adherence to initial instructions | Partial | `PlanExecuteLoop.max_replans`, `HybridLoop.allow_replan_on_completion`, `StagnationDetector` | NEW WORK (small): `AssumptionViolationSignal` adjacent to `TerminationReason`, propagated loop → engine → coordinator as escalation, not retry. Ties to R1 + R2. |
| 3.3 | Architecturally induced clarification failure | Gap (medium priority) | `DecompositionService` fans out without a front-end ambiguity gate | NEW WORK: pre-decomposition specificity check on `acceptance_criteria`. Ties to intake engine and R2. |
| 3.4 | Role allocation failure | Covered | Hard role binding, explicit `assigned_to`, merge_id disambiguation | No change. |
| 3.5 | Role stability under incentive pressure | Covered (by absence) | No inter-agent monetary incentives; roles static per-task | Document explicitly that introducing shared rewards reopens this risk. |
| 4.1 | Competitive resource overreach | Covered | `BudgetEnforcer` hard caps | No change. |
| 4.2 | Steganography | Gap (low priority) | - | LATER: only if agents face oversight-evasion incentives. |
| 4.3 | Semantic drift in sequential handoffs | Partial | `DelegationGuard` prevents loops; `sanitize_message()` redacts paths | NEW WORK (small): content hash of original task formulation on `TurnRecord` / `TaskExecution.delegation_chain`. Low effort. Ties to R4. |
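For risk 2.1, the three properties asked of `EvidenceWeightedSynthesizer` (evidence weighting, correlated-cluster capping, minority preservation) can be illustrated with a toy scorer. This is a sketch under assumptions, not the planned design: `Position`, its fields, and `synthesize` are hypothetical, evidence density is reduced to a count of evidence items, and correlation is reduced to a shared `source_cluster` label.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    agent_id: str
    claim: str
    evidence_items: int  # count of distinct supporting items (assumed measure)
    source_cluster: str  # agents drawing on the same source share a label


def synthesize(positions):
    """Return (winning_claim, minority_positions)."""
    if not positions:
        raise ValueError("no positions to synthesize")

    # Cap correlated clusters: per (claim, cluster) keep only the strongest
    # member, so N agents echoing one source count once, not N times.
    best = defaultdict(int)
    for p in positions:
        key = (p.claim, p.source_cluster)
        best[key] = max(best[key], p.evidence_items)

    scores = defaultdict(int)
    for (claim, _cluster), weight in best.items():
        scores[claim] += weight

    # Sorting first makes tie-breaking deterministic.
    winner = max(sorted(scores), key=lambda claim: scores[claim])
    # Losing positions are returned, not discarded, as minority reports.
    minority = [p for p in positions if p.claim != winner]
    return winner, minority
```

With three agents asserting one claim from the same source (one evidence item each) and one agent asserting the opposite from an independent source with three items, a head count picks the first claim 3 to 1, while this scorer picks the second and returns the three outvoted positions as minority reports.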

Summary:

- 4 risks covered by existing design (3.1, 3.4, 3.5, 4.1); 3.5 is covered by the absence of inter-agent incentives rather than by an explicit mechanism.
- 3 risks partially covered (1.3, 3.2, 4.3); small additions needed.
- 2 HIGH-PRIORITY structural gaps (2.1 majority sway, 2.2 authority deference).
- 5 LATER / low-priority (1.1, 1.2, 1.4, 1.5, 4.2); all tied to adversarial or negotiation contexts not in MVP scope.
- 1 medium-priority (3.3); clarification gate before decomposition.

Section 4. Value-Proposition Reconciliation

Paper 3 challenges multi-agent's value claim by showing single-agent matches or beats it on multi-hop reasoning under equal token budgets. If SynthOrg's value proposition were "more agents = better reasoning", the paper would be a direct refutation. It is not. SynthOrg's value proposition is:

Role specialization as work-stream parallelism, not reasoning parallelism. An engineer writing code while a PM writes the spec while a QA writes tests is not competing for reasoning tokens on the same multi-hop question; it is three concurrent workstreams. Paper 3's equal-budget comparison does not apply because the budgets are not pooled on a single task.
Organizational simulation fidelity. A synthetic "company" made up of one agent is not a company. The framework exists to simulate org dynamics (department budgets, hiring, performance tracking, meeting cadences, approval chains) that are inherently multi-entity. Paper 2's formal theorem about decision-theoretic dominance applies to delegated decision networks solving a single decision, not to organizations running many concurrent workflows.
File-level parallel execution via git worktrees. WorkspaceIsolationStrategy.planner_worktrees enables true filesystem parallelism that a single agent cannot achieve without serializing edits. This is orthogonal to reasoning efficiency; it is execution-throughput efficiency.
Persistent institutional memory across role boundaries. OrgMemoryBackend, DissentRecord, DecisionRepository accumulate knowledge that is structured by role. A single agent cannot produce "engineering decided X over QA's objection" as a queryable audit artifact.
Audit-grade decision trails with role attribution. ReviewGateService + DecisionRecord + charter_version identity versioning produce multi-party accountability that is meaningless in a single-agent system.
Per-task topology auto-selection as a first-class primitive. SynthOrg's position is not "multi-agent everywhere"; it is "choose the right topology per task". Papers 2 and 3 are citation-worthy backing for this choice, not critiques of it.

What SynthOrg should NOT claim: that multi-agent reasoning beats single-agent reasoning on multi-hop questions under equal token budgets. That claim is now refuted.

What SynthOrg SHOULD claim: that for work that decomposes into parallel role-specialized streams with shared institutional memory, a multi-agent organization produces outputs a single agent cannot, namely parallel execution throughput, role-attributed decision artifacts, and simulation fidelity for org dynamics.

Section 5. Impact on R1 / R2 / R4

R1 (#1250, harness architecture) inherits:

Per-coordination-group team size defaults to 3-4 agents with explicit override.
MultiAgentCoordinator must expose a pluggable point for the AuthorityDeferenceGuard between dispatcher result synthesis and parent-task update, not inside individual agent loops.
AssumptionViolationSignal propagation from loop → engine → parent coordinator as an escalation event, not a retry.
The brain/hands/session decoupling R1 designs must preserve role attribution in all delegation frames; DelegationChain cannot be flattened.
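The "escalation, not retry" rule for `AssumptionViolationSignal` can be sketched as below. Only the names `AssumptionViolationSignal` and `TerminationReason` come from this document; their fields and enum members, plus `LoopResult`, `engine_decide`, and `Coordinator`, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class TerminationReason(Enum):  # members are illustrative
    COMPLETED = "completed"
    ERROR = "error"


@dataclass(frozen=True)
class AssumptionViolationSignal:
    task_id: str
    assumption: str  # what the plan took for granted
    evidence: str    # what the agent observed instead


@dataclass(frozen=True)
class LoopResult:
    reason: TerminationReason
    violation: Optional[AssumptionViolationSignal] = None


def engine_decide(result, retries_left):
    """Engine layer: a violation always escalates, even if retries remain."""
    if result.violation is not None:
        return "escalate"
    if result.reason is TerminationReason.ERROR and retries_left > 0:
        return "retry"
    return "done"


@dataclass
class Coordinator:
    escalations: List[AssumptionViolationSignal] = field(default_factory=list)

    def handle(self, result, retries_left):
        action = engine_decide(result, retries_left)
        if action == "escalate":
            # Parent coordinator records the signal instead of re-running the loop.
            self.escalations.append(result.violation)
        return action
```

Checking the violation before the retry budget is the point of the sketch: retrying a plan whose premise is false only repeats the over-adherence failure from risk 3.2.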

R2 (#1251, verification stages): IMPLEMENTED via #1262. Inherits:

Deliberation-stage synthesis hook as a first-class stage hosting AuthorityDeferenceGuard + EvidenceWeightedSynthesizer.
High-stakes task classes require a centralized verification stage even if the task was executed under a decentralized topology (paper 2 theorem applies to decisions, not executions).
Pre-decomposition clarification gate runs before DecompositionService, not after. R2's stage ordering must allow pre-decomposition stages.
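A pre-decomposition specificity check on `acceptance_criteria` could look roughly like this. The heuristics (a minimum word count and a short list of vague terms) are placeholders chosen for the example, not the implemented gate, and `ambiguity_findings` and `clarification_gate` are hypothetical names.

```python
import re

# Placeholder vocabulary; a real gate would need a better signal than keywords.
VAGUE_TERMS = ("fast", "good", "better", "robust", "user-friendly", "as needed")
MIN_WORDS = 4


def ambiguity_findings(acceptance_criteria):
    """Return human-readable reasons why the criteria are not yet testable."""
    if not acceptance_criteria:
        return ["no acceptance criteria provided"]
    findings = []
    for index, criterion in enumerate(acceptance_criteria):
        text = criterion.strip().lower()
        if len(text.split()) < MIN_WORDS:
            findings.append(f"criterion {index}: too short to be testable")
        hits = [t for t in VAGUE_TERMS if re.search(rf"\b{re.escape(t)}\b", text)]
        if hits:
            findings.append(f"criterion {index}: vague terms {hits}")
    return findings


def clarification_gate(acceptance_criteria):
    """Runs before decomposition: either ask for clarification or proceed."""
    findings = ambiguity_findings(acceptance_criteria)
    return ("clarify", findings) if findings else ("decompose", [])
```

The gate returns its findings rather than a bare boolean so that the clarification request sent back to the requester can name the specific criteria that need tightening.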

R4 (#1253, inter-agent comms) inherits:

Prefer broadcast / direct addressing over relay chains where topology permits. Where relay is structurally required, integrity hashing of the original task formulation is the mitigation.
DissentRecord must become a first-class message type on the bus, not just a persistence artifact. IMPLEMENTED via #1263 (MessageType.DISSENT + SSE synthorg:dissent event).
Authority cues in message metadata must be strippable per-subscriber, not global.
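Integrity hashing of the original task formulation can be sketched with the standard library. `Hop`, `task_hash`, and `drifted_hops` are hypothetical names, and whitespace normalization before hashing is an assumption of this sketch. A hash mismatch only shows that the relayed text differs from the original; it does not measure how far the meaning has drifted.

```python
import hashlib
from dataclasses import dataclass


def task_hash(formulation: str) -> str:
    """SHA-256 of the task text, with whitespace collapsed first."""
    normalized = " ".join(formulation.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Hop:
    agent_id: str
    task_text: str    # the formulation this agent actually received
    origin_hash: str  # hash of the original formulation, carried along the chain


def drifted_hops(chain):
    """Agents whose received text no longer matches the original formulation."""
    return [hop.agent_id for hop in chain if task_hash(hop.task_text) != hop.origin_hash]
```

A hop that detects a mismatch can fetch the original formulation rather than act on the paraphrase, which is why the hash travels with the delegation chain instead of living only in persistence.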

Implementation status (updated via #1260):

AuthorityDeferenceGuard: IMPLEMENTED as agent middleware (before_agent) + coordination middleware (before_update_parent) in engine/middleware/s1_constraints.py.
AssumptionViolationMiddleware: IMPLEMENTED as agent middleware (after_model) in engine/middleware/s1_constraints.py.
Pre-decomposition clarification gate: IMPLEMENTED as coordination middleware (before_decompose) in engine/middleware/s1_constraints.py.
Delegation-chain content hash: IMPLEMENTED as agent middleware (before_agent) in engine/middleware/s1_constraints.py.
EvidenceWeightedSynthesizer: not yet implemented (unblocked; R2 verification stages landed in #1262).
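The four hook points named above (`before_agent`, `after_model`, `before_decompose`, `before_update_parent`) suggest a chain in which each middleware overrides only the hooks it needs. The sketch below shows that pattern in isolation; the base class, the dict-based context, and both example middlewares are assumptions for illustration and are not the contents of `engine/middleware/s1_constraints.py`.

```python
class Middleware:
    """Base class with no-op hooks; subclasses override what they need."""

    def before_agent(self, ctx):
        return ctx

    def after_model(self, ctx):
        return ctx

    def before_decompose(self, ctx):
        return ctx

    def before_update_parent(self, ctx):
        return ctx


class StripAuthority(Middleware):
    """Coordination-level hook: drop authority cues before the parent update."""

    def before_update_parent(self, ctx):
        results = [
            {k: v for k, v in result.items() if k != "authority"}
            for result in ctx["results"]
        ]
        return {**ctx, "results": results}


class ClarificationGate(Middleware):
    """Coordination-level hook: block decomposition when criteria are missing."""

    def before_decompose(self, ctx):
        if not ctx.get("acceptance_criteria"):
            return {**ctx, "blocked": "needs clarification"}
        return ctx


def run_hook(chain, hook, ctx):
    """Pass the context through every middleware's implementation of one hook."""
    for middleware in chain:
        ctx = getattr(middleware, hook)(ctx)
    return ctx
```

Each hook returns a new context instead of mutating its input, so a middleware that does not care about a hook leaves the context untouched and the chain order stays easy to reason about.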

Section 6. DESIGN_SPEC impact

The following edits have been applied:

docs/design/index.md: disclaimer under "What This Is NOT" clarifying SynthOrg is not a reasoning parallelizer.
docs/design/engine.md §Task Decomposability: updated research-basis callout citing papers 2 + 3 alongside Kim 2025; new "Coordination Group Size Bounds" subsection documenting the 3-4 per-wave default. New "Harness Middleware Layer" section documenting the middleware protocols, default chains, and configuration.
docs/design/communication-coordination.md §Conflict Resolution Protocol: warning box under Strategy 1: Authority + Dissent Log citing risk 2.2 (100% deterministic error mode) and referencing AuthorityDeferenceGuard (now implemented as middleware).
docs/design/communication-coordination.md §Meeting Protocol: risk notes under each protocol and pointer to the planned EvidenceWeightedSynthesizer.
docs/design/communication.md §Multi-Agent Failure Pattern Guardrails: cross-reference to this decision document and the 15-risk register.
docs/design/organization.md Company Types table: footnote distinguishing company size from per-task coordination-group size.
docs/research/multi-agent-failure-audit.md: appendix enumerating the 15-risk taxonomy with coverage table.

The S1 mitigation hooks (AuthorityDeferenceGuard, AssumptionViolationMiddleware, pre-decomposition clarification gate, content-hash drift detection) are implemented in #1260 as engine middleware. EvidenceWeightedSynthesizer is now unblocked; R2 verification stages landed in #1262.