S1: Multi-Agent Architecture Decision
Issue: #1254 (CRITICAL, blocks #1250 / #1251 / #1253)

Sources: arXiv:2603.27771 (Multi-Agent Risks), arXiv:2603.26993 (Reliability Limits), arXiv:2604.02460 (Single-Agent Outperforms).

Prior baseline: Kim et al. 2025 (arXiv:2512.08296), Multi-Agent Failure Audit (#690), Task & Workflow Engine §Task Decomposability, Communication Coordination §Multi-Agent Failure Pattern Guardrails.
Bottom line
SynthOrg keeps multi-agent as a foundational capability but treats it as topology-per-task, not topology-per-company, with single-agent as the default for all task types where multi-agent cannot demonstrate a per-task justification. The three S1 papers confirm this direction; they do not overturn it. Kim et al. 2025 (already integrated in the engine design) set up the heuristic; papers 2 and 3 formalize the math and empirics behind it; paper 1 supplies the emergent-risk catalog the existing guardrails do not yet fully cover.
The critical new work S1 surfaces is encoding the 15 emergent-risk mitigations (especially authority-deference) because SynthOrg's default conflict resolver is literally authority + dissent_log, which is the exact structural shape paper 1 shows produces 10/10 deterministic errors under an authority cue.
Section 1. Decision Matrix: Centralized vs Distributed
The existing CoordinationTopology selector in src/synthorg/engine/routing/topology_selector.py already implements most of this matrix. The papers support refining it:
| Task property | Topology | Justification |
|---|---|---|
| sequential + any size | SAS (single-agent) | Kim 2025: -39% to -70% multi-agent effect. Paper 3: Data Processing Inequality. Coordination tokens displace reasoning tokens under equal budget. No change needed. |
| parallel + structured + common-evidence regime | Centralized (orchestrator + sub-agents, orchestrator synthesizes) | Paper 2 formal theorem: delegated networks are decision-theoretically dominated by a centralized Bayes decision maker under common-evidence. Lowest error amplification (4.4x, Kim 2025). |
| parallel + exploratory / high-entropy / novel per-agent information sources | Decentralized (peer debate) | Paper 2 boundary case: distributed CAN outperform when agents access non-shared information. Paper 3 boundary case: diverse specialized knowledge, error-checking via independent reasoning paths, asymmetric agent expertise. |
| mixed (sequential backbone + parallel sub-phases) | Context-dependent | Kim 2025; per-phase selection already implemented as ContextDependentDispatcher. |
| Low-stakes / single-file / simple-complexity | SAS regardless of structure | Paper 3 + existing AutoLoopConfig rule (simple → ReAct). |
| High-stakes / production-consequence / adversarial-input | Centralized + verification stages | Not a topology decision; a gate decision. Ties to R2 (verification stages). See Section 5. |
New constraint from Section 3 (risks): any topology that routes through a chain of agents where one carries an authority marker MUST activate the AuthorityDeferenceGuard mitigation path before downstream agents synthesize.
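The matrix and the authority-marker constraint can be read as a small decision function. The sketch below is illustrative only: `TaskProfile`, `Topology`, `Selection`, and `select_topology` are hypothetical names with assumed fields, not the API of topology_selector.py. The high-stakes row is left out because the matrix treats it as a gate decision rather than a topology decision, and skipping the guard for SAS is an assumed reading of "routes through a chain of agents".

```python
from dataclasses import dataclass
from enum import Enum


class Topology(Enum):
    SAS = "single_agent"
    CENTRALIZED = "centralized"
    DECENTRALIZED = "decentralized"
    CONTEXT_DEPENDENT = "context_dependent"


@dataclass(frozen=True)
class TaskProfile:
    """Hypothetical task descriptor; every field name is an assumption."""
    structure: str                      # "sequential" | "parallel" | "mixed"
    shared_evidence: bool = True        # common-evidence regime
    low_stakes: bool = False            # low-stakes / single-file / simple
    has_authority_marker: bool = False  # some agent in the chain carries authority


@dataclass(frozen=True)
class Selection:
    topology: Topology
    needs_authority_guard: bool


def select_topology(task: TaskProfile) -> Selection:
    if task.structure not in {"sequential", "parallel", "mixed"}:
        raise ValueError(f"unknown task structure: {task.structure!r}")

    if task.low_stakes or task.structure == "sequential":
        topology = Topology.SAS                # single-agent is the default
    elif task.structure == "mixed":
        topology = Topology.CONTEXT_DEPENDENT  # selected per phase
    elif task.shared_evidence:
        topology = Topology.CENTRALIZED        # orchestrator synthesizes
    else:
        topology = Topology.DECENTRALIZED      # non-shared information sources

    # Assumption: the guard only matters when more than one agent is involved.
    needs_guard = task.has_authority_marker and topology is not Topology.SAS
    return Selection(topology, needs_guard)
```

For example, `select_topology(TaskProfile("parallel", has_authority_marker=True))` selects the centralized topology and flags that the guard path must run before synthesis.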
Section 2. Team-Size Bounds
Existing company templates span 1→50+ agents. The empirical literature (Kim 2025, 3-4 agent cap per coordination group) applies to per-task team size, not company size. These must be kept distinct.
| Scope | Bound | Source | Constraint |
|---|---|---|---|
| Per-coordination-group (agents working on a single coordination_topology wave) | 3-4 active agents (recommended) | Kim 2025 180-experiment cap; per-agent reasoning degrades sharply beyond this | Soft cap: CoordinationConfig.max_concurrency_per_wave, current settings-registry default 5 (range 1-50, None in the Pydantic model = unlimited). Adopting 3-4 as the recommended default is a follow-up change tracked on R1 (#1250). Legitimate 5-6 sub-agent decompositions exist in the +57% to +81% parallel regime. |
| Per-task total team (including orchestrator + verifiers) | ~7 agents | Kim 2025 hybrid overhead; paper 1 coalition-formation risk rises with team size | Soft cap: logged warning above threshold. |
| Per-company / org size | No hard bound | Organizational simulation value (Enterprise Org 20-50+ template) is NOT the same as per-task reasoning efficiency | No constraint: templates must make clear that a 50-agent Enterprise Org does NOT run 50-agent coordination waves. |
| Per-meeting participants | 3-5 ideal, 8 hard cap | Existing round_robin "small groups (3-5 agents)" note + token cost quadratic growth warning | Confirmed; no change. |
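The difference between the soft caps and the one hard cap in this table can be sketched as a single check. The function name, constants, and logger below are hypothetical, and the thresholds are simply the table's numbers (upper end of 3-4 per wave, about 7 per task, 8 per meeting). Company size is deliberately not a parameter.

```python
import logging

logger = logging.getLogger("team_size_sketch")  # hypothetical logger name

RECOMMENDED_WAVE_SIZE = 4  # upper end of the 3-4 per-wave recommendation
SOFT_TASK_TEAM_CAP = 7     # orchestrator + sub-agents + verifiers
MEETING_HARD_CAP = 8


def check_team_sizes(wave_size: int, task_team_size: int, meeting_size: int = 0) -> list:
    """Return soft-cap warnings; raise only for the meeting hard cap."""
    if meeting_size > MEETING_HARD_CAP:
        raise ValueError(
            f"meeting has {meeting_size} participants; hard cap is {MEETING_HARD_CAP}"
        )

    warnings = []
    if wave_size > RECOMMENDED_WAVE_SIZE:
        warnings.append(
            f"wave of {wave_size} agents exceeds the recommended {RECOMMENDED_WAVE_SIZE}"
        )
    if task_team_size > SOFT_TASK_TEAM_CAP:
        warnings.append(
            f"task team of {task_team_size} agents exceeds the soft cap of {SOFT_TASK_TEAM_CAP}"
        )

    # Soft caps never block execution; they only leave a trace in the logs.
    for message in warnings:
        logger.warning(message)
    return warnings
```

A 6-agent wave therefore still runs, which keeps room for the legitimate 5-6 sub-agent decompositions noted in the table, but it is logged.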
Section 3. Risk Mitigation Register (15 emergent risks from paper 1)
For each risk: coverage status, SynthOrg design location, and action.
| # | Risk | Coverage | Location | Action |
|---|---|---|---|---|
| 1.1 | Tacit collusion | Gap (low priority) | - | LATER: mechanism-level anti-collusion only relevant for negotiation/client-simulation templates (v0.8+). |
| 1.2 | Priority monopolization | Partial | budget/coordination_config.py, task priority field | Current priority is manual/role-based; no fee/rotation mechanism. LATER: relevant only when multiple clients compete for shared agent pool. |
| 1.3 | Competitive task avoidance | Partial | TaskAssignmentStrategy (6 strategies) | Manual / hierarchical hard-bind is a mitigation by design. AuctionAssignmentStrategy is vulnerable; document risk in docstring. |
| 1.4 | Strategic information withholding | Gap (low priority) | Message parts have no integrity proofs | LATER: only material for adversarial A2A federation. Ties to R4. |
| 1.5 | Information asymmetry exploitation | Gap (low priority) | - | LATER: only material for negotiation templates. |
| 2.1 | Majority sway bias | Gap (HIGH PRIORITY) | PositionPapersProtocol synthesizer, StructuredPhasesProtocol | NEW WORK: EvidenceWeightedSynthesizer, weighting by evidence density, capping correlated-source clusters, preserving minority-report positions. 6/10 fake-news misclassification rate in the paper. Ties to R2. |
| 2.2 | Authority deference | STRUCTURAL (HIGHEST PRIORITY) | AuthorityResolver is the DEFAULT resolver. HybridResolver uses it as fallback. | NEW WORK (load-bearing): AuthorityDeferenceGuard with blind-aggregation mode (strip authority markers from transcripts), mandatory-justification mode (downstream agents must justify overriding evidence), telemetry events when cascades exceed 2 downstream agents. Ties to R1 + R2 + R4. |
| 3.1 | Non-convergence without arbitrator | Covered | ConflictResolver protocol with arbitrator escalation | No change. |
| 3.2 | Over-adherence to initial instructions | Partial | PlanExecuteLoop.max_replans, HybridLoop.allow_replan_on_completion, StagnationDetector | NEW WORK (small): AssumptionViolationSignal adjacent to TerminationReason, propagated loop → engine → coordinator as escalation, not retry. Ties to R1 + R2. |
| 3.3 | Architecturally induced clarification failure | Gap (medium priority) | DecompositionService fans out without a front-end ambiguity gate | NEW WORK: pre-decomposition specificity check on acceptance_criteria. Ties to intake engine and R2. |
| 3.4 | Role allocation failure | Covered | Hard role binding, explicit assigned_to, merge_id disambiguation | No change. |
| 3.5 | Role stability under incentive pressure | Covered (by absence) | No inter-agent monetary incentives; roles static per-task | Document explicitly that introducing shared rewards reopens this risk. |
| 4.1 | Competitive resource overreach | Covered | BudgetEnforcer hard caps | No change. |
| 4.2 | Steganography | Gap (low priority) | - | LATER: only if agents face oversight-evasion incentives. |
| 4.3 | Semantic drift in sequential handoffs | Partial | DelegationGuard prevents loops; sanitize_message() redacts paths | NEW WORK (small): content hash of original task formulation on TurnRecord / TaskExecution.delegation_chain. Low effort. Ties to R4. |
Summary:

- 4 risks covered by existing design (3.1, 3.4, 3.5, 4.1), with 3.5 covered only by the absence of inter-agent incentives.
- 3 risks partially covered (1.3, 3.2, 4.3); small additions needed.
- 2 HIGH-PRIORITY structural gaps (2.1 majority sway, 2.2 authority deference).
- 5 LATER / low-priority (1.1, 1.2, 1.4, 1.5, 4.2); all tied to adversarial or negotiation contexts not in MVP scope.
- 1 medium-priority (3.3); clarification gate before decomposition.
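Two of the mechanisms named for risk 2.2, blind aggregation and cascade telemetry, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the shipped AuthorityDeferenceGuard: the `Message` shape, the `AUTHORITY_KEYS` set, and both function names are hypothetical. Only the threshold of 2 downstream agents comes from the register.

```python
from dataclasses import dataclass, replace

# Assumption: these metadata keys are the ones that signal authority.
AUTHORITY_KEYS = frozenset({"role", "seniority", "authority_level"})
CASCADE_THRESHOLD = 2  # telemetry fires when a cascade exceeds 2 downstream agents


@dataclass(frozen=True)
class Message:
    """Hypothetical transcript entry."""
    sender: str
    content: str
    metadata: dict


def strip_authority(transcript):
    """Blind-aggregation mode: hide who spoke and drop authority markers."""
    aliases = {}
    blinded = []
    for msg in transcript:
        # The same sender always maps to the same neutral alias.
        alias = aliases.setdefault(msg.sender, f"agent-{len(aliases) + 1}")
        neutral = {k: v for k, v in msg.metadata.items() if k not in AUTHORITY_KEYS}
        blinded.append(replace(msg, sender=alias, metadata=neutral))
    return blinded


def cascade_events(deferrals):
    """Return authorities followed by more than CASCADE_THRESHOLD agents.

    `deferrals` is a list of (downstream_agent, authority_agent) pairs.
    """
    followers = {}
    for downstream, authority in deferrals:
        followers.setdefault(authority, set()).add(downstream)
    return sorted(a for a, f in followers.items() if len(f) > CASCADE_THRESHOLD)
```

The synthesizer would see only the output of `strip_authority`, while `cascade_events` runs on the unblinded record so that telemetry can still name the authority involved.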
Section 4. Value-Proposition Reconciliation
Paper 3 challenges multi-agent's value claim by showing single-agent matches or beats it on multi-hop reasoning under equal token budgets. If SynthOrg's value proposition were "more agents = better reasoning", the paper would be a direct refutation. It is not. SynthOrg's value proposition is:
- Role specialization as work-stream parallelism, not reasoning parallelism. An engineer writing code while a PM writes the spec while a QA writes tests is not competing for reasoning tokens on the same multi-hop question; it is three concurrent workstreams. Paper 3's equal-budget comparison does not apply because the budgets are not pooled on a single task.
- Organizational simulation fidelity. A synthetic "company" of one single-agent is not a company. The framework exists to simulate org dynamics (department budgets, hiring, performance tracking, meeting cadences, approval chains) that are inherently multi-entity. Paper 2's formal theorem about decision-theoretic dominance applies to delegated decision networks solving a single decision, not to organizations running many concurrent workflows.
- File-level parallel execution via git worktrees. WorkspaceIsolationStrategy.planner_worktrees enables true filesystem parallelism that a single agent cannot achieve without serializing edits. This is orthogonal to reasoning efficiency; it is execution-throughput efficiency.
- Persistent institutional memory across role boundaries. OrgMemoryBackend, DissentRecord, DecisionRepository accumulate knowledge that is structured by role. A single-agent cannot produce "engineering decided X over QA's objection" as a queryable audit artifact.
- Audit-grade decision trails with role attribution. ReviewGateService + DecisionRecord + charter_version identity versioning produce multi-party accountability that is meaningless in a single-agent system.
- Per-task topology auto-selection as a first-class primitive. SynthOrg's position is not "multi-agent everywhere"; it is "choose the right topology per task". Papers 2 and 3 are citation-worthy backing for this choice, not critiques of it.
What SynthOrg should NOT claim: that multi-agent reasoning beats single-agent reasoning on multi-hop questions under equal token budgets. That claim is now refuted.
What SynthOrg SHOULD claim: that for work that decomposes into parallel role-specialized streams with shared institutional memory, multi-agent organizations produce outputs a single-agent cannot: namely parallel execution throughput, role-attributed decision artifacts, and simulation fidelity for org dynamics.
Section 5. Impact on R1 / R2 / R4
R1 (#1250, harness architecture) inherits:
- Per-coordination-group team size defaults to 3-4 agents with explicit override.
- MultiAgentCoordinator must expose a pluggable point for the AuthorityDeferenceGuard between dispatcher result synthesis and parent-task update, not inside individual agent loops.
- AssumptionViolationSignal propagation from loop → engine → parent coordinator as an escalation event, not a retry.
- The brain/hands/session decoupling R1 designs must preserve role attribution in all delegation frames; DelegationChain cannot be flattened.
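The "escalation, not retry" rule for AssumptionViolationSignal amounts to a precedence rule in the engine's decision about what to do with a finished loop. The sketch below assumes a shape for the signal and for the loop result; the enum members, the `LoopResult` type, and `engine_next_action` are hypothetical and only illustrate the ordering.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TerminationReason(Enum):
    """Hypothetical subset; the real enum may have different members."""
    COMPLETED = "completed"
    ERROR = "error"


@dataclass(frozen=True)
class AssumptionViolationSignal:
    """Assumed payload: which assumption failed and what showed it."""
    task_id: str
    assumption: str
    evidence: str


@dataclass(frozen=True)
class LoopResult:
    reason: TerminationReason
    violation: Optional[AssumptionViolationSignal] = None


def engine_next_action(result: LoopResult, retries_left: int) -> str:
    # A violated assumption means the plan itself is wrong, so repeating the
    # same plan cannot help: escalate to the parent coordinator instead.
    if result.violation is not None:
        return "escalate"
    if result.reason is TerminationReason.ERROR:
        return "retry" if retries_left > 0 else "fail"
    return "done"
```

Checking the violation before the termination reason is the point of the design: even with retries remaining, a loop that reports a broken assumption is never retried.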
R2 (#1251, verification stages): IMPLEMENTED via #1262. Inherits:
- Deliberation-stage synthesis hook as a first-class stage hosting AuthorityDeferenceGuard + EvidenceWeightedSynthesizer.
- High-stakes task classes require a centralized verification stage even if the task was executed decentralized (paper 2 theorem applies to decisions, not executions).
- Pre-decomposition clarification gate runs before DecompositionService, not after. R2's stage ordering must allow pre-decomposition stages.
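Since EvidenceWeightedSynthesizer is not yet implemented, the following is only one possible reading of its three stated properties: weight by evidence density, cap correlated-source clusters, preserve minority reports. The `Position` fields, the use of a raw evidence count as the density proxy, and the "one vote per cluster per stance" capping rule are all assumptions made for this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    """Hypothetical position paper summary."""
    agent: str
    stance: str
    evidence_count: int  # assumed proxy for evidence density
    source_cluster: str  # agents relying on the same source share a cluster


def synthesize(positions):
    """Return (winning_stance, minority_report) using capped evidence weights."""
    if not positions:
        raise ValueError("no positions to synthesize")

    # Cap correlated sources: within one cluster, agents repeating the same
    # stance count once, at the strongest evidence level seen in that cluster.
    best_per_cluster = {}
    for p in positions:
        key = (p.stance, p.source_cluster)
        best_per_cluster[key] = max(best_per_cluster.get(key, 0), p.evidence_count)

    scores = defaultdict(int)
    for (stance, _cluster), weight in best_per_cluster.items():
        scores[stance] += weight

    # Ties resolve to the stance seen first; a real design would escalate.
    winner = max(scores, key=scores.get)
    # Losing stances are kept with their scores rather than discarded.
    minority = {s: w for s, w in scores.items() if s != winner}
    return winner, minority
```

With three agents repeating one weakly sourced claim and a single agent holding a well-sourced opposing view, a head count favors the majority, while this weighting favors the better-evidenced stance and still returns the majority view as a minority report.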
R4 (#1253, inter-agent comms) inherits:
- Prefer broadcast / direct addressing over relay chains where topology permits. Where relay is structurally required, integrity hashing of the original task formulation is the mitigation.
- DissentRecord must become a first-class message type on the bus, not just a persistence artifact. IMPLEMENTED via #1263 (MessageType.DISSENT + SSE synthorg:dissent event).
- Authority cues in message metadata must be strippable per-subscriber, not global.
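Integrity hashing of the original task formulation can be illustrated with the standard library alone. The `DelegationFrame` type, `task_hash`, `drifted_hops`, and the whitespace normalization are hypothetical choices for this sketch; the source only specifies that a content hash of the original formulation travels with the delegation chain.

```python
import hashlib
from dataclasses import dataclass


def task_hash(text: str) -> str:
    """SHA-256 of the task text, with whitespace collapsed (an assumption)."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class DelegationFrame:
    """Hypothetical hop in a relay chain."""
    delegator: str
    delegatee: str
    task_text: str    # the task as this hop passed it on
    origin_hash: str  # hash of the original formulation, copied unchanged


def drifted_hops(chain):
    """Return the indexes of hops whose task text no longer matches the origin."""
    return [
        i for i, frame in enumerate(chain)
        if task_hash(frame.task_text) != frame.origin_hash
    ]
```

A hash only detects that the text changed, not whether the meaning changed, so a flagged hop is a prompt to compare against the original formulation rather than proof of semantic drift.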
Implementation status (updated via #1260):
- AuthorityDeferenceGuard: IMPLEMENTED as agent middleware (before_agent) + coordination middleware (before_update_parent) in engine/middleware/s1_constraints.py.
- AssumptionViolationMiddleware: IMPLEMENTED as agent middleware (after_model) in engine/middleware/s1_constraints.py.
- Pre-decomposition clarification gate: IMPLEMENTED as coordination middleware (before_decompose) in engine/middleware/s1_constraints.py.
- Delegation-chain content hash: IMPLEMENTED as agent middleware (before_agent) in engine/middleware/s1_constraints.py.
- EvidenceWeightedSynthesizer: not yet implemented (unblocked; R2 verification stages landed in #1262).
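To make the pre-decomposition clarification gate concrete, here is a minimal sketch of a specificity check on acceptance criteria, the kind of logic that could sit behind a before_decompose hook. The function name, the vague-term list, and the question wording are all assumptions; the middleware in s1_constraints.py may use entirely different heuristics.

```python
import re

# Assumption: a small list of phrases that usually signal an underspecified criterion.
_VAGUE = re.compile(r"\b(tbd|etc|as needed|as appropriate|somehow)\b", re.IGNORECASE)


def clarification_gate(acceptance_criteria):
    """Return clarification questions; an empty list means decomposition may proceed."""
    if not acceptance_criteria:
        return ["No acceptance criteria given: what must be true when the task is done?"]

    questions = []
    for criterion in acceptance_criteria:
        match = _VAGUE.search(criterion)
        if match:
            questions.append(
                f"Criterion {criterion!r} is underspecified "
                f"({match.group(0)!r}): what exactly is expected?"
            )
    return questions
```

Returning questions instead of raising keeps the gate in front of fan-out: the coordinator can send them back to the requester once, before any sub-agent starts work on an ambiguous task.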
Section 6. DESIGN_SPEC impact
The following edits have been applied:
- docs/design/index.md: disclaimer under "What This Is NOT" clarifying SynthOrg is not a reasoning parallelizer.
- docs/design/engine.md §Task Decomposability: updated research-basis callout citing papers 2 + 3 alongside Kim 2025; new "Coordination Group Size Bounds" subsection documenting the 3-4 per-wave default. New "Harness Middleware Layer" section documenting the middleware protocols, default chains, and configuration.
- docs/design/communication-coordination.md §Conflict Resolution Protocol: warning box under Strategy 1: Authority + Dissent Log citing risk 2.2 (100% deterministic error mode) and referencing AuthorityDeferenceGuard (now implemented as middleware).
- docs/design/communication-coordination.md §Meeting Protocol: risk notes under each protocol and pointer to the planned EvidenceWeightedSynthesizer.
- docs/design/communication.md §Multi-Agent Failure Pattern Guardrails: cross-reference to this decision document and the 15-risk register.
- docs/design/organization.md Company Types table: footnote distinguishing company size from per-task coordination-group size.
- docs/research/multi-agent-failure-audit.md: appendix enumerating the 15-risk taxonomy with coverage table.
The S1 mitigation hooks (AuthorityDeferenceGuard, AssumptionViolationMiddleware, pre-decomposition clarification gate, content-hash drift detection) are implemented in #1260 as engine middleware. EvidenceWeightedSynthesizer is now unblocked; R2 verification stages landed in #1262.