Skip to content

Open Questions & Risks

Open Questions

The following design questions remain unresolved. Each carries potential impact on architecture or behavior and will be addressed as the project progresses.

Numbers are stable identifiers -- resolved questions are removed without renumbering to preserve cross-references.

# Question Impact Notes
1 How deep should agent personality affect output? Medium Too deep leads to inconsistency; too shallow makes all agents feel the same. Capability-aware prompt profiles (#805) will add tier-based personality condensation.
4 Should agents be able to create/modify other agents? Medium For example, a CTO "hires" a developer by creating a new agent config.
6 What metrics define "good" agent performance? Medium Five-pillar evaluation framework (#1017) provides structure; quality scoring Layers 2+3 (#230) will add LLM judge and human override.
8 Optimal message bus for local-first architecture? Medium asyncio queues for Phase 1; distributed backend (Redis, NATS) planned for Phase 2 (v0.8).

Technical Risks

Risk Severity Mitigation
Context window exhaustion on complex tasks Medium Partially mitigated: context budget management tracks fill, injects indicators, and compacts at turn boundaries. Remaining: LLM-based summarization for higher-quality summaries.
Cost explosion from agent loops High Budget hard stops, loop detection, max iterations per task, auto-downgrade at task boundaries.
Agent quality degradation with cheap models Medium Capability-aware prompt profiles (#805) adapt prompts to model tier. Quality gates and minimum model requirements per task type.
Third-party library breaking changes Medium Python deps exact-pinned (==), JS deps range-based with lockfiles. Integration tests, abstraction layers, Dependabot daily updates.
Memory retrieval quality Medium Hybrid retrieval (dense + BM25 sparse with RRF fusion) shipped. LMEB-guided embedding selection implemented. Domain fine-tuning pipeline not yet implemented -- config and checkpoint lookup wired, training stages raise NotImplementedError (#1001).
Agent personality inconsistency Low Strong system prompts, personality presets with condensed/minimal variants planned (#805).
WebSocket scaling Low In-process channels for Phase 1. Redis pub/sub planned for distributed deployments.

Architecture Risks

Risk Severity Mitigation
Over-engineering the MVP High Start with a minimal viable company (3-5 agents), add complexity iteratively. 9 company templates provide tested starting points.
Config format becoming unwieldy Medium Good defaults, layered config (base + overrides), validation via Pydantic v2 models, setup wizard for guided configuration.
Agent execution bottlenecks Medium Async execution, parallel agent processing, queue-based architecture. TaskGroup for structured concurrency.
Data loss on crash Medium WAL mode SQLite, checkpoint recovery, backup/restore with scheduled retention.
Orchestration overhead exceeds productive work Medium LLM call analytics with proxy metrics implemented. Call categorization and orchestration ratio alerts planned.
SQLite contention under concurrent access Low Single-writer with WAL mode handles read concurrency well. PostgreSQL backend planned for v0.8 for write-heavy workloads.