Open Questions & Risks
Open Questions
The following design questions remain unresolved. Each carries potential impact on architecture or behavior and will be addressed as the project progresses.
Numbers are stable identifiers -- resolved questions are removed without renumbering to preserve cross-references.
| # | Question | Impact | Notes |
|---|----------|--------|-------|
| 1 | How deep should agent personality affect output? | Medium | Too deep leads to inconsistency; too shallow makes all agents feel the same. Capability-aware prompt profiles (#805) will add tier-based personality condensation. |
| 4 | Should agents be able to create/modify other agents? | Medium | For example, a CTO "hires" a developer by creating a new agent config. |
| 6 | What metrics define "good" agent performance? | Medium | Five-pillar evaluation framework (#1017) provides structure; quality scoring Layers 2+3 (#230) will add LLM judge and human override. |
| 8 | Optimal message bus for local-first architecture? | Medium | asyncio queues for Phase 1; distributed backend (Redis, NATS) planned for Phase 2 (v0.8). |
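
To make the Phase 1 option in question 8 concrete, here is a minimal sketch of an in-process publish/subscribe bus built on `asyncio.Queue`. The class name `LocalBus`, its methods, and the `"tasks"` topic are hypothetical illustrations, not the project's actual API; the sketch assumes one queue per subscriber with fan-out on publish.

```python
import asyncio
from collections import defaultdict


class LocalBus:
    """Hypothetical in-process bus: one asyncio.Queue per subscriber."""

    def __init__(self) -> None:
        # Maps a topic name to the queues of its subscribers.
        self._subscribers: dict[str, list[asyncio.Queue]] = defaultdict(list)

    def subscribe(self, topic: str) -> asyncio.Queue:
        # Each subscriber gets its own queue so slow readers do not
        # steal messages from one another.
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers[topic].append(queue)
        return queue

    async def publish(self, topic: str, message: object) -> int:
        # Fan the message out to every subscriber; return the delivery count.
        queues = self._subscribers.get(topic, [])
        for queue in queues:
            await queue.put(message)
        return len(queues)


async def demo() -> list[object]:
    bus = LocalBus()
    inbox_a = bus.subscribe("tasks")
    inbox_b = bus.subscribe("tasks")
    await bus.publish("tasks", {"id": 1})
    return [await inbox_a.get(), await inbox_b.get()]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A design like this keeps the publish/subscribe surface small, which is one way a distributed backend could later be swapped in behind the same interface.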
Technical Risks
| Risk | Severity | Mitigation |
|------|----------|------------|
| Context window exhaustion on complex tasks | Medium | Partially mitigated: context budget management tracks fill, injects indicators, and compacts at turn boundaries. Remaining: LLM-based summarization for higher-quality summaries. |
| Cost explosion from agent loops | High | Budget hard stops, loop detection, max iterations per task, auto-downgrade at task boundaries. |
| Agent quality degradation with cheap models | Medium | Capability-aware prompt profiles (#805) adapt prompts to model tier. Quality gates and minimum model requirements per task type. |
| Third-party library breaking changes | Medium | Python deps exact-pinned (`==`), JS deps range-based with lockfiles. Integration tests, abstraction layers, Dependabot daily updates. |
| Memory retrieval quality | Medium | Hybrid retrieval (dense + BM25 sparse with RRF fusion) shipped. LMEB-guided embedding selection implemented. Domain fine-tuning pipeline not yet implemented -- config and checkpoint lookup wired, training stages raise `NotImplementedError` (#1001). |
| Agent personality inconsistency | Low | Strong system prompts, personality presets with condensed/minimal variants planned (#805). |
| WebSocket scaling | Low | In-process channels for Phase 1. Redis pub/sub planned for distributed deployments. |
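
The RRF fusion mentioned under memory retrieval quality can be sketched in a few lines. This is a generic illustration of reciprocal rank fusion, not the project's implementation: the function name `rrf_fuse`, the document IDs, and the smoothing constant `k = 60` (a commonly used default) are all assumptions. Each document scores `1 / (k + rank)` in every ranking where it appears, and the scores are summed.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    `k` is a smoothing constant; 60 is an assumed default, not a project setting.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    dense_hits = ["a", "b", "c"]   # hypothetical dense-retriever ranking
    sparse_hits = ["b", "d", "a"]  # hypothetical BM25 ranking
    print(rrf_fuse([dense_hits, sparse_hits]))  # ['b', 'a', 'd', 'c']
```

Because RRF uses only ranks, it needs no score normalization between the dense and sparse retrievers, which is a common reason to choose it for hybrid retrieval.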
Architecture Risks
| Risk | Severity | Mitigation |
|------|----------|------------|
| Over-engineering the MVP | High | Start with a minimal viable company (3-5 agents), add complexity iteratively. 9 company templates provide tested starting points. |
| Config format becoming unwieldy | Medium | Good defaults, layered config (base + overrides), validation via Pydantic v2 models, setup wizard for guided configuration. |
| Agent execution bottlenecks | Medium | Async execution, parallel agent processing, queue-based architecture. `TaskGroup` for structured concurrency. |
| Data loss on crash | Medium | WAL mode SQLite, checkpoint recovery, backup/restore with scheduled retention. |
| Orchestration overhead exceeds productive work | Medium | LLM call analytics with proxy metrics implemented. Call categorization and orchestration ratio alerts planned. |
| SQLite contention under concurrent access | Low | Single-writer with WAL mode handles read concurrency well. PostgreSQL backend planned for v0.8 for write-heavy workloads. |
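
As an illustration of the WAL-mode mitigation listed for crash safety and contention, the sketch below opens a SQLite database in WAL mode with Python's standard `sqlite3` module and reads from a second connection after a write has committed. The helper name `open_wal`, the `events` table, the file name, and the 5-second busy timeout are hypothetical choices, not values taken from the project.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_wal(db_path: Path) -> sqlite3.Connection:
    """Open a SQLite database and switch it to write-ahead logging."""
    conn = sqlite3.connect(db_path)
    # The PRAGMA returns the journal mode actually in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"could not enable WAL mode, got {mode!r}")
    # Wait up to 5 s for a competing writer instead of failing immediately
    # (assumed value for illustration).
    conn.execute("PRAGMA busy_timeout=5000")
    return conn


def demo() -> int:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "state.db"
        writer = open_wal(path)
        writer.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, body TEXT)")
        writer.execute("INSERT INTO events (body) VALUES ('task started')")
        writer.commit()
        # A separate connection reads the committed row.
        reader = open_wal(path)
        count = reader.execute("SELECT COUNT(*) FROM events").fetchone()[0]
        reader.close()
        writer.close()
        return count


if __name__ == "__main__":
    print(demo())
```

WAL mode is persistent once set on a database file, so the PRAGMA on later connections is effectively a check that the mode is still in effect.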