Open Questions & Risks
Open Questions
The following design questions remain unresolved. Each carries potential impact on architecture or behaviour and will be addressed as the project progresses.
Numbers are stable identifiers; resolved questions are removed without renumbering to preserve cross-references.
| # | Question | Impact | Notes |
|---|----------|--------|-------|
| 1 | How deep should agent personality affect output? | Medium | Too deep leads to inconsistency; too shallow makes all agents feel the same. Capability-aware prompt profiles provide tier-based personality condensation. |
| 4 | Should agents be able to create/modify other agents? | Medium | For example, a CTO "hires" a developer by creating a new agent config. |
| 6 | What metrics define "good" agent performance? | Medium | The five-pillar evaluation framework provides structure; quality scoring layers add an LLM judge and human override. |
Technical Risks
| Risk | Severity | Mitigation |
|------|----------|------------|
| Context window exhaustion on complex tasks | Medium | Partially mitigated: context budget management tracks fill, injects indicators, and compacts at turn boundaries. Remaining: LLM-based summarization for higher-quality summaries. |
| Cost explosion from agent loops | High | Budget hard stops, loop detection, max iterations per task, auto-downgrade at task boundaries. |
| Agent quality degradation with cheap models | Medium | Capability-aware prompt profiles adapt prompts to model tier. Quality gates and minimum model requirements per task type. |
| Third-party library breaking changes | Medium | Python deps exact-pinned (==), JS deps range-based with lockfiles. Integration tests, abstraction layers, Renovate weekly updates. |
| Memory retrieval quality | Medium | Hybrid retrieval (dense + BM25 sparse with RRF fusion) shipped. LMEB-guided embedding selection implemented. The domain fine-tuning orchestrator is wired into boot; trajectory-mode training additionally requires a configured memory backend, otherwise the controllers degrade to HTTP 501. |
| Agent personality inconsistency | Low | Strong system prompts, personality presets with condensed/minimal variants. |
| WebSocket scaling | Low | In-process channels today. Multi-instance fan-out can ride on the shipped NATS JetStream bus when needed. |
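
The memory retrieval row mentions RRF fusion of dense and BM25 sparse results. As a rough illustration of the general technique (not the project's actual implementation), the sketch below fuses two ranked lists with Reciprocal Rank Fusion. The function name `rrf_fuse`, the document ids, and the constant `k=60` (a commonly used default) are assumptions made for this example.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of ids with Reciprocal Rank Fusion.

    Each id scores sum(1 / (k + rank)) over every list it appears in,
    where rank starts at 1. `k` dampens the influence of top ranks;
    60 is a conventional default, assumed here for illustration.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a dense retriever and a BM25 retriever.
dense_hits = ["m3", "m1", "m7"]
sparse_hits = ["m1", "m9", "m3"]
fused = rrf_fuse([dense_hits, sparse_hits])
```

Because RRF uses only ranks, it needs no score normalisation between the dense and sparse retrievers, which is one reason it is a common choice for hybrid retrieval.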
Architecture Risks
| Risk | Severity | Mitigation |
|------|----------|------------|
| Over-engineering the MVP | High | Start with a minimal viable company (3-5 agents), add complexity iteratively. 12 company templates provide tested starting points. |
| Config format becoming unwieldy | Medium | Good defaults, layered config (base + overrides), validation via Pydantic v2 models, setup wizard for guided configuration. |
| Agent execution bottlenecks | Medium | Async execution, parallel agent processing, queue-based architecture. TaskGroup for structured concurrency. |
| Data loss on crash | Medium | WAL mode SQLite, checkpoint recovery, backup/restore with scheduled retention. |
| Orchestration overhead exceeds productive work | Medium | LLM call analytics with proxy metrics implemented. Call categorisation and orchestration ratio alerts planned. |
| SQLite contention under concurrent access | Low | Single-writer with WAL mode handles read concurrency well. The PostgreSQL backend (conformance-tested for parity) handles write-heavy and multi-instance workloads. |
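
The "layered config (base + overrides)" mitigation can be pictured as a recursive merge in which override values win and untouched defaults survive. The following is a minimal standard-library sketch under that assumption; `merge_layers` and all config keys are hypothetical, and the Pydantic v2 validation step named in the table is deliberately left out and would run on the merged result.

```python
from copy import deepcopy


def merge_layers(base, override):
    """Return a new dict: `base` with `override` applied on top.

    Nested dicts are merged key by key; any other value in `override`
    replaces the base value. Neither input is mutated.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_layers(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical base defaults and a deployment-specific override layer.
base = {
    "company": {"agents": 3, "budget": {"daily_usd": 10.0}},
    "storage": "sqlite",
}
override = {"company": {"budget": {"daily_usd": 25.0}}}

config = merge_layers(base, override)
```

Here `config` keeps the base `agents` and `storage` values while picking up the overridden `daily_usd`, so an override file only needs to list what differs from the defaults.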