Skip to content

Human Interaction Layer

This guide covers how humans and external systems interact with SynthOrg: the API-first architecture, the endpoint surface, per-operation rate limiting, RFC 9457 structured error responses, Web UI features, and the role-based access model.


API-First Architecture

The REST/WebSocket API is the primary interface for all consumers. The Web UI and CLI are thin clients that call the API -- they contain no business logic.

SynthOrg Engine(Core Logic, Agent Orchestration, Tasks)REST/WS API(Litestar)Web UI(React)CLI Tool(Go)

CLI Tool

Cross-platform Go binary (cli/) for Docker lifecycle management. Commands: init (interactive setup wizard), start, stop, status, logs, update (CLI self-update from GitHub Releases with automatic re-exec, channel-aware (stable/dev), compose template refresh with diff approval, container image update with version matching), doctor (diagnostics + bug report URL), uninstall, version, config, completion-install, backup (create/list/restore via backend API), wipe (factory-reset with interactive backup and restart prompts), cleanup (remove old container images to free disk space). Built with Cobra + charm.land/huh/v2 + charm.land/lipgloss/v2. Distributed via GoReleaser + install scripts (curl | sh for Linux/macOS, irm | iex for Windows). Global output modes: --quiet (errors only), --verbose/-v (verbose/trace), --plain (ASCII-only), --json (machine-readable), --no-color, --yes (non-interactive). Typed exit codes: 0 (success), 1 (runtime), 2 (usage), 3 (unhealthy), 4 (unreachable), 10 (update available). Key flags have corresponding SYNTHORG_* or standard env vars.

API Surface

Endpoint Purpose
/api/v1/health Health check, readiness
/api/v1/metrics Prometheus metrics scrape endpoint (unauthenticated). 12 metric families: synthorg_app_info (Info -- version), synthorg_active_agents_total (Gauge -- status, trust_level labels), synthorg_tasks_total (Gauge -- status, agent labels), synthorg_cost_total (Gauge), synthorg_budget_used_percent (Gauge), synthorg_budget_monthly_cost (Gauge), synthorg_budget_daily_used_percent (Gauge -- daily cost as % of prorated daily budget), synthorg_agent_cost_total (Gauge -- agent_id label, per-agent accumulated cost), synthorg_agent_budget_used_percent (Gauge -- agent_id label, per-agent daily cost as % of daily limit), synthorg_coordination_efficiency (Gauge -- push-updated), synthorg_coordination_overhead_percent (Gauge -- push-updated), synthorg_security_evaluations_total (Counter -- verdict label). Most refreshed per-scrape; coordination and security metrics are push-updated.
/api/v1/auth Authentication: setup, login (HttpOnly cookie sessions, CSRF double-submit), password change (rotates session cookie), ws-ticket, session management (list/revoke, concurrent session limits), logout, account lockout, refresh token rotation (three-tier rate limiting -- see docs/security.md)
/api/v1/company CRUD company config
/api/v1/agents List, hire, fire, modify agents
GET /api/v1/agents/{name}/health Per-agent composite health (performance, trust, lifecycle status)
GET /api/v1/agents/{name}/performance Agent performance metrics summary
GET /api/v1/agents/{name}/activity Paginated agent activity timeline (lifecycle, task, cost, tool, delegation events); degraded_sources included in PaginatedResponse contract
GET /api/v1/agents/{name}/history Agent career history events
GET /api/v1/activities Org-wide activity feed
/api/v1/departments Department management
/api/v1/projects Project listing, creation, and retrieval
/api/v1/tasks Task management
POST /api/v1/tasks/{task_id}/coordinate Trigger multi-agent coordination
/api/v1/messages Communication log
/api/v1/meetings Schedule, view meeting outputs
/api/v1/artifacts Artifact listing, creation, retrieval, deletion with binary content upload/download
/api/v1/budget Spending, limits, projections
/api/v1/approvals Pending human approvals queue
/api/v1/analytics GET /overview (metrics summary), GET /trends?period=7d\|30d\|90d (bucketed time-series), GET /forecast?horizon_days=1..90 (budget projection)
POST /api/v1/reports/generate, GET /api/v1/reports/periods On-demand report generation (spending, performance, task completion, risk trends)
/api/v1/settings Runtime-editable configuration (17 namespaces), schema discovery
GET /api/v1/settings/security/export, POST /api/v1/settings/security/import Security policy export/import
GET /api/v1/security/audit Audit log query with filters
GET /api/v1/coordination/metrics Coordination metrics query (9 Kim et al. metrics)
/api/v1/providers/* Provider CRUD, presets, model discovery, discovery SSRF allowlist, local model management (pull with SSE progress, delete, per-model config)
/api/v1/setup/* First-run setup wizard
/api/v1/personalities/* Personality preset discovery and custom preset CRUD
/api/v1/users CEO-only user CRUD
/api/v1/admin/backups Manual backup, list, detail, delete
/api/v1/ws WebSocket for real-time updates. First-message auth preferred: connect without query params, then send {"action":"auth","ticket":"<ticket>"}. Query-param ?ticket= is a legacy fallback.
POST /api/v1/auth/ws-ticket Exchange JWT for one-time WebSocket connection ticket
/api/v1/conflicts/escalations Human escalation approval queue

Per-Operation Rate Limiting

In addition to the global three-tier limiter applied in api/app.py (10,000 req/min/IP floor, 20 req/min unauth by IP, 6,000 req/min auth by user ID -- see docs/security.md for the exact behaviour), a pluggable sliding-window limiter throttles individual expensive or abuse-prone operations. Guards are declared at the route level via per_op_rate_limit("<op>", max_requests=N, window_seconds=W, key=...) from synthorg/api/rate_limits/guard.py.

  • Backend: SlidingWindowStore protocol with an in-memory implementation. Configuration lives in PerOpRateLimitConfig under api.per_op_rate_limit with an overrides: {op -> (max_requests, window_seconds)} map that takes effect without restart. Setting either component to 0 disables the operation; negative values are rejected at startup with a logged error for diagnosability.
  • Keying: user, ip, or user_or_ip (default). The ip source is the proxy-normalised trusted_client_ip populated on the ASGI scope; the raw X-Forwarded-For header is NOT trusted.
  • Denials raise PerOperationRateLimitError (error_code=5001, error_category=rate_limit, retryable=True). Because PerOperationRateLimitError subclasses ApiError, responses are shaped by handle_api_error(), which emits a Retry-After header whenever the exception supplies retry_after; the header value agrees with the envelope's retry_after field. Missing wiring is treated as a deployment error and fails closed with a 429 rather than silently skipping.
  • Throttled endpoints (initial set):
  • POST /api/v1/auth/ws-ticket (20/60s by user)
  • PUT /api/v1/artifacts/{id}/content (10/60s by user)
  • POST /api/v1/admin/backups/restore (3/3600s by user)
  • POST /api/v1/setup/complete (5/3600s by user_or_ip)
  • POST /api/v1/training/{agent}/execute (20/3600s by user)
  • POST /api/v1/simulations (30/3600s by user)
  • PUT/DELETE /api/v1/settings/{namespace}/{key} (60/60s by user)
  • POST /api/v1/a2a/* (120/60s by user_or_ip)
  • GET /api/v1/oauth/callback (30/60s by ip)
  • POST /api/v1/conflicts/escalations/{id}/decision (30/60s by user)

Domain Error Handler Registration

A single handle_domain_error handler (registered in api/exception_handlers.EXCEPTION_HANDLERS) maps eight domain error base classes to RFC 9457 responses: EngineError, BudgetExhaustedError, MixedCurrencyAggregationError, ProviderError, OntologyError, CommunicationError, IntegrationError, ToolError. Each base carries status_code / error_code / error_category / retryable / default_message as ClassVar metadata so MRO dispatch picks up subclass overrides automatically. 5xx responses return the class-level default_message; 4xx pass the controller-authored message through (user-safe). A single is_retryable=True on a subclass is sufficient to surface retryable: true in the envelope even when the base class opts out -- the handler prefers the instance flag when set.

Error Response Format (RFC 9457)

All error responses follow RFC 9457 (Problem Details for HTTP APIs). The API supports two response formats via content negotiation:

  • Default (application/json): ApiResponse envelope with error_detail object
  • RFC 9457 bare (application/problem+json): Flat ProblemDetail body with Content-Type: application/problem+json

Clients request bare RFC 9457 responses by sending Accept: application/problem+json.

ErrorDetail Fields (Envelope Format)

The error_detail object in the envelope contains:

Field Type Description
detail str Human-readable occurrence-specific explanation
error_code int Machine-readable 4-digit code (category-grouped: 1xxx=auth, 2xxx=validation, 3xxx=not_found, 4xxx=conflict, 5xxx=rate_limit, 6xxx=budget_exhausted, 7xxx=provider_error, 8xxx=internal)
error_category str High-level category: auth, validation, not_found, conflict, rate_limit, budget_exhausted, provider_error, internal
retryable bool Whether the client should retry the request
retry_after int \| null Seconds to wait before retrying (null when not applicable)
instance str Request correlation ID for log tracing
title str Static per-category title (e.g., "Authentication Error")
type str Documentation URI for the error category (e.g., https://synthorg.io/docs/errors#auth)

ProblemDetail Fields (RFC 9457 Bare Format)

When Accept: application/problem+json, the response body contains:

Field Type Description
type str Documentation URI for the error category
title str Static per-category title
status int HTTP status code
detail str Human-readable occurrence-specific explanation
instance str Request correlation ID for log tracing
error_code int Machine-readable 4-digit error code
error_category str High-level error category
retryable bool Whether the client should retry
retry_after int \| null Seconds to wait before retrying

Agent consumers can use retryable and retry_after for autonomous retry logic, error_code / error_category for programmatic error handling without parsing message strings, and type URIs for documentation lookup.

See the Error Reference for the full error taxonomy, code list, and retry guidance.

Web UI Features

Status

The Web UI is built as a React 19 + shadcn/ui + Tailwind CSS dashboard. The API remains fully self-sufficient for all operations -- the dashboard is a thin client.

For the full-page list, navigation hierarchy, URL routing map, and WebSocket channel subscriptions, see Page Structure & IA.

Primary navigation (sidebar, always visible):

  • Dashboard (/): Org overview -- department health indicators, recent activity widget, budget snapshot, active task summary, agent status counts, approval badge count
  • Org Chart (/org): Living org visualization with hierarchy and communication graph views, real-time agent status, drag-drop agent reassignment. Merged with former Company page -- "Edit Organization" mode (/org/edit) provides form-based company config CRUD with sub-tabs (General, Agents, Departments)
  • Task Board (/tasks): Kanban (default) and list view toggle. Task detail includes "Coordinate" action for multi-agent coordination
  • Budget (/budget): P&L management dashboard -- current spend vs budget, per-agent/department breakdowns, trend lines, forecast projections (/budget/forecast)
  • Approvals (/approvals): Pending decisions queue with risk-level badges, approve/reject with comment, history view

Secondary navigation (sidebar, collapsible "Workspace" section):

  • Agents (/agents): Agent profile cards/table. Click navigates to Agent Detail page (/agents/{agentName})
  • Messages (/messages): Channel-filtered agent-to-agent communication feed
  • Meetings (/meetings): Meeting history, transcripts, outcomes. Trigger meeting action
  • Providers (/providers): LLM provider CRUD, connection test, preset-based creation, model auto-discovery, model pull dialog with SSE streaming progress, model deletion, per-model launch parameter configuration drawer
  • Settings (/settings): Configuration for UI-visible namespaces (api, memory, budget, security, coordination, observability, backup). Namespace tab bar navigation with single-column layout, basic/advanced mode, GUI/Code edit toggle. Observability sinks sub-page (/settings/observability/sinks) for log sink management.

Settings details:

  • DB-backed persistence: 17 namespaces total. User-facing: api, company, providers, memory, budget, security, coordination, observability, backup. Bridged / operator-only: engine, communication, a2a, integrations, meta, notifications, tools, settings. Setting types: STRING, INTEGER, FLOAT, BOOLEAN, ENUM, JSON. 4-layer resolution: DB > env > YAML > code defaults. Fernet encryption for sensitive values.
  • ConfigResolver: Typed scalar accessors assemble full Pydantic config models from individually resolved settings (parallel via asyncio.TaskGroup). Structural data accessors (get_agents, get_departments, get_provider_configs) resolve JSON-typed settings with Pydantic schema validation and graceful fallback.
  • Hot-reload: SettingsChangeDispatcher polls the #settings bus channel and routes change notifications to registered SettingsSubscriber implementations. Settings marked restart_required=True are filtered.

Human Roles

Role Access Description
Board Member Read-only + approve/reject Strategic oversight; can view all resources and decide on pending approvals, but cannot create or modify resources
CEO Full authority, user management Human IS the CEO, agents are the team. Sole authority to create, modify, and delete user accounts
Manager Department-level authority Manages one team/department directly
Observer Read-only Watch the company operate, no intervention
Pair Programmer Direct collaboration with one agent Work alongside a specific agent in real-time
System Write (backup/wipe only) Internal CLI-to-backend identity. Cannot log in, be deleted, or be modified. Scoped to backup/restore/wipe endpoints only. Bootstrapped at startup.

See Also