Observability & Performance Tracking¶
Observability in SynthOrg spans three concerns: performance tracking (quality scoring weights, LLM judge, trend detection), structured logging (11 default sinks with per-domain routing, correlation IDs, sensitive-field redaction), and metrics export (Prometheus /metrics + OTLP). All three are configured through the same settings subsystem and refresh without restart where safe.
Performance Tracking Configuration¶
The performance namespace in the company YAML configures the performance tracking
subsystem, including quality scoring weights, LLM judge settings, and trend detection
thresholds. These values flow through RootConfig.performance into
_build_performance_tracker at app startup.
performance:
min_data_points: 5 # Minimum data points for meaningful aggregation
windows:
- "7d"
- "30d"
- "90d"
improving_threshold: 0.05 # Slope threshold for improving trend
declining_threshold: -0.05 # Slope threshold for declining trend
quality_judge_model: null # Model ID for LLM quality judge (null = disabled)
quality_judge_provider: null # Provider name (null = auto from first available)
quality_ci_weight: 0.4 # Weight for CI signal in composite score
quality_llm_weight: 0.6 # Weight for LLM judge in composite score
llm_sampling_rate: 0.01 # Fraction of events sampled by LLM calibration
llm_sampling_model: null # Model for calibration sampling (null = disabled)
collaboration_weights: null # Custom weights for collaboration scoring (null = defaults)
calibration_retention_days: 90 # Days to retain calibration records
| Field | Type | Default | Description |
|---|---|---|---|
quality_judge_model |
string or null |
null |
Model ID for quality LLM judge. null disables the judge. |
quality_judge_provider |
string or null |
null |
Provider name for the judge. Requires quality_judge_model. |
quality_ci_weight |
float |
0.4 |
Weight for CI signal (0.0--1.0). Must sum to 1.0 with quality_llm_weight. |
quality_llm_weight |
float |
0.6 |
Weight for LLM judge (0.0--1.0). Must sum to 1.0 with quality_ci_weight. |
min_data_points |
int |
5 |
Minimum data points for meaningful metric aggregation. |
windows |
list[string] |
["7d", "30d", "90d"] |
Time window labels for rolling metrics (at least one required). |
improving_threshold |
float |
0.05 |
Slope above which a metric trend is classified as "improving". |
declining_threshold |
float |
-0.05 |
Slope below which a metric trend is classified as "declining". |
collaboration_weights |
object or null |
null |
Custom weights for collaboration scoring components. null uses defaults. |
llm_sampling_rate |
float |
0.01 |
Fraction of task events sampled for LLM calibration. |
llm_sampling_model |
string or null |
null |
Model ID for LLM calibration sampling. null disables sampling. |
calibration_retention_days |
int |
90 |
Days to retain calibration records before expiry. |
Validation Rules
quality_ci_weight + quality_llm_weightmust equal1.0(tolerance: 1e-6)improving_thresholdmust be strictly greater thandeclining_thresholdquality_judge_providerrequiresquality_judge_modelto be set
Structured Logging¶
Structured logging pipeline built on structlog + stdlib, with automatic sensitive field redaction, async-safe correlation tracking, and per-domain log routing.
Sink Layout¶
Eleven default sinks, activated at startup via bootstrap_logging():
| Sink | Type | Level | Format | Routes | Description |
|---|---|---|---|---|---|
| Console | stderr | INFO | Colored text | All loggers | Human-readable development output |
synthorg.log |
File | INFO | JSON | All loggers | Main application log (catch-all) |
audit.log |
File | INFO | JSON | synthorg.security.*, synthorg.hr.*, synthorg.observability.* |
Audit-relevant events (security, HR, observability) |
errors.log |
File | ERROR | JSON | All loggers | Errors and above only |
agent_activity.log |
File | DEBUG | JSON | synthorg.engine.*, synthorg.core.*, synthorg.communication.*, synthorg.tools.*, synthorg.memory.* |
Agent execution, communication, tools, and memory |
cost_usage.log |
File | INFO | JSON | synthorg.budget.*, synthorg.providers.* |
Cost records and provider calls |
debug.log |
File | DEBUG | JSON | All loggers | Full debug trace (catch-all) |
access.log |
File | INFO | JSON | synthorg.api.* |
HTTP request/response access log |
persistence.log |
File | INFO | JSON | synthorg.persistence.* |
Database operations, migrations, CRUD |
configuration.log |
File | INFO | JSON | synthorg.settings.*, synthorg.config.* |
Settings resolution, config loading |
backup.log |
File | INFO | JSON | synthorg.backup.* |
Backup/restore lifecycle |
In addition to the 11 default sinks, three shipping sink types are available for centralized log aggregation and telemetry export:
| Sink Type | Transport | Format | Description |
|---|---|---|---|
| Syslog | UDP or TCP to a configurable endpoint | JSON | Ship structured logs to rsyslog, syslog-ng, or Graylog |
| HTTP | Batched POST to a configurable URL | JSON array | Ship log batches to any JSON-accepting endpoint |
| OTLP | HTTP POST to an OpenTelemetry collector | OTLP JSON | Map structlog events to OTLP log records with correlation IDs as trace context |
The HTTP sink sends raw JSON arrays. Backends that expect different payload formats
(e.g., Grafana Loki's /loki/api/v1/push, Elasticsearch's /_bulk) require a
collector/proxy (Promtail, Logstash, Vector, etc.) to translate the payload.
Shipping sinks are catch-all (no logger name routing) and are configured at runtime via the
custom_sinks setting or YAML. See the Centralized Logging
guide for configuration examples and deployment patterns.
Logger name routing is implemented via _LoggerNameFilter on file handlers. Sinks without
explicit routing are catch-all (accept all loggers at their configured level).
Exception formatting differs between sink types: format_exc_info is applied only to sinks
with json_format=True (converting exc_info tuples to formatted traceback strings for
serialization). Sinks with json_format=False (the default console sink) omit this
processor because ConsoleRenderer handles exception rendering natively.
Log Directory¶
- Docker:
/data/logs/(under thesynthorg-datavolume, persisted across restarts) - Local dev:
logs/relative to working directory (default) - Override:
SYNTHORG_LOG_DIRenv var
Rotation and Compression¶
File sinks use RotatingFileHandler by default (10 MB max, 5 backup files). Alternative:
WatchedFileHandler for external logrotate (rotation.strategy: external in config).
Rotated backup files can be automatically gzip-compressed by setting compress_rotated: true
in the rotation config. Compressed backups are stored as .log.N.gz instead of .log.N,
typically achieving 5--10x size reduction for structured JSON logs. Compression is off by
default for backward compatibility. compress_rotated is only supported with the built-in
rotation strategy; it is rejected when rotation.strategy is set to external.
Sensitive Field Redaction¶
The sanitize_sensitive_fields processor automatically redacts values for keys matching:
password, secret, token, api_key, api_secret, authorization, credential,
private_key, bearer, session. Redaction applies at all nesting depths in structured
log events. Redacted values are replaced with "**REDACTED**".
Correlation Tracking¶
Three correlation IDs propagated via contextvars (async-safe):
request_id: Bound per HTTP request byRequestLoggingMiddleware. Links all log events during a single API call.task_id: Bound per task execution. Links agent activity to a specific task.agent_id: Bound per agent execution context.
All three are automatically injected into every log event by merge_contextvars in the
structlog processor chain.
Per-Logger Levels¶
Default levels per domain module (overridable via LogConfig.logger_levels):
| Logger | Default Level |
|---|---|
synthorg.engine |
DEBUG |
synthorg.memory |
DEBUG |
synthorg.core |
INFO |
synthorg.communication |
INFO |
synthorg.providers |
INFO |
synthorg.budget |
INFO |
synthorg.security |
INFO |
synthorg.tools |
INFO |
synthorg.api |
INFO |
synthorg.cli |
INFO |
synthorg.config |
INFO |
synthorg.templates |
INFO |
Event Taxonomy¶
64 domain-specific event constant modules under observability/events/ (one per subsystem:
api, budget, risk_budget, reporting, blueprint, workflow_version, tool, git, engine, communication, security, etc.). Every log call uses a typed constant
(e.g., API_REQUEST_STARTED, BUDGET_RECORD_ADDED) for consistent, grep-friendly event
names. Format: "<domain>.<noun>.<verb>" (e.g., "api.request.started").
Uvicorn Integration¶
Uvicorn's default access logger is disabled (access_log=False, log_config=None).
HTTP access logging is handled by RequestLoggingMiddleware, which provides richer structured
fields (method, path, status_code, duration_ms, request_id) through structlog. Uvicorn's own
handlers are cleared by _tame_third_party_loggers() and its loggers (uvicorn,
uvicorn.error, uvicorn.access) are set to WARNING with propagate = True -- startup
INFO messages (e.g., "Uvicorn running on ...") are intentionally suppressed since the
application's own lifecycle logging provides equivalent structured events via structlog.
Warning and error messages still propagate through the structlog pipeline.
Litestar Integration¶
Litestar's built-in logging configuration is disabled (logging_config=None in the
Litestar() constructor). Without this, Litestar reconfigures stdlib's root handler on
startup via dictConfig(), which triggers _clearExistingHandlers and destroys the structlog
file sink handlers attached by _bootstrap_app_logging(). The bootstrap call in create_app
runs before the Litestar constructor and sets up all 11 sinks; logging_config=None ensures
they survive.
Third-Party Logger Taming¶
LiteLLM and its HTTP stack (httpx, httpcore) attach their own StreamHandler instances at
import time, producing duplicate output in Docker logs -- once via the library's own handler,
and once again via root propagation through the structlog sinks.
_tame_third_party_loggers() (called as step 7 of configure_logging, before per-logger level
overrides so explicit user settings take precedence) resolves this by:
- Suppressing LiteLLM's raw
print()output vialitellm.set_verbose = Falseandlitellm.suppress_debug_info = True(applied only whenlitellmis already imported -- avoids triggering LiteLLM's expensive import side-effects) - Clearing all handlers from
LiteLLM,LiteLLM Router,LiteLLM Proxy,aiosqlite,httpcore,httpcore.http11,httpcore.connection,httpx,uvicorn,uvicorn.error,uvicorn.access,anyio,multipart,faker, andfaker.factoryloggers - Setting each to
WARNINGandpropagate = Trueso warnings and errors still flow through the structlog pipeline
The provider and persistence layers already log meaningful events at appropriate levels via their own structlog calls; the third-party loggers would otherwise add noisy DEBUG output that duplicates or contradicts those structured events.
Docker Logging¶
Two layers of log management:
- App-level (structlog): 11 sinks (10 file + 1 console). File sinks use
RotatingFileHandler(10 MB x 5) writing JSON to/data/logs/. Console sink writes colored text to stderr. - Container-level (Docker):
json-filedriver with 10 MB x 3 rotation on stdout/stderr. Captures console sink output and any uncaught stderr.
The layers are complementary -- app files provide structured, routed logs; Docker captures
the console stream for docker logs access.
Runtime Settings¶
Four observability settings are runtime-editable via SettingsService:
root_log_level(enum: debug/info/warning/error/critical) -- changes the root logger levelenable_correlation(boolean) -- toggles correlation ID injectionsink_overrides(JSON) -- per-sink overrides keyed by sink identifier (__console__for the console sink, file path for file sinks). Each value is an object with optional fields:enabled(bool),level(string),json_format(bool),rotation(object withmax_bytes,backup_count,strategy,compress_rotated(built-in-only)). The console sink cannot be disabled (enabled: falseis rejected).custom_sinks(JSON) -- additional sinks as a JSON array. Each entry may specifysink_type(file,syslog,http; defaults tofile). File sinks requirefile_pathand acceptlevel,json_format,rotation,routing_prefixes. Syslog sinks requiresyslog_hostand acceptsyslog_port,syslog_facility,syslog_protocol,level. HTTP sinks requirehttp_urland accepthttp_headers,http_batch_size,http_flush_interval_seconds,http_timeout_seconds,http_max_retries,level.
Console sink level can also be overridden via SYNTHORG_LOG_LEVEL env var.
Changes take effect without restart -- the ObservabilitySettingsSubscriber rebuilds the entire
logging pipeline via configure_logging() (idempotent) when any of the four observability
settings change (root_log_level, enable_correlation, sink_overrides, or custom_sinks).
Custom sink file paths cannot collide with default sink paths (reserved even if disabled).
See Also¶
- Notifications -- notification dispatcher and sinks
- Design Overview -- full index