Persistence¶
SynthOrg abstracts durable storage behind a set of repository protocols so the engine, agent runtime, budget tracker, security auditor, and HR subsystems all depend on interfaces rather than concrete backends. Two backends ship in the reference implementation: SQLite for single-user development and small self-hosted setups, and Postgres for multi-user production deployments with concurrent writers. Both implement the same Python protocol surface; switching backends is a configuration change, not a code change.
Persistence boundary (mandatory)¶
src/synthorg/persistence/ is the only place in the repository that may
import aiosqlite, sqlite3, psycopg, or psycopg_pool, or emit raw SQL
DDL/DML keywords in string literals. The boundary exists so every durable
feature goes through a repository protocol, so swapping a backend is a config
change, and so encrypted secret backends and test fixtures are never skipped
by a caller that reached past the abstraction.
- Sanctioned exceptions cover three categories: (1) the two agent-facing DB introspection tools (`src/synthorg/tools/database/schema_inspect.py`, `src/synthorg/tools/database/sql_query.py`); (2) security/scanning utilities that inspect user-supplied SQL, such as `src/synthorg/security/rules/destructive_op_detector.py`, whose detection payload is DDL keyword strings; and (3) test fixtures and conformance harnesses that hold driver primitives for cross-subsystem setup. The authoritative list lives in `_ALLOWLIST` inside `scripts/check_persistence_boundary.py` -- consult it before assuming a path is (or is not) covered. Any new exception must be added there with a justifying comment.
- Every durable feature must define a repository protocol in `persistence/<domain>_protocol.py`, ship concrete implementations under `persistence/{sqlite,postgres}/`, and expose them on `PersistenceBackend`.
- Adding a migration: read `docs/guides/persistence-migrations.md` first. Never hand-edit SQL in `persistence/{sqlite,postgres}/revisions/`. Never edit `atlas.sum`. Never run `atlas migrate hash` post-release (a PreToolUse hook blocks it).
- Per-line opt-out: `# lint-allow: persistence-boundary -- <required justification>` as a trailing comment. The justification after `--` must be non-empty.
- Enforced by `scripts/check_persistence_boundary.py`, wired into the pre-push hook and the CI Lint job -- both fail loudly on violations.
The same rule is restated verbatim in CLAUDE.md
(## Persistence Boundary) so Claude and human developers see the same contract
no matter which surface they hit first.
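The shape of that enforcement can be sketched in a few lines of Python. This is an illustrative toy, not the real `scripts/check_persistence_boundary.py` (which also consults `_ALLOWLIST` and scans for raw SQL keywords); it only shows the banned-import rule and the opt-out comment:

```python
import re

# Illustrative sketch of the boundary check: flag banned driver imports
# outside persistence/ unless a lint-allow comment with a non-empty
# justification is present on the same line.
BANNED = re.compile(r"^\s*(?:import|from)\s+(aiosqlite|sqlite3|psycopg(?:_pool)?)\b")
OPT_OUT = re.compile(r"#\s*lint-allow:\s*persistence-boundary\s*--\s*\S")

def violations(path: str, text: str) -> list[int]:
    """Return 1-based line numbers of boundary violations in `text`."""
    if path.startswith("src/synthorg/persistence/"):
        return []  # the boundary package itself may import drivers
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if BANNED.search(line) and not OPT_OUT.search(line):
            bad.append(lineno)
    return bad
```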
Backend catalog¶
| Backend | Primary use case | Concurrency model | Migration tool |
|---|---|---|---|
| SQLite | Single-user dev, small self-hosted, demos | Single-writer with WAL journaling | Atlas |
| Postgres | Multi-user production, high-concurrency write paths | Row-level MVCC, server-side triggers | Atlas |
Repositories live under src/synthorg/persistence/sqlite/ and
src/synthorg/persistence/postgres/, behind domain-scoped protocols declared
one level up in src/synthorg/persistence/:
| Protocol module | Concerns |
|---|---|
| `protocol.py` (`PersistenceBackend`) | Backend aggregate: connect / disconnect, migrate, and accessors for every repository below |
| `approval_protocol.py` | `ApprovalRepository` -- human-in-the-loop decision queue |
| `auth_protocol.py` | `SessionRepository`, `RefreshTokenRepository`, `LockoutRepository` |
| `escalation_protocol.py` | Conflict-resolution escalation queue |
| `fine_tune_protocol.py` | `FineTuneRunRepository`, `FineTuneCheckpointRepository` |
| `mcp_protocol.py` | MCP catalog installation repository |
| `memory_protocol.py` | Org-memory fact repository with MVCC log + snapshot |
| `ontology_protocol.py` | Ontology entity + drift-report repositories |
| `version_repo.py` | Generic version-snapshot repository reused by ontology + future versioned entities |
| `secret_backends/protocol.py` | `SecretBackend` protocol used by the secret-backend factory |
The table above is representative, not exhaustive -- the authoritative
inventory is the set of properties / factory methods on
PersistenceBackend in src/synthorg/persistence/protocol.py, and the
concrete repositories live alongside the dialect-specific backends in
src/synthorg/persistence/{sqlite,postgres}/.
Every concrete repository implements its matching protocol; application code
depends on the protocol, not the implementation. Switching backends is a
configuration change (PersistenceConfig.backend) rather than a code change,
and the unit-test suite stays backend-free so most tests remain fast and local.
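As an illustration of that pattern (the repository name and method signatures here are simplified stand-ins, not the actual SynthOrg interfaces), application code types against a `typing.Protocol` and a factory maps configuration to a concrete implementation:

```python
from typing import Protocol

class ApprovalRepository(Protocol):
    """Simplified stand-in for a domain protocol such as the one in
    approval_protocol.py; the methods are illustrative."""
    async def enqueue(self, request_id: str, payload: dict) -> None: ...
    async def resolve(self, request_id: str, decision: str) -> None: ...

class InMemoryApprovals:
    """Toy backend: any class with the same structural shape satisfies
    the protocol -- no inheritance required."""
    def __init__(self) -> None:
        self._rows: dict[str, dict] = {}

    async def enqueue(self, request_id: str, payload: dict) -> None:
        self._rows[request_id] = {"payload": payload, "state": "pending"}

    async def resolve(self, request_id: str, decision: str) -> None:
        self._rows[request_id]["state"] = decision

def make_repo(backend: str) -> ApprovalRepository:
    # Stands in for PersistenceConfig.backend driving the choice;
    # callers only ever see the protocol type.
    if backend == "memory":
        return InMemoryApprovals()
    raise ValueError(f"unknown backend: {backend}")
```

Because the protocol is structural, backend-free unit tests can substitute an in-memory implementation like the one above without touching a database.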
API controllers reach persistence through domain-scoped service layers
(ArtifactService, WorkflowService, MemoryService,
CustomRulesService, UserService, ...) rather than importing
repositories directly. Services keep controllers thin, centralise audit
logging, and own cross-repo orchestration (e.g. workflow-definition delete
cascading to its version snapshots) so the audit trail stays consistent
regardless of which HTTP endpoint invoked the mutation.
Postgres adds server-side integrity beyond what SQLite can express: CONSTRAINT
TRIGGERs enforce "exactly one CEO" and "at least one owner" invariants across
concurrent writers, and optional capability protocols surface Postgres-native
features (JSONB analytics, TimescaleDB hypertables) that SQLite callers
simply do not see.
Backend capability methods¶
When an application service needs a subsystem whose construction differs
between backends -- e.g. ontology versioning that wraps an
aiosqlite.Connection for SQLite but an AsyncConnectionPool for
Postgres -- the pattern is a build_*() method on PersistenceBackend
instead of an isinstance(persistence, ConcreteBackend) branch at the
call site. Current examples include build_lockouts(auth_config),
build_escalations(notify_channel), and build_ontology_versioning().
Callers always type against the protocol; the factory hides the dialect
choice. This matches the "no isinstance against concrete persistence
backends outside persistence/ or its factory" rule enforced by the
code audit (ARC-1, #1491).
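A minimal sketch of the capability-method pattern (the class bodies are illustrative; only the `build_ontology_versioning` name comes from the text above):

```python
from typing import Protocol

class OntologyVersioning(Protocol):
    def snapshot(self, entity_id: str) -> str: ...

class _SqliteVersioning:
    # In the real backend this would wrap an aiosqlite.Connection;
    # here it is a stub so the shape of the pattern is visible.
    def snapshot(self, entity_id: str) -> str:
        return f"sqlite-snapshot:{entity_id}"

class Backend(Protocol):
    def build_ontology_versioning(self) -> OntologyVersioning: ...

class SqliteBackend:
    def build_ontology_versioning(self) -> OntologyVersioning:
        # The factory hides the dialect choice from the caller.
        return _SqliteVersioning()

def take_snapshot(backend: Backend, entity_id: str) -> str:
    # Typed against the protocol: no isinstance() on concrete backends.
    return backend.build_ontology_versioning().snapshot(entity_id)
```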
Schema patterns¶
The schema (src/synthorg/persistence/postgres/schema.sql and its SQLite
sibling) mixes two write patterns:
Mutable tables -- canonical state with in-place updates. Examples:
users, settings, agent_states, heartbeats, approvals,
custom_rules. Rows are updated on every state transition; row count stays
bounded. Concurrent updates are serialised by MVCC + application-level CAS
(settings use updated_at as an etag; see SettingsRepository.set and
set_many). Both approvals (human-in-the-loop decision queue,
pending/approved/rejected/expired state machine) and custom_rules
(operator-defined alert thresholds) exist on both backends -- previously
they shipped only on SQLite and the Postgres parity gap was closed in the
budget-persistence audit.
Append-only time-series tables -- facts with a timestamp column, never
updated in place. Examples: cost_records, audit_entries,
lifecycle_events, messages, task_metrics, collaboration_metrics,
login_attempts. These tables grow linearly with system activity and are the
primary candidates for time-based partitioning. cost_records and
audit_entries both have the partitioning column (timestamp) composed into
the primary key so they can be converted to TimescaleDB hypertables without
touching any application code.
Monetary columns carry a sibling currency TEXT NOT NULL column.
cost_records, task_metrics, and agent_states each store an ISO 4217 code
alongside the numeric cost (or accumulated_cost) so the unit travels with the
value. A DB-level CHECK constraint (currency ~ '^[A-Z]{3}$' on Postgres,
currency GLOB '[A-Z][A-Z][A-Z]' on SQLite) keeps direct-SQL writes honest.
Aggregators read both columns and enforce a same-currency invariant at
read time; mixing currencies in a single rollup raises
MixedCurrencyAggregationError. New currency columns default to 'USD'
(the provider-native token-pricing unit for every major LLM vendor the
project integrates with) so existing deployments migrate without a manual
backfill. Operators running a non-USD deployment can re-stamp historical
rows after the migration runs by executing a targeted
UPDATE <table> SET currency = '<ISO 4217 code>' WHERE currency = 'USD'
statement; SynthOrg does not provide a post-migrate hook for this because
the correct target currency depends on the operator's business model.
Currency is a display preference, not a conversion unit. SynthOrg
stamps each record with the operator's configured budget.currency but
does not perform FX conversion on the numeric values. Every major LLM
provider (Anthropic, OpenAI, Google Gemini, Mistral, Cohere, Groq, et al.)
publishes token pricing in USD, and LiteLLM returns response_cost in USD
too. Changing budget.currency relabels the display symbol for future
rows; it does not translate the numbers. Cross-currency reporting (with a
real FX rate source and timestamped conversion) is a separate feature
tracked on the roadmap.
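The read-time invariant can be sketched as follows (the function is illustrative; only the exception name and the `(cost, currency)` column pairing come from the text above):

```python
class MixedCurrencyAggregationError(Exception):
    """Raised when a single rollup spans more than one ISO 4217 code."""

def sum_costs(records: list[tuple[float, str]]) -> tuple[float, str]:
    # Each record carries (cost, currency); the unit travels with the value.
    currencies = {currency for _, currency in records}
    if len(currencies) > 1:
        raise MixedCurrencyAggregationError(sorted(currencies))
    total = sum(cost for cost, _ in records)
    return total, currencies.pop() if currencies else "USD"
```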
heartbeats is deliberately excluded from the append-only set. Despite
having a timestamp, it stores one row per execution_id and updates that row
on every pulse -- the write pattern is update-heavy and the row count is
bounded by the number of live executions. Hypertables optimise for immutable
append-only data, so converting heartbeats would be the wrong choice.
Time-series tables and TimescaleDB hypertables¶
For append-only tables on Postgres deployments, SynthOrg supports converting
cost_records and audit_entries into TimescaleDB hypertables. Hypertables
transparently partition the data into time-bucketed chunks so queries that
filter on timestamp scan a bounded subset of chunks rather than the whole
table, and operations like DROP TABLE or chunk eviction become O(chunk
count) rather than O(row count).
The feature is off by default and gated behind
PostgresConfig.enable_timescaledb. Operators running vanilla Postgres or a
managed service without TimescaleDB leave it off and the tables stay regular
relational tables with a composite primary key. Note: the composite-primary-key
schema change and its Atlas migration run unconditionally (they are valid on
vanilla Postgres); only the create_hypertable step is gated behind the flag.
```python
PostgresConfig(
    ...,
    enable_timescaledb=True,
    cost_records_chunk_interval="1 day",
    audit_entries_chunk_interval="1 day",
)
```
When enabled, the backend's migrate method runs two phases: first Atlas
applies the declarative schema migrations, then a dedicated step calls
create_hypertable on each target table. The conversion is idempotent
(if_not_exists => TRUE) so reruns and restores are safe. If the
timescaledb extension is not installed on the server, the flag is treated as
a best-effort hint and the backend logs a warning rather than failing the
migration -- this lets operators leave the flag true in shared config and have
it degrade gracefully on clusters that do not support it.
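The second phase can be sketched like this (the function name, the `execute` callable, and the `available_extensions` argument are illustrative; the SQL mirrors the idempotent `create_hypertable` call described above):

```python
# Target tables and chunk intervals; mirrors the config fields above.
TIMESCALE_TARGETS = {
    "cost_records": "1 day",
    "audit_entries": "1 day",
}

async def enable_hypertables(execute, available_extensions: set[str]) -> list[str]:
    """Post-Atlas phase: convert target tables to hypertables.
    `execute` stands in for a psycopg cursor's execute method."""
    if "timescaledb" not in available_extensions:
        # Best-effort hint: log-and-skip instead of failing the migration.
        return []
    converted = []
    for table, interval in TIMESCALE_TARGETS.items():
        # if_not_exists => TRUE makes reruns and restores idempotent.
        await execute(
            "SELECT create_hypertable(%s::regclass, 'timestamp', "
            "chunk_time_interval => %s::interval, if_not_exists => TRUE)",
            (table, interval),
        )
        converted.append(table)
    return converted
```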
Scope: Apache-2.0 features only. The Postgres backend uses exclusively
TimescaleDB features that ship under Apache-2.0: core hypertables,
create_hypertable, chunk management, and drop_chunks. Retention policies,
compression, and continuous aggregates are under the Timescale License and are
not used -- they would force every deployment to accept the Timescale
License terms and would not run on the OSS image (-oss tag). Self-hosted
operators who want retention today can call drop_chunks('cost_records',
older_than => INTERVAL '90 days') on their own cron schedule until SynthOrg
grows a backend-owned retention policy that stays within Apache-2.0.
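An interim retention helper built on that Apache-2.0 `drop_chunks` call might look like this (the function and its `execute` callable are illustrative; operators could equally run the quoted SQL from a cron job):

```python
async def drop_old_chunks(execute, table: str, keep: str = "90 days") -> None:
    """Illustrative sketch: drop hypertable chunks older than `keep`.
    `execute` stands in for running SQL on the backend's connection."""
    await execute(
        "SELECT drop_chunks(%s::regclass, older_than => %s::interval)",
        (table, keep),
    )
```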
| Table | Included | Rationale |
|---|---|---|
| `cost_records` | Yes | LLM call costs; append-only; highest-volume time-series table. |
| `audit_entries` | Yes | Security events; append-only; compliance queries are time-bound. |
| `heartbeats` | No | Update-heavy (per-execution row bump); hypertable semantics wrong. |
Managed-service compatibility¶
TimescaleDB is primarily a self-hosted feature. The major managed Postgres offerings
(AWS RDS, Google Cloud SQL) do not allow custom extensions, so operators
cannot install timescaledb there; Azure Database for PostgreSQL
Flexible Server is the exception and supports TimescaleDB as a built-in extension. SynthOrg runs
cleanly on all of them -- leave enable_timescaledb=False and the schema stays
fully relational. The composite primary keys on cost_records and
audit_entries are valid on vanilla Postgres and do not require TimescaleDB to
function; they just preserve the option of turning hypertables on later if the
deployment moves to self-hosted.
| Deployment target | TimescaleDB support | Recommended setting |
|---|---|---|
| Self-hosted Postgres 18+ | Operator-installed | enable_timescaledb=True |
| AWS RDS / Aurora Postgres | Not available | enable_timescaledb=False |
| Google Cloud SQL Postgres | Not available | enable_timescaledb=False |
| Azure Database for Postgres (Flexible Server) | Supported (extension) | enable_timescaledb=True |
| Docker / local dev | timescale/timescaledb:latest-pg18-oss | enable_timescaledb=True |
Extension strategy¶
Postgres extensions need two things to work through Atlas's declarative
pipeline: the extension DDL has to be acceptable to Atlas's dev database during
atlas migrate diff, and any catalog objects the extension creates after
migrations run must not be flagged as drift on subsequent diffs. SynthOrg
handles this with a two-step pattern:
- Declarative schema (`schema.sql` + Atlas migrations) only contains DDL that is valid on vanilla Postgres. Function-call SQL like `SELECT create_hypertable(...)` cannot live here because Atlas's declarative diff engine does not parse function calls.
- Runtime setup hooks in the backend's `migrate` method run post-Atlas SQL against the real target database. These hooks detect extension availability via `pg_available_extensions` and skip gracefully when the extension is not installed, so the same config works on vanilla Postgres and on self-hosted TimescaleDB without branching at deployment time.
This pattern scales to other extensions (pgvector, pg_trgm, pgcrypto) if
SynthOrg adopts them later. The rule is: if the extension adds objects that
Atlas cannot express or recognize, add a runtime setup hook; if the extension
is purely about CREATE EXTENSION and then standard DDL, let Atlas own it.
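The availability probe in the runtime-hook step amounts to a single catalog query. This sketch assumes a psycopg-style async call; the `fetchall` callable is a hypothetical stand-in for running a query on the backend's connection:

```python
async def available_extensions(fetchall) -> set[str]:
    """Probe pg_available_extensions on the target database and return
    the set of extension names the server can install."""
    rows = await fetchall("SELECT name FROM pg_available_extensions")
    return {name for (name,) in rows}
```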
Migration workflow¶
Migrations are generated by Atlas from the single source
of truth in src/synthorg/persistence/<backend>/schema.sql. The full
developer-facing workflow, happy path, and AI-agent rules live in
docs/guides/persistence-migrations.md;
read it before adding or regenerating a migration. The short form:
```shell
atlas migrate diff --env sqlite <name>     # SQLite
atlas migrate diff --env postgres <name>   # Postgres (requires Docker dev DB)
```
Never hand-edit generated migration files, and never run atlas migrate hash
post-release (a PreToolUse hook blocks it in the default environment). If a
migration needs to change before it has landed, delete the file and regenerate
it via atlas migrate diff; this preserves atlas.sum integrity end-to-end.
Hand-written migrations (procedural SQL that Atlas cannot derive from
schema.sql) are NOT added to the revisions/ directory because they would
invalidate atlas.sum. Instead, procedural setup runs through the backend's
runtime migration hooks (see the TimescaleDB pattern above).
Migration squashing¶
As the migration count grows, the revisions directory becomes harder to review and Atlas's diff engine slows perceptibly. SynthOrg uses a periodic partial squash to keep the history manageable while preserving upgrade paths.
When squashing triggers¶
The squash script (scripts/squash_migrations.sh) checks both SQLite and
Postgres backends. When a backend exceeds 100 migration files (configurable
via SQUASH_THRESHOLD), the oldest files beyond the newest 50
(SQUASH_KEEP) are replaced by a single Atlas checkpoint.
How it works¶
- The script copies the oldest files to a temporary directory and runs
atlas migrate checkpointto produce a DDL-only snapshot of the schema at that point. - The checkpoint file is timestamped between the last squashed migration and the first kept migration so Atlas orders it correctly.
- The original revisions directory is rebuilt with the checkpoint plus the
remaining individual files, and
atlas migrate hashregeneratesatlas.sum.
Upgrade paths after squash¶
| Database state | Behavior |
|---|---|
| Fresh install | Applies checkpoint (full schema to squash point) then remaining files |
| At or past squash point | Skips checkpoint, applies only unapplied files |
| Before squash point | Error -- the individual files it needs are gone; upgrade through the unsquashed release first |
The "before squash point" failure mode is acceptable because the threshold (100) and keep count (50) leave every production database at least 50 migration versions of headroom to catch up past the squash boundary before a squash removes the files it would need.
Dual-backend squashing¶
Both SQLite and Postgres backends are processed independently in a single invocation. Each backend may have a different migration count; only backends that exceed the threshold are squashed.
Committing a squash¶
Squash commits delete old migration files and rewrite atlas.sum, which the
pre-commit hook check_no_modify_migration.sh would normally block. Set the
SYNTHORG_MIGRATION_SQUASH environment variable to bypass: