Memory and Persistence¶
Designed behaviour; runtime in active development
This page is the source of truth for the designed behaviour of this subsystem. The memory components exist as tested code, but the memory pipeline runs only inside a live agent, which is in active development (see the Roadmap). Persistence storage (SQLite/Postgres) is shipped and available now.
The SynthOrg framework separates two distinct storage concerns:
- Agent memory: what agents know, remember, and learn (working, episodic, semantic, procedural, social)
- Operational data: tasks, cost records, messages, and audit logs generated during execution
Both are implemented behind pluggable protocol interfaces, making storage backends swappable via configuration without modifying application code.
This page covers agent memory: types, levels, the backend protocol, embedder selection, and the consolidation / retention pipeline.
Related design docs¶
- Shared Organizational Memory: company-wide knowledge (policies, ADRs, procedures) behind OrgMemoryBackend.
- Operational Data Persistence: PersistenceBackend protocol, per-entity repositories, SQLite + Postgres backends, schema strategy, multi-tenancy, database-enforced invariants.
- Memory Learning and Injection: procedural memory auto-generation (failure + success capture), cross-agent skill pool, injection strategies (context / tool-based / self-editing), MemoryService REST + MCP entry point.
Memory Architecture¶
| Working Memory | Episodic Memory | Semantic Memory | Procedural Memory |
|---|---|---|---|
| Current task context | Past events & decisions | Knowledge & facts learned | Skills & how-to |
Storage Backend: Mem0 (durable, Qdrant+SQLite), InMemory (session-scoped), Composite (namespace-based routing adapter). See Decision Log.
Each agent maintains its own memory store. The storage backend is selected via configuration
and all access flows through the MemoryBackend protocol.
Memory Types¶
| Type | Scope | Persistence | Example |
|---|---|---|---|
| Working | Current task | None (in-context) | "I'm implementing the auth endpoint" |
| Episodic | Past events | Configurable | "Last sprint the team chose JWT over sessions" |
| Semantic | Knowledge | Long-term | "This project uses Litestar with aiosqlite" |
| Procedural | Skills/patterns | Long-term | "Code reviews require 2 approvals here" |
| Social | Relationships | Long-term | "The QA lead prefers detailed test plans" |
Memory Levels¶
Memory persistence is configurable per agent, from no persistence to fully persistent storage.
Memory Level Configuration
memory:
  level: "persistent"              # none | session | project | persistent (default: session)
  backend: "mem0"                  # mem0 (default); also supports composite, inmemory
  storage:
    data_dir: "/data/memory"       # mounted Docker volume path
    vector_store: "qdrant"         # hardcoded to embedded qdrant in Mem0 backend
    history_store: "sqlite"        # hardcoded to sqlite in Mem0 backend
  options:
    retention_days: null           # null = forever
    max_memories_per_agent: 10000
    consolidation_interval: "daily"  # compress old memories
    shared_knowledge_base: true    # agents can access shared facts
Memory Backend Protocol¶
Agent memory is implemented behind a pluggable MemoryBackend protocol with three concrete
implementations: Mem0 (durable, Qdrant+SQLite), InMemory (session-scoped), and Composite
(namespace-based routing adapter); see Decision Log. Application
code depends only on the protocol; the storage engine is an implementation detail swappable via
config.
Enums¶
| Enum | Values | Purpose |
|---|---|---|
| MemoryCategory | WORKING, EPISODIC, SEMANTIC, PROCEDURAL, SOCIAL | Memory type categories |
| MemoryLevel | PERSISTENT, PROJECT, SESSION, NONE | Persistence level per agent |
| ConsolidationInterval | HOURLY, DAILY, WEEKLY, NEVER | How often old memories are compressed |
MemoryBackend Protocol¶
@runtime_checkable
class MemoryBackend(Protocol):
    """Lifecycle + CRUD for agent memory storage."""

    async def connect(self) -> None: ...
    async def disconnect(self) -> None: ...
    async def health_check(self) -> bool: ...

    @property
    def is_connected(self) -> bool: ...

    @property
    def backend_name(self) -> NotBlankStr: ...

    async def store(self, agent_id: NotBlankStr, request: MemoryStoreRequest) -> NotBlankStr:
        """Raises: MemoryConnectionError, MemoryStoreError."""
        ...

    async def retrieve(self, agent_id: NotBlankStr, query: MemoryQuery) -> tuple[MemoryEntry, ...]:
        """Raises: MemoryConnectionError, MemoryRetrievalError."""
        ...

    async def get(self, agent_id: NotBlankStr, memory_id: NotBlankStr) -> MemoryEntry | None:
        """Raises: MemoryConnectionError, MemoryRetrievalError."""
        ...

    async def delete(self, agent_id: NotBlankStr, memory_id: NotBlankStr) -> bool:
        """Raises: MemoryConnectionError, MemoryStoreError."""
        ...

    async def count(self, agent_id: NotBlankStr, *, category: MemoryCategory | None = None) -> int:
        """Raises: MemoryConnectionError, MemoryRetrievalError."""
        ...
MemoryCapabilities Protocol¶
Backends that implement MemoryCapabilities expose what features they support, enabling
runtime capability checks before attempting operations.
@runtime_checkable
class MemoryCapabilities(Protocol):
    """Capability discovery for memory backends."""

    @property
    def supported_categories(self) -> frozenset[MemoryCategory]: ...

    @property
    def supports_graph(self) -> bool: ...

    @property
    def supports_temporal(self) -> bool: ...

    @property
    def supports_vector_search(self) -> bool: ...

    @property
    def supports_shared_access(self) -> bool: ...

    @property
    def max_memories_per_agent(self) -> int | None: ...
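A runtime capability check might look roughly like the following. This is a sketch under assumptions: `StaticCapabilities` and `require` are hypothetical helpers, and `MemoryCapabilityError` is redefined locally as a stand-in so the example runs on its own.

```python
from dataclasses import dataclass


class MemoryCapabilityError(Exception):
    """Local stand-in for the framework error of the same name."""


@dataclass(frozen=True)
class StaticCapabilities:
    """Hypothetical capability holder mirroring two protocol properties."""
    supports_vector_search: bool
    supports_shared_access: bool


def require(capabilities: object, flag: str) -> None:
    """Raise before attempting an operation the backend does not advertise."""
    if not getattr(capabilities, flag, False):
        raise MemoryCapabilityError(f"backend does not support: {flag}")


def can_use(capabilities: object, flag: str) -> bool:
    """Boolean variant for callers that prefer branching over exceptions."""
    try:
        require(capabilities, flag)
    except MemoryCapabilityError:
        return False
    return True
```

Checking up front keeps the failure at the call site that asked for the feature, not deep inside a backend call.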
SharedKnowledgeStore Protocol¶
Backends that support cross-agent shared knowledge implement this protocol alongside
MemoryBackend. Not all backends require cross-agent queries; this keeps the base protocol
clean.
@runtime_checkable
class SharedKnowledgeStore(Protocol):
    """Cross-agent shared knowledge operations."""

    async def publish(self, agent_id: NotBlankStr, request: MemoryStoreRequest) -> NotBlankStr:
        """Raises: MemoryConnectionError, MemoryStoreError."""
        ...

    async def search_shared(self, query: MemoryQuery, *, exclude_agent: NotBlankStr | None = None) -> tuple[MemoryEntry, ...]:
        """Raises: MemoryConnectionError, MemoryRetrievalError."""
        ...

    async def retract(self, agent_id: NotBlankStr, memory_id: NotBlankStr) -> bool:
        """Raises: MemoryConnectionError, MemoryStoreError."""
        ...
See Multi-Agent Memory Consistency for the consistency model used
when multiple agents share a SharedKnowledgeStore, including MVCC snapshot reads,
append-only write semantics, and conflict handling.
Error Hierarchy¶
All memory errors inherit from MemoryError so callers can catch the entire family with a
single except clause.
| Error | When Raised |
|---|---|
| MemoryError | Base exception for all memory operations |
| MemoryConnectionError | Backend connection cannot be established or is lost |
| MemoryStoreError | A store or delete operation fails |
| MemoryRetrievalError | A retrieve, search, or count operation fails |
| MemoryNotFoundError | A specific memory ID is not found |
| MemoryConfigError | Memory configuration is invalid |
| MemoryCapabilityError | An unsupported operation is attempted for a backend |
| FineTuneDependencyError | ML dependencies (torch, sentence-transformers) are missing |
| FineTuneCancelledError | A fine-tuning pipeline run is cancelled |
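One practical consequence of naming the base class `MemoryError` is that it shadows Python's built-in out-of-memory exception in any module that imports it. The sketch below shows how the two can be told apart. It assumes the base derives from `Exception` and that subclasses inherit from it directly, neither of which this page states; the class bodies are stand-ins.

```python
import builtins


class MemoryError(Exception):  # deliberately shadows builtins.MemoryError here
    """Stand-in base for the memory error family."""


class MemoryConnectionError(MemoryError):
    """Stand-in: backend connection cannot be established or is lost."""


class MemoryRetrievalError(MemoryError):
    """Stand-in: a retrieve, search, or count operation fails."""


def classify(exc: BaseException) -> str:
    """Separate memory-subsystem failures from interpreter-level ones."""
    if isinstance(exc, MemoryError):
        return "memory-subsystem"
    if isinstance(exc, builtins.MemoryError):
        return "out-of-memory"
    return "other"
```

Code that must let the interpreter-level error propagate can reference `builtins.MemoryError` explicitly, as the distillation section of this page does.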
Configuration¶
memory:
  backend: "mem0"
  level: "persistent"              # none, session, project, persistent (default: session)
  storage:
    data_dir: "/data/memory"
    vector_store: "qdrant"         # hardcoded to embedded qdrant in Mem0 backend
    history_store: "sqlite"        # hardcoded to sqlite in Mem0 backend
  options:
    retention_days: null           # null = forever
    max_memories_per_agent: 10000
    consolidation_interval: "daily"
    shared_knowledge_base: true
# Embedder config is passed programmatically via the factory:
#   create_memory_backend(config, embedder=Mem0EmbedderConfig(
#       provider="<embedding-provider>",
#       model="<embedding-model-id>",
#       dims=1536,
#   ))
Configuration is modeled by CompanyMemoryConfig (top-level), MemoryStorageConfig
(storage paths/backends), and MemoryOptionsConfig (behaviour tuning). All are frozen
Pydantic models. The create_memory_backend(config, *, embedder=...) factory returns an
isolated MemoryBackend instance per company. The embedder kwarg is required for the
Mem0 backend (must be a Mem0EmbedderConfig).
Embedding Model Selection¶
Embedding model quality directly determines memory retrieval accuracy. The
LMEB benchmark (Zhao et al., March 2026) evaluates embedding
models on long-horizon memory retrieval across four types that map directly to SynthOrg's
MemoryCategory enum:
| SynthOrg Category | LMEB Category | Evaluation Priority |
|---|---|---|
| EPISODIC | Episodic (69 tasks) | High |
| PROCEDURAL | Procedural (67 tasks) | High |
| SEMANTIC | Semantic (15 tasks) | Medium |
| SOCIAL | Dialogue (42 tasks) | Medium |
| WORKING | N/A (in-context) | N/A |
MTEB scores do not predict memory retrieval quality (Pearson: -0.115, Spearman: -0.130). Embedding model selection must be evaluated on LMEB, not MTEB. See Decision Log and the Embedding Evaluation reference page for the full analysis, model rankings, and deployment tier recommendations.
Key findings:
- Larger models do not always outperform smaller ones on memory retrieval
- Dialogue/social memory is the hardest retrieval category for all models
- Instruction sensitivity varies per model; must be validated per deployment
- Three deployment tiers are recommended: full-resource (7-12B), mid-resource (1-4B), and CPU-only (< 1B)
Tier inference inputs (auto_select_embedder):
- provider_preset_name: first registered provider name, read from the provider registry at setup-completion time. When operators use preset names verbatim as provider names (the wizard default), the preset hint steers tier selection; otherwise tier inference falls back to heuristic defaults.
- api.setup.has_gpu (yaml path; setting key setup_has_gpu under namespace API): operator-owned boolean, default "false", advanced level. Flipped by the setup wizard (or directly by an operator) and read via _read_has_gpu_setting(settings_service). Accepts true/1/yes (-> True) and false/0/no/empty (-> False), case-insensitive; any other value returns None (unknown) silently, while a settings-service read failure logs a WARNING and also returns None. There is no platform probe today; the signal is operator-declared, not auto-detected.
Tier fallback is not a single CPU/GPU switch. auto_select_embedder uses preset-name and capability heuristics to pick a GPU_CONSUMER or GPU_FULL tier when has_gpu is True or None; the tier only collapses to the CPU-only default when the operator has selected a local/self-hosted preset and has explicitly set has_gpu=False. Missing or unparseable inputs degrade gracefully and never block setup completion.
Domain-Specific Embedding Fine-Tuning¶
Domain-specific fine-tuning can improve retrieval quality by 10-27% over base models (NVIDIA evaluation). The pipeline requires no manual annotation and runs on a single GPU.
Pipeline stages:
1. Synthetic data generation: LLM generates query-document pairs from org documents (policies, ADRs, procedures, coding standards)
2. Hard negative mining: base model embeds all passages (max_length=512) and queries (max_length=128) with truncation enabled; top-k semantically similar but non-matching passages become hard negatives. Inputs that overflow the token cap surface a memory.fine_tune.encode_truncation_likely WARNING so silent quality loss is visible
3. Contrastive fine-tuning: biencoder training with InfoNCE loss (tau=0.02, 3 epochs, lr=1e-5). Single GPU, 1-2 hours for ~500 documents
4. Evaluation: NDCG@10 and Recall@10 comparison of the fine-tuned checkpoint against the base model on held-out validation data, re-using the Stage 2 query / passage token caps so eval embeddings are tokenisation-consistent with mining
5. Deploy: save checkpoint; update Mem0EmbedderConfig to point to fine-tuned model
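To make the contrastive objective concrete, here is a minimal pure-Python sketch of the InfoNCE loss for a single query, using the tau=0.02 temperature quoted above. It assumes similarities are already computed (for example as cosine similarities); the function name is hypothetical and this is not the pipeline's training code.

```python
import math


def info_nce(sim_pos: float, sim_negs: list[float], tau: float = 0.02) -> float:
    """InfoNCE for one query: -log softmax of the positive over all candidates."""
    logits = [sim_pos / tau] + [s / tau for s in sim_negs]
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[0]
```

With such a small temperature, a negative whose similarity is close to the positive's dominates the loss, which is why mined hard negatives matter more than random ones.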
Integration design: fine-tuning is an offline pipeline triggered via
POST /admin/memory/fine-tune (see MemoryAdminController). The optional
EmbeddingFineTuneConfig (disabled by default) stores the checkpoint path. When
enabled=True and checkpoint_path is set, backend initialization uses the
checkpoint path as the model identifier passed to the Mem0 SDK. The embedding
provider must serve the fine-tuned model under this identifier.
Container execution: when FineTuneExecutionConfig.backend is "docker", each
pipeline stage runs inside an ephemeral synthorg-fine-tune-gpu (default) or
synthorg-fine-tune-cpu container spawned by the backend via the Docker API. Both
variants ship the same Python runner and accept the same stage-config contract; they
differ only in the bundled torch build (CUDA ~4 GB vs CPU ~1.7 GB) and whether GPU
passthrough is usable. The variant is selected at synthorg init time (fresh installs)
or via synthorg config set fine_tuning_variant gpu|cpu (post-init, preserves data)
and persisted as fine_tuning_variant in config.json. The backend consumes
SYNTHORG_FINE_TUNE_IMAGE verbatim as a full image reference (including registry,
repository, and either a :tag or a digest-pinned @sha256:...); in a CLI-managed
install the rendered compose.yml writes the verified digest-pinned ref into this
env var automatically. Operators running a hand-managed compose.yml without the
CLI set SYNTHORG_FINE_TUNE_IMAGE on the backend directly; tag-based refs work
for quick evaluation, but production deployments should pin a digest so the backend
spawns the exact attested image. See Deployment → Fine-Tuning (optional)
for the BYO snippet. The container reads stage configuration
from /etc/fine-tune/config.json, executes the pipeline function, and emits
structured progress markers (STAGE_START:, STAGE_COMPLETE:) on stdout. The
orchestrator will parse these markers from container logs for progress reporting
(orchestrator integration is planned; the runner and markers are implemented).
Source data is mounted at /data (read-only), checkpoints written to /checkpoints
(read-write). GPU passthrough is available via gpu_enabled=True (only meaningful
for the GPU variant). The in-process fallback (backend="in-process") is preserved
for non-Docker deployments where torch is installed directly.
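Since orchestrator-side parsing is still planned, the following is only a sketch of how the stdout markers could be folded into per-stage status. The payload format is an assumption: it treats whatever follows the marker on the same line as the stage name, which this page does not specify.

```python
def parse_progress(log_lines: list[str]) -> dict[str, str]:
    """Return stage -> 'running' | 'complete'; non-marker output is ignored."""
    status: dict[str, str] = {}
    for line in log_lines:
        text = line.strip()
        if text.startswith("STAGE_START:"):
            # Assumption: the stage name follows the marker on the same line.
            status[text.removeprefix("STAGE_START:").strip()] = "running"
        elif text.startswith("STAGE_COMPLETE:"):
            status[text.removeprefix("STAGE_COMPLETE:").strip()] = "complete"
    return status
```

A stage that has a start marker but no completion marker stays in the running state, which is what a progress endpoint would report for a container that is still working or has died mid-stage.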
class EmbeddingFineTuneConfig(BaseModel):
    model_config = ConfigDict(frozen=True, allow_inf_nan=False, extra="forbid")

    enabled: bool = False
    checkpoint_path: NotBlankStr | None = None
    base_model: NotBlankStr | None = None
    training_data_dir: NotBlankStr | None = None
When enabled=True, both checkpoint_path and base_model are required
(enforced by model validation). Path traversal (..) and Windows-style
paths are rejected to prevent container path escapes.
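The FineTuningPipeline protocol formalizes the pipeline stages. The listing covers the first four; the deploy step is the Mem0EmbedderConfig update described above: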
class FineTuningPipeline(Protocol):
    async def generate_training_data(self, source_dir: str) -> Path: ...
    async def mine_hard_negatives(self, training_data: Path) -> Path: ...
    async def fine_tune(self, training_data: Path, base_model: str) -> Path: ...
    async def evaluate(self, checkpoint: Path, base_model: str, validation_data: Path) -> EvalMetrics: ...
See Embedding Evaluation for the full pipeline design and expected improvement metrics.
Consolidation and Retention¶
Memory consolidation, retention enforcement, and archival are configured via frozen Pydantic
models in memory/consolidation/config.py:
| Config | Purpose |
|---|---|
| ConsolidationConfig | Top-level: max_memories_per_agent limit, nested retention and archival sub-configs |
| RetentionConfig | Company-level per-category RetentionRule tuples (category + retention_days), optional default_retention_days fallback; agents can override via MemoryConfig.retention_overrides |
| ArchivalConfig | Enables/disables archival of consolidated entries to ArchivalStore, nested DualModeConfig |
| DualModeConfig | Density-aware dual-mode archival: threshold, summarization model, anchor/fact limits |
| LLMConsolidationConfig | Tuning knobs for the LLM synthesis op: group threshold, temperature, max summary tokens, distillation context toggle, prompt caps (max_entry_input_chars, max_total_user_content_chars) |
Consolidation Strategies (axis split, ADR-0005)¶
Consolidation is split along two orthogonal axes (memory/consolidation/axis.py):
- EntrySelector -- which entries are consolidated. All shipped strategies share one selector, HighestRelevanceSelector: group by category, drop groups below group_threshold, keep the highest-relevance entry (recency tiebreak). Density classification is not selection -- it routes the op in dual-mode.
- ConsolidationOp -- how the to-remove set becomes a stored summary. The op owns the backend and performs store + delete with that strategy's exact failure semantics (the three strategies' delete handling is mutually incompatible; see ADR-0005).
CompositeConsolidationStrategy(selector, op, *, parallel=False)
satisfies the existing ConsolidationStrategy protocol, so
MemoryConsolidationService is unchanged at the call site.
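The selection axis can be sketched in a few lines. This is an illustration of the described rule (group by category, drop small groups, keep the highest-relevance entry), not the shipped HighestRelevanceSelector: `Item` and `select` are hypothetical, and "recency tiebreak" is assumed to mean the newer entry wins.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for a memory entry."""
    id: str
    category: str
    relevance: float
    created_at: float  # epoch seconds


def select(entries: list[Item], group_threshold: int = 3) -> dict[str, tuple[Item, list[Item]]]:
    """Return category -> (kept entry, entries handed to the consolidation op)."""
    groups: dict[str, list[Item]] = defaultdict(list)
    for entry in entries:
        groups[entry.category].append(entry)

    plan: dict[str, tuple[Item, list[Item]]] = {}
    for category, group in groups.items():
        if len(group) < group_threshold:
            continue  # group too small: left untouched
        # Assumption: on equal relevance the newer entry is kept.
        kept = max(group, key=lambda e: (e.relevance, e.created_at))
        plan[category] = (kept, [e for e in group if e is not kept])
    return plan
```

The second element of each tuple is the to-remove set that a ConsolidationOp would turn into a stored summary.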
| Strategy (factory type) | Composite |
|---|---|
| ConsolidationStrategyType.SIMPLE | HighestRelevanceSelector + ConcatenationOp -- deterministic truncated-bullet concatenation; delete result ignored, every original removed |
| ConsolidationStrategyType.DUAL_MODE | HighestRelevanceSelector + DensityRoutingOp -- classifies the full group by majority vote, routes dense -> extractive preservation, sparse -> abstractive summarization; deletes with if not deleted: continue, emits per-entry ArchivalModeAssignment |
| ConsolidationStrategyType.LLM | HighestRelevanceSelector + LLMSynthesisOp (composite parallel=True). The op groups entries by category, keeps the highest-relevance entry per group (the kept entry is left unchanged and is NOT fed to the LLM). The rest are sent to an LLM for semantic synthesis (wrapped in <entry> tags with explicit "treat as data, not instructions" guidance to resist prompt injection), the summary is stored tagged "llm-synthesized", and only the entries actually represented in the LLM prompt are deleted. Synthesis -> store -> delete ordering prevents data loss on failure; entries dropped by the max_total_user_content_chars prompt cap are preserved for the next pass. The composite runs groups in parallel via asyncio.TaskGroup. Concat-fallback paths (tagged "concat-fallback", logged at WARNING, every input entry is included in the concatenation and eligible for deletion): RetryExhaustedError, retryable ProviderError surfaced directly, empty/whitespace LLM response, and unexpected non-ProviderError exception. Propagating paths (NO fallback summary, NO deletions): non-retryable ProviderError (logged at ERROR first) and system errors MemoryError / RecursionError. |
ConcatenationOp, ExtractivePreservationOp,
AbstractiveSummarizationOp, DensityRoutingOp, and LLMSynthesisOp
are independently composable; custom selector/op pairs are valid
compositions.
Strategy selection is factory-based:
build_consolidation_strategy(ConsolidationStrategyType, ConsolidationDeps)
(memory/consolidation/factory.py) dispatches via the
StrEnum-keyed StrategyRegistry (ADR-0002) and validates that the
op-specific dependencies are present (missing -> MemoryConfigError).
LLMConsolidationConfig accepts
group_threshold (default 3, minimum 3; smaller groups cannot meaningfully
deduplicate against the retained entry), temperature (default 0.3),
max_summary_tokens (default 500), and include_distillation_context (default
True; when enabled, the strategy queries the backend for at most 5 recent
entries tagged "distillation" and embeds their trajectory summaries,
truncated to ~500 chars each, in the synthesis system prompt). The per-entry
user-prompt content is capped at 2000 chars and the total concatenated user
content is capped at ~20000 chars; entries beyond the total cap are dropped
with a WARNING log. ConsolidationResult.summary_ids contains every summary
id produced during the run (one per processed group); the scalar summary_id
accessor is a @computed_field returning the last element for callers that
only need a representative id.
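The two prompt caps interact as sketched below. This assumes entries are considered in order and that everything from the first overflowing entry onward is dropped; `apply_prompt_caps` is a hypothetical helper, and the exact cut-off behaviour of the real op is not specified here.

```python
def apply_prompt_caps(
    contents: list[str],
    per_entry_cap: int = 2000,
    total_cap: int = 20000,
) -> tuple[list[tuple[int, str]], list[int]]:
    """Return (included (index, clipped text) pairs, indices of dropped entries)."""
    included: list[tuple[int, str]] = []
    dropped: list[int] = []
    used = 0
    for index, text in enumerate(contents):
        clipped = text[:per_entry_cap]  # per-entry cap is applied first
        if used + len(clipped) > total_cap:
            # Assumption: once the total cap is hit, the remainder is dropped.
            dropped.extend(range(index, len(contents)))
            break
        included.append((index, clipped))
        used += len(clipped)
    return included, dropped
```

Only the included indices would be eligible for deletion after a successful synthesis; dropped entries stay in the backend for the next consolidation pass.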
Distillation Capture¶
At task completion, synthorg.memory.consolidation.capture_distillation records
the execution trajectory as an EPISODIC memory entry tagged "distillation".
DistillationRequest captures:
| Field | Source |
|---|---|
| agent_id, task_id | Caller context |
| trajectory_summary | Turn count, total tokens, unique tools, total tool calls |
| outcome | TerminationReason + optional error message |
| memory_tool_invocations | MemoryToolName enum values (SEARCH_MEMORY, RECALL_MEMORY) extracted from TurnRecord.tool_calls_made (NOT memory entry IDs; typed enum members, counted per invocation) |
| created_at | Capture timestamp |
AgentEngine wires this into _post_execution_pipeline when
distillation_capture_enabled=True is passed to the constructor (default False
for opt-in behavior). Capture fires regardless of termination reason;
successful runs, errors, timeouts, and budget exhaustions all produce useful
trajectory context for downstream consolidation. The helper is non-critical:
non-system failures log at WARNING and return None; system errors
(builtins.MemoryError, RecursionError) propagate.
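The "non-critical helper" pattern described above can be sketched as a small wrapper. It is synchronous for brevity and uses hypothetical names (`capture_non_critical`, the logger name); the real helper may well be async and differ in detail.

```python
import builtins
import logging
from typing import Any, Callable

logger = logging.getLogger("distillation-sketch")  # hypothetical logger name


def capture_non_critical(capture: Callable[..., Any], *args: Any) -> Any:
    """Run a capture step; swallow ordinary failures, re-raise system errors."""
    try:
        return capture(*args)
    except (builtins.MemoryError, RecursionError):
        raise  # system errors must propagate
    except Exception:
        logger.warning("distillation capture failed", exc_info=True)
        return None


def propagates(capture: Callable[..., Any]) -> bool:
    """Helper for demonstration: True if a system error escapes the wrapper."""
    try:
        capture_non_critical(capture)
    except (builtins.MemoryError, RecursionError):
        return True
    return False
```

Listing the system errors before the broad `except Exception` clause is what keeps them from being downgraded to a warning.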
Downstream, LLMConsolidationStrategy picks these entries up by tag query
when synthesizing category groups, embedding the trajectory summaries and
outcomes in the synthesis system prompt so the LLM has context about what the
agent was trying to accomplish when the memories it is merging were created.
Dual-Mode Archival¶
When ArchivalConfig.dual_mode.enabled is True, consolidation classifies content density before
choosing an archival mode. This prevents catastrophic information loss from naively summarizing
dense content (code, structured data, identifiers). Based on research: Memex
(arXiv:2603.04257) and KV Cache Attention Matching
(arXiv:2602.16284).
| Density | Archival Mode | Method |
|---|---|---|
| Sparse (conversational, narrative) | ABSTRACTIVE | LLM-generated summary via AbstractiveSummarizer |
| Dense (code, structured data, IDs) | EXTRACTIVE | Verbatim key-fact extraction + start/mid/end anchors via ExtractivePreserver |
Classification is heuristic-based (DensityClassifier), using five weighted signals: code
patterns, structured data markers, identifier density, numeric density, and line structure. No LLM
is needed for classification; only for abstractive summarization. Groups are classified by
majority vote: if most entries in a category group are dense, the group uses extractive mode.
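The group-level majority vote reduces to a count once each entry has a dense/sparse label. The sketch below takes those labels as given, since the five signal weights are not listed on this page, and it assumes a tie resolves to abstractive; `group_mode` is a hypothetical name.

```python
from enum import Enum


class ArchivalMode(str, Enum):
    ABSTRACTIVE = "abstractive"
    EXTRACTIVE = "extractive"


def group_mode(dense_flags: list[bool]) -> ArchivalMode:
    """Pick the archival mode for a category group by strict majority."""
    dense = sum(dense_flags)
    sparse = len(dense_flags) - dense
    # Assumption: ties fall back to abstractive summarization.
    return ArchivalMode.EXTRACTIVE if dense > sparse else ArchivalMode.ABSTRACTIVE
```

Voting at the group level means a single code snippet among conversational entries does not force the whole group into extractive mode.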
Deterministic restore: When entries are archived, the service builds an archival_index
(mapping original_id -> archival_id) on ConsolidationResult. Agents can use this index to
call ArchivalStore.restore(agent_id, entry_id) directly by ID, bypassing semantic search.
| Model | Purpose |
|---|---|
| ArchivalMode | Enum: ABSTRACTIVE or EXTRACTIVE |
| ArchivalModeAssignment | Maps a removed entry ID to its archival mode (set by strategy) |
| ArchivalIndexEntry | Maps original entry ID to archival store ID (built by service) |
Per-Agent Retention Overrides¶
Individual agents can override company-level retention rules via
MemoryConfig.retention_overrides (per-category) and
MemoryConfig.retention_days (agent-level default).
Resolution order per category:
1. Agent per-category rule
2. Company per-category rule
3. Agent global default
4. Company global default
5. Keep forever (no expiry)