Memory and Persistence¶

Designed behaviour; runtime in active development

This page is the source of truth for the designed behaviour of this subsystem. The memory components exist as tested code, but the memory pipeline runs only inside a live agent, which is in active development (see the Roadmap). Persistence storage (SQLite/Postgres) is shipped and available now.

The SynthOrg framework separates two distinct storage concerns:

Agent memory: what agents know, remember, and learn (working, episodic, semantic, procedural, social)
Operational data: tasks, cost records, messages, and audit logs generated during execution

Both are implemented behind pluggable protocol interfaces, making storage backends swappable via configuration without modifying application code.

This page covers agent memory: types, levels, the backend protocol, embedder selection, and the consolidation / retention pipeline. Related topics are covered on their own pages:

Shared Organizational Memory: company-wide knowledge (policies, ADRs, procedures) behind OrgMemoryBackend.
Operational Data Persistence: PersistenceBackend protocol, per-entity repositories, SQLite + Postgres backends, schema strategy, multi-tenancy, database-enforced invariants.
Memory Learning and Injection: procedural memory auto-generation (failure + success capture), cross-agent skill pool, injection strategies (context / tool-based / self-editing), MemoryService REST + MCP entry point.

Memory Architecture¶

Working Memory	Episodic Memory	Semantic Memory	Procedural Memory
Current task context	Past events & decisions	Knowledge & facts learned	Skills & how-to

Storage Backend: Mem0 (durable, Qdrant+SQLite), InMemory (session-scoped), Composite (namespace-based routing adapter). See Decision Log.

Each agent maintains its own memory store. The storage backend is selected via configuration and all access flows through the MemoryBackend protocol.

Memory Types¶

Type	Scope	Persistence	Example
Working	Current task	None (in-context)	"I'm implementing the auth endpoint"
Episodic	Past events	Configurable	"Last sprint the team chose JWT over sessions"
Semantic	Knowledge	Long-term	"This project uses Litestar with aiosqlite"
Procedural	Skills/patterns	Long-term	"Code reviews require 2 approvals here"
Social	Relationships	Long-term	"The QA lead prefers detailed test plans"

Memory Levels¶

Memory persistence is configurable per agent, from no persistence to fully persistent storage.

Memory Level Configuration

memory:
  level: "persistent"            # none | session | project | persistent (default: session)
  backend: "mem0"               # mem0 (default); also supports composite, inmemory
  storage:
    data_dir: "/data/memory"    # mounted Docker volume path
    vector_store: "qdrant"      # hardcoded to embedded qdrant in Mem0 backend
    history_store: "sqlite"     # hardcoded to sqlite in Mem0 backend
  options:
    retention_days: null         # null = forever
    max_memories_per_agent: 10000
    consolidation_interval: "daily"  # compress old memories
    shared_knowledge_base: true      # agents can access shared facts

Memory Backend Protocol¶

Agent memory is implemented behind a pluggable MemoryBackend protocol with three concrete implementations: Mem0 (durable, Qdrant+SQLite), InMemory (session-scoped), and Composite (namespace-based routing adapter); see Decision Log. Application code depends only on the protocol; the storage engine is an implementation detail swappable via config.

Enums¶

Enum	Values	Purpose
`MemoryCategory`	WORKING, EPISODIC, SEMANTIC, PROCEDURAL, SOCIAL	Memory type categories
`MemoryLevel`	PERSISTENT, PROJECT, SESSION, NONE	Persistence level per agent
`ConsolidationInterval`	HOURLY, DAILY, WEEKLY, NEVER	How often old memories are compressed

MemoryBackend Protocol¶

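from typing import Protocol, runtime_checkable

# NotBlankStr, MemoryStoreRequest, MemoryQuery, MemoryEntry, and MemoryCategory
# are SynthOrg types; their imports are omitted from this excerpt.

@runtime_checkable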
class MemoryBackend(Protocol):
    """Lifecycle + CRUD for agent memory storage."""

    async def connect(self) -> None: ...
    async def disconnect(self) -> None: ...
    async def health_check(self) -> bool: ...

    @property
    def is_connected(self) -> bool: ...
    @property
    def backend_name(self) -> NotBlankStr: ...

    async def store(self, agent_id: NotBlankStr, request: MemoryStoreRequest) -> NotBlankStr:
        """Raises: MemoryConnectionError, MemoryStoreError."""
        ...
    async def retrieve(self, agent_id: NotBlankStr, query: MemoryQuery) -> tuple[MemoryEntry, ...]:
        """Raises: MemoryConnectionError, MemoryRetrievalError."""
        ...
    async def get(self, agent_id: NotBlankStr, memory_id: NotBlankStr) -> MemoryEntry | None:
        """Raises: MemoryConnectionError, MemoryRetrievalError."""
        ...
    async def delete(self, agent_id: NotBlankStr, memory_id: NotBlankStr) -> bool:
        """Raises: MemoryConnectionError, MemoryStoreError."""
        ...
    async def count(self, agent_id: NotBlankStr, *, category: MemoryCategory | None = None) -> int:
        """Raises: MemoryConnectionError, MemoryRetrievalError."""
        ...

MemoryCapabilities Protocol¶

Backends that implement MemoryCapabilities expose what features they support, enabling runtime capability checks before attempting operations.

@runtime_checkable
class MemoryCapabilities(Protocol):
    """Capability discovery for memory backends."""

    @property
    def supported_categories(self) -> frozenset[MemoryCategory]: ...
    @property
    def supports_graph(self) -> bool: ...
    @property
    def supports_temporal(self) -> bool: ...
    @property
    def supports_vector_search(self) -> bool: ...
    @property
    def supports_shared_access(self) -> bool: ...
    @property
    def max_memories_per_agent(self) -> int | None: ...
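A caller-side capability check could be sketched as follows. `VectorSearchCapable` is a trimmed, hypothetical stand-in for `MemoryCapabilities` with a single property, and the three backend classes exist only for this illustration.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class VectorSearchCapable(Protocol):
    """Trimmed stand-in for MemoryCapabilities (one property only)."""

    @property
    def supports_vector_search(self) -> bool: ...


class VectorBackend:
    """Hypothetical backend that reports vector search support."""

    @property
    def supports_vector_search(self) -> bool:
        return True


class KeywordOnlyBackend:
    """Hypothetical backend that implements discovery but lacks vector search."""

    @property
    def supports_vector_search(self) -> bool:
        return False


class OpaqueBackend:
    """Hypothetical backend that does not implement capability discovery."""


def can_vector_search(backend: object) -> bool:
    # isinstance() against a runtime_checkable Protocol only verifies that the
    # attribute exists, so the property value still has to be read explicitly.
    return isinstance(backend, VectorSearchCapable) and backend.supports_vector_search
```

Treating a backend without capability discovery as "not supported" is a conservative choice made for this sketch; callers could equally decide to attempt the operation and handle `MemoryCapabilityError`.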

SharedKnowledgeStore Protocol¶

Backends that support cross-agent shared knowledge implement this protocol alongside MemoryBackend. Not every backend needs cross-agent queries, so keeping these operations in a separate protocol keeps the base protocol small.

@runtime_checkable
class SharedKnowledgeStore(Protocol):
    """Cross-agent shared knowledge operations."""

    async def publish(self, agent_id: NotBlankStr, request: MemoryStoreRequest) -> NotBlankStr:
        """Raises: MemoryConnectionError, MemoryStoreError."""
        ...
    async def search_shared(self, query: MemoryQuery, *, exclude_agent: NotBlankStr | None = None) -> tuple[MemoryEntry, ...]:
        """Raises: MemoryConnectionError, MemoryRetrievalError."""
        ...
    async def retract(self, agent_id: NotBlankStr, memory_id: NotBlankStr) -> bool:
        """Raises: MemoryConnectionError, MemoryStoreError."""
        ...

See Multi-Agent Memory Consistency for the consistency model used when multiple agents share a SharedKnowledgeStore, including MVCC snapshot reads, append-only write semantics, and conflict handling.

Error Hierarchy¶

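All memory errors inherit from MemoryError so callers can catch the entire family with a single except clause. This base class is the package's own exception, not Python's built-in MemoryError; where this page refers to the interpreter's out-of-memory error, it names it builtins.MemoryError or lists it as a system error next to RecursionError.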

Error	When Raised
`MemoryError`	Base exception for all memory operations
`MemoryConnectionError`	Backend connection cannot be established or is lost
`MemoryStoreError`	A store or delete operation fails
`MemoryRetrievalError`	A retrieve, search, or count operation fails
`MemoryNotFoundError`	A specific memory ID is not found
`MemoryConfigError`	Memory configuration is invalid
`MemoryCapabilityError`	An unsupported operation is attempted for a backend
`FineTuneDependencyError`	ML dependencies (torch, sentence-transformers) are missing
`FineTuneCancelledError`	A fine-tuning pipeline run is cancelled
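The single-except pattern, and how it differs from the interpreter's out-of-memory error, can be sketched as follows. The class bodies are placeholders, and deriving the base class from `Exception` is an assumption of this sketch rather than something the table states.

```python
import builtins


class MemoryError(Exception):  # shadows the builtin name inside this module
    """Base exception for all memory operations."""


class MemoryConnectionError(MemoryError):
    """Backend connection cannot be established or is lost."""


class MemoryStoreError(MemoryError):
    """A store or delete operation fails."""


class MemoryRetrievalError(MemoryError):
    """A retrieve, search, or count operation fails."""


def classify(exc: Exception) -> str:
    """Show which except clause each kind of error lands in."""
    try:
        raise exc
    except MemoryError:
        # Catches the whole memory-subsystem family in one clause.
        return "memory-subsystem"
    except builtins.MemoryError:
        # The interpreter's out-of-memory error is a different class.
        return "out-of-memory"
    except Exception:
        return "other"
```

Because the package-level name shadows the builtin, code that needs the interpreter error has to reference it through `builtins` explicitly, as the second clause does.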

Configuration¶

memory:
  backend: "mem0"
  level: "persistent"              # none, session, project, persistent (default: session)
  storage:
    data_dir: "/data/memory"
    vector_store: "qdrant"          # hardcoded to embedded qdrant in Mem0 backend
    history_store: "sqlite"         # hardcoded to sqlite in Mem0 backend
  options:
    retention_days: null            # null = forever
    max_memories_per_agent: 10000
    consolidation_interval: "daily"
    shared_knowledge_base: true

# Embedder config is passed programmatically via the factory:
#   create_memory_backend(config, embedder=Mem0EmbedderConfig(
#       provider="<embedding-provider>",
#       model="<embedding-model-id>",
#       dims=1536,
#   ))

Configuration is modeled by CompanyMemoryConfig (top-level), MemoryStorageConfig (storage paths/backends), and MemoryOptionsConfig (behaviour tuning). All are frozen Pydantic models. The create_memory_backend(config, *, embedder=...) factory returns an isolated MemoryBackend instance per company. The embedder kwarg is required for the Mem0 backend (must be a Mem0EmbedderConfig).

Embedding Model Selection¶

Embedding model quality directly determines memory retrieval accuracy. The LMEB benchmark (Zhao et al., March 2026) evaluates embedding models on long-horizon memory retrieval across four types that map directly to SynthOrg's MemoryCategory enum:

SynthOrg Category	LMEB Category	Evaluation Priority
EPISODIC	Episodic (69 tasks)	High
PROCEDURAL	Procedural (67 tasks)	High
SEMANTIC	Semantic (15 tasks)	Medium
SOCIAL	Dialogue (42 tasks)	Medium
WORKING	N/A (in-context)	N/A

MTEB scores do not predict memory retrieval quality: the reported correlations are slightly negative (Pearson: -0.115, Spearman: -0.130). Embedding models must therefore be selected on their LMEB results, not MTEB. See Decision Log and the Embedding Evaluation reference page for the full analysis, model rankings, and deployment tier recommendations.

Key findings:

Larger models do not always outperform smaller ones on memory retrieval
Dialogue/social memory is the hardest retrieval category for all models
Instruction sensitivity varies per model; must be validated per deployment
Three deployment tiers are recommended: full-resource (7-12B), mid-resource (1-4B), and CPU-only (< 1B)

Tier inference inputs (auto_select_embedder):

provider_preset_name: first registered provider name, read from the provider registry at setup-completion time. When operators use preset names verbatim as provider names (the wizard default), the preset hint steers tier selection; otherwise tier inference falls back to heuristic defaults.
api.setup.has_gpu (yaml path; setting key setup_has_gpu under namespace API): operator-owned boolean, default "false", advanced level. Flipped by the setup wizard (or directly by an operator) and read via _read_has_gpu_setting(settings_service). Accepts true/1/yes (-> True) and false/0/no/empty (-> False), case-insensitive; any other value returns None (unknown) silently, while a settings-service read failure logs a WARNING and also returns None. There is no platform probe today; the signal is operator-declared, not auto-detected.

Tier fallback is not a single CPU/GPU switch. auto_select_embedder uses preset-name and capability heuristics to pick a GPU_CONSUMER or GPU_FULL tier when has_gpu is True or None; the tier only collapses to the CPU-only default when the operator has selected a local/self-hosted preset and has explicitly set has_gpu=False. Missing or unparseable inputs degrade gracefully and never block setup completion.
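The tri-state parsing and the CPU-only collapse rule described above can be sketched as follows. `parse_has_gpu` and `collapses_to_cpu_only` are hypothetical names for illustration (the real reader is `_read_has_gpu_setting`, which also talks to the settings service), and stripping surrounding whitespace is an assumption.

```python
from typing import Optional

_TRUE_VALUES = frozenset({"true", "1", "yes"})
_FALSE_VALUES = frozenset({"false", "0", "no", ""})


def parse_has_gpu(raw: str) -> Optional[bool]:
    """Map the stored setting string to True, False, or None (unknown)."""
    value = raw.strip().lower()  # assumption: surrounding whitespace is ignored
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    return None  # unrecognised values are treated as unknown, silently


def collapses_to_cpu_only(is_local_preset: bool, has_gpu: Optional[bool]) -> bool:
    """CPU-only applies only for a local preset with an explicit has_gpu=False."""
    return is_local_preset and has_gpu is False
```

Keeping `None` distinct from `False` is what lets an unknown GPU state still reach the GPU tiers instead of being downgraded by a missing or unparseable setting.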

Domain-Specific Embedding Fine-Tuning¶

Domain-specific fine-tuning can improve retrieval quality by 10-27% over base models (NVIDIA evaluation). The pipeline requires no manual annotation and runs on a single GPU.

Pipeline stages:

Synthetic data generation: LLM generates query-document pairs from org documents (policies, ADRs, procedures, coding standards)
Hard negative mining: base model embeds all passages (max_length=512) and queries (max_length=128) with truncation enabled; top-k semantically similar but non-matching passages become hard negatives. Inputs that overflow the token cap surface a memory.fine_tune.encode_truncation_likely WARNING so silent quality loss is visible
Contrastive fine-tuning: biencoder training with InfoNCE loss (tau=0.02, 3 epochs, lr=1e-5). Single GPU, 1-2 hours for ~500 documents
Evaluation: NDCG@10 and Recall@10 comparison of the fine-tuned checkpoint against the base model on held-out validation data, re-using the Stage 2 query / passage token caps so eval embeddings are tokenisation-consistent with mining
Deploy: save checkpoint; update Mem0EmbedderConfig to point to fine-tuned model
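Stages 2 and 3 can be illustrated with a dependency-free sketch. It is not the pipeline's implementation: the function names are hypothetical, embeddings are plain Python lists, and the loss is computed for a single query rather than a training batch. Only the temperature (tau=0.02) comes from the stage description above.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mine_hard_negatives(
    query_vec: list[float],
    passages: dict[str, list[float]],
    positive_id: str,
    k: int = 2,
) -> list[str]:
    """Return ids of the k passages most similar to the query, excluding the positive."""
    scored = [
        (cosine(query_vec, vec), pid)
        for pid, vec in passages.items()
        if pid != positive_id
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [pid for _, pid in scored[:k]]


def info_nce_loss(pos_sim: float, neg_sims: list[float], tau: float = 0.02) -> float:
    """InfoNCE for one query: -log softmax of the positive over all candidates."""
    logits = [pos_sim / tau] + [s / tau for s in neg_sims]
    peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]
```

With a temperature as low as 0.02, small similarity gaps are amplified: a negative that scores almost as high as the positive contributes heavily to the loss, which is why mining hard negatives matters for this stage.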

Integration design: fine-tuning is an offline pipeline triggered via POST /admin/memory/fine-tune (see MemoryAdminController). The optional EmbeddingFineTuneConfig (disabled by default) stores the checkpoint path. When enabled=True and checkpoint_path is set, backend initialization uses the checkpoint path as the model identifier passed to the Mem0 SDK. The embedding provider must serve the fine-tuned model under this identifier.

Container execution: when FineTuneExecutionConfig.backend is "docker", each pipeline stage runs inside an ephemeral synthorg-fine-tune-gpu (default) or synthorg-fine-tune-cpu container spawned by the backend via the Docker API. Both variants ship the same Python runner and accept the same stage-config contract; they differ only in the bundled torch build (CUDA ~4 GB vs CPU ~1.7 GB) and whether GPU passthrough is usable. The variant is selected at synthorg init time (fresh installs) or via synthorg config set fine_tuning_variant gpu|cpu (post-init, preserves data) and persisted as fine_tuning_variant in config.json.

The backend consumes SYNTHORG_FINE_TUNE_IMAGE verbatim as a full image reference (including registry, repository, and either a :tag or a digest-pinned @sha256:...); in a CLI-managed install the rendered compose.yml writes the verified digest-pinned ref into this env var automatically. Operators running a hand-managed compose.yml without the CLI set SYNTHORG_FINE_TUNE_IMAGE on the backend directly; tag-based refs work for quick evaluation, but production deployments should pin a digest so the backend spawns the exact attested image. See Deployment → Fine-Tuning (optional) for the BYO snippet.

The container reads stage configuration from /etc/fine-tune/config.json, executes the pipeline function, and emits structured progress markers (STAGE_START:, STAGE_COMPLETE:) on stdout. The orchestrator will parse these markers from container logs for progress reporting (orchestrator integration is planned; the runner and markers are implemented). Source data is mounted at /data (read-only), and checkpoints are written to /checkpoints (read-write). GPU passthrough is available via gpu_enabled=True (only meaningful for the GPU variant). The in-process fallback (backend="in-process") is preserved for non-Docker deployments where torch is installed directly.

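from pydantic import BaseModel, ConfigDict

# NotBlankStr is a SynthOrg type; its import is omitted from this excerpt.

class EmbeddingFineTuneConfig(BaseModel):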
    model_config = ConfigDict(frozen=True, allow_inf_nan=False, extra="forbid")

    enabled: bool = False
    checkpoint_path: NotBlankStr | None = None
    base_model: NotBlankStr | None = None
    training_data_dir: NotBlankStr | None = None

When enabled=True, both checkpoint_path and base_model are required (enforced by model validation). Path traversal (..) and Windows-style paths are rejected to prevent container path escapes.
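The validation rules can be sketched with a frozen dataclass standing in for the Pydantic model. `FineTuneSettings`, `_check_path`, and `is_valid` are hypothetical names; interpreting "Windows-style" as a backslash or a drive-letter prefix, and applying the path check to both `checkpoint_path` and `training_data_dir`, are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


def _check_path(path: str) -> None:
    # Assumption: "Windows-style" means a backslash or a drive-letter prefix.
    if "\\" in path or (len(path) >= 2 and path[0].isalpha() and path[1] == ":"):
        raise ValueError(f"Windows-style path rejected: {path!r}")
    if ".." in path.split("/"):
        raise ValueError(f"path traversal rejected: {path!r}")


@dataclass(frozen=True)
class FineTuneSettings:
    """Dataclass stand-in for EmbeddingFineTuneConfig (not the real model)."""

    enabled: bool = False
    checkpoint_path: Optional[str] = None
    base_model: Optional[str] = None
    training_data_dir: Optional[str] = None

    def __post_init__(self) -> None:
        if self.enabled and (self.checkpoint_path is None or self.base_model is None):
            raise ValueError("enabled=True requires checkpoint_path and base_model")
        for path in (self.checkpoint_path, self.training_data_dir):
            if path is not None:
                _check_path(path)


def is_valid(**kwargs) -> bool:
    """Return True when the settings pass validation, False otherwise."""
    try:
        FineTuneSettings(**kwargs)
    except ValueError:
        return False
    return True
```

Rejecting suspicious paths at construction time means a bad value never reaches the code that mounts /checkpoints into the container.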

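The FineTuningPipeline protocol formalizes the pipeline stages. The excerpt below shows four methods, one for each of stages 1-4; the Deploy stage has no corresponding method in this excerpt: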

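from pathlib import Path
from typing import Protocol

# EvalMetrics is a SynthOrg type; its import is omitted from this excerpt.

class FineTuningPipeline(Protocol):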
    async def generate_training_data(self, source_dir: str) -> Path: ...
    async def mine_hard_negatives(self, training_data: Path) -> Path: ...
    async def fine_tune(self, training_data: Path, base_model: str) -> Path: ...
    async def evaluate(self, checkpoint: Path, base_model: str, validation_data: Path) -> EvalMetrics: ...

See Embedding Evaluation for the full pipeline design and expected improvement metrics.

Consolidation and Retention¶

Memory consolidation, retention enforcement, and archival are configured via frozen Pydantic models in memory/consolidation/config.py:

Config	Purpose
`ConsolidationConfig`	Top-level: `max_memories_per_agent` limit, nested `retention` and `archival` sub-configs
`RetentionConfig`	Company-level per-category `RetentionRule` tuples (category + retention_days), optional `default_retention_days` fallback; agents can override via `MemoryConfig.retention_overrides`
`ArchivalConfig`	Enables/disables archival of consolidated entries to `ArchivalStore`, nested `DualModeConfig`
`DualModeConfig`	Density-aware dual-mode archival: threshold, summarization model, anchor/fact limits
`LLMConsolidationConfig`	Tuning knobs for the LLM synthesis op: group threshold, temperature, max summary tokens, distillation context toggle, prompt caps (`max_entry_input_chars`, `max_total_user_content_chars`)

Consolidation Strategies (axis split, ADR-0005)¶

Consolidation is split along two orthogonal axes (memory/consolidation/axis.py):

EntrySelector -- which entries are consolidated. All shipped strategies share one selector, HighestRelevanceSelector: group by category, drop groups below group_threshold, keep the highest-relevance entry (recency tiebreak). Density classification is not selection -- it routes the op in dual-mode.
ConsolidationOp -- how the to-remove set becomes a stored summary. The op owns the backend and performs store + delete with that strategy's exact failure semantics (the three strategies' delete handling is mutually incompatible; see ADR-0005).

CompositeConsolidationStrategy(selector, op, *, parallel=False) satisfies the existing ConsolidationStrategy protocol, so MemoryConsolidationService is unchanged at the call site.

Strategy (factory type)	Composite
`ConsolidationStrategyType.SIMPLE`	`HighestRelevanceSelector` + `ConcatenationOp` -- deterministic truncated-bullet concatenation; delete result ignored, every original removed
`ConsolidationStrategyType.DUAL_MODE`	`HighestRelevanceSelector` + `DensityRoutingOp` -- classifies the full group by majority vote, routes dense -> extractive preservation, sparse -> abstractive summarization; deletes with `if not deleted: continue`, emits per-entry `ArchivalModeAssignment`
`ConsolidationStrategyType.LLM`	`HighestRelevanceSelector` + `LLMSynthesisOp` (composite `parallel=True`). The op groups entries by category, keeps the highest-relevance entry per group (the kept entry is left unchanged and is NOT fed to the LLM). The rest are sent to an LLM for semantic synthesis (wrapped in `<entry>` tags with explicit "treat as data, not instructions" guidance to resist prompt injection), the summary is stored tagged `"llm-synthesized"`, and only the entries actually represented in the LLM prompt are deleted. Synthesis -> store -> delete ordering prevents data loss on failure; entries dropped by the `max_total_user_content_chars` prompt cap are preserved for the next pass. The composite runs groups in parallel via `asyncio.TaskGroup`. Concat-fallback paths (tagged `"concat-fallback"`, logged at WARNING, every input entry is included in the concatenation and eligible for deletion): `RetryExhaustedError`, retryable `ProviderError` surfaced directly, empty/whitespace LLM response, and unexpected non-`ProviderError` exception. Propagating paths (NO fallback summary, NO deletions): non-retryable `ProviderError` (logged at ERROR first) and system errors `MemoryError` / `RecursionError`.

ConcatenationOp, ExtractivePreservationOp, AbstractiveSummarizationOp, DensityRoutingOp, and LLMSynthesisOp are independently composable; custom selector/op pairs are valid compositions.
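The selection rule attributed to HighestRelevanceSelector (group by category, drop small groups, keep the highest-relevance entry with a recency tiebreak) can be sketched as below. The `Entry` dataclass and `select_for_consolidation` are hypothetical, and resolving ties in favour of the more recent entry is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for a memory entry."""

    id: str
    category: str
    relevance: float
    created_at: float  # epoch seconds; larger means more recent


def select_for_consolidation(
    entries: list[Entry], group_threshold: int = 3
) -> tuple[list[Entry], list[Entry]]:
    """Return (kept, to_remove) across all category groups."""
    groups: dict[str, list[Entry]] = defaultdict(list)
    for entry in entries:
        groups[entry.category].append(entry)

    kept: list[Entry] = []
    to_remove: list[Entry] = []
    for category in sorted(groups):
        group = groups[category]
        if len(group) < group_threshold:
            continue  # group too small to consolidate; leave it untouched
        # Highest relevance wins; equal relevance falls back to recency.
        best = max(group, key=lambda e: (e.relevance, e.created_at))
        kept.append(best)
        to_remove.extend(e for e in group if e is not best)
    return kept, to_remove
```

The `to_remove` list is what an op such as ConcatenationOp or LLMSynthesisOp would then turn into a stored summary; the selector itself never touches the backend.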

Strategy selection is factory-based: build_consolidation_strategy(ConsolidationStrategyType, ConsolidationDeps) (memory/consolidation/factory.py) dispatches via the StrEnum-keyed StrategyRegistry (ADR-0002) and validates that the op-specific dependencies are present (missing -> MemoryConfigError).

LLMConsolidationConfig accepts group_threshold (default 3, minimum 3; smaller groups cannot meaningfully deduplicate against the retained entry), temperature (default 0.3), max_summary_tokens (default 500), and include_distillation_context (default True; when enabled, the strategy queries the backend for at most 5 recent entries tagged "distillation" and embeds their trajectory summaries, truncated to ~500 chars each, in the synthesis system prompt).

The per-entry user-prompt content is capped at 2000 chars and the total concatenated user content is capped at ~20000 chars; entries beyond the total cap are dropped with a WARNING log.

ConsolidationResult.summary_ids contains every summary id produced during the run (one per processed group); the scalar summary_id accessor is a @computed_field returning the last element for callers that only need a representative id.
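The two prompt caps can be sketched as follows. `apply_prompt_caps` is a hypothetical helper, and dropping every entry from the first overflow onward (rather than skipping only the one that does not fit) is an assumption about how "entries beyond the total cap" is interpreted.

```python
def apply_prompt_caps(
    contents: list[str],
    max_entry_input_chars: int = 2000,
    max_total_user_content_chars: int = 20000,
) -> tuple[list[str], list[int]]:
    """Return (included_texts, dropped_indices) after applying both caps."""
    included: list[str] = []
    dropped: list[int] = []
    total = 0
    for index, content in enumerate(contents):
        clipped = content[:max_entry_input_chars]  # per-entry cap
        if total + len(clipped) > max_total_user_content_chars:
            # Total cap reached: this entry and all later ones stay out of the
            # prompt, so they must not be deleted after synthesis.
            dropped.extend(range(index, len(contents)))
            break
        included.append(clipped)
        total += len(clipped)
    return included, dropped
```

Returning the dropped indices separately mirrors the rule that only entries actually represented in the prompt are deleted, while the rest are preserved for the next consolidation pass.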

Distillation Capture¶

At task completion, synthorg.memory.consolidation.capture_distillation records the execution trajectory as an EPISODIC memory entry tagged "distillation". DistillationRequest captures:

Field	Source
`agent_id`, `task_id`	Caller context
`trajectory_summary`	Turn count, total tokens, unique tools, total tool calls
`outcome`	`TerminationReason` + optional error message
`memory_tool_invocations`	`MemoryToolName` enum values (`SEARCH_MEMORY`, `RECALL_MEMORY`) extracted from `TurnRecord.tool_calls_made` (NOT memory entry IDs; typed enum members, counted per invocation)
`created_at`	Capture timestamp

AgentEngine wires this into _post_execution_pipeline when distillation_capture_enabled=True is passed to the constructor (default False for opt-in behavior). Capture fires regardless of termination reason; successful runs, errors, timeouts, and budget exhaustions all produce useful trajectory context for downstream consolidation. The helper is non-critical: non-system failures log at WARNING and return None; system errors (builtins.MemoryError, RecursionError) propagate.

Downstream, the LLM consolidation strategy (ConsolidationStrategyType.LLM) picks these entries up by tag query when synthesizing category groups. It embeds the trajectory summaries and outcomes in the synthesis system prompt so the LLM has context about what the agent was trying to accomplish when the memories being merged were created.

Dual-Mode Archival¶

When ArchivalConfig.dual_mode.enabled is True, consolidation classifies content density before choosing an archival mode. This prevents catastrophic information loss from naively summarizing dense content (code, structured data, identifiers). The design draws on Memex (arXiv:2603.04257) and KV Cache Attention Matching (arXiv:2602.16284).

Density	Archival Mode	Method
Sparse (conversational, narrative)	`ABSTRACTIVE`	LLM-generated summary via `AbstractiveSummarizer`
Dense (code, structured data, IDs)	`EXTRACTIVE`	Verbatim key-fact extraction + start/mid/end anchors via `ExtractivePreserver`

Classification is heuristic-based (DensityClassifier), using five weighted signals: code patterns, structured data markers, identifier density, numeric density, and line structure. No LLM is needed for classification; only for abstractive summarization. Groups are classified by majority vote: if most entries in a category group are dense, the group uses extractive mode.
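A heavily simplified sketch of weighted-signal classification followed by a majority vote is shown below. The weights, regular expressions, and threshold are invented for illustration and are not DensityClassifier's actual values; each signal is reduced to a binary "pattern present" check rather than a true density measure, and a tied vote falls back to abstractive mode by assumption.

```python
import re

# Hypothetical weights and patterns, one per signal named above.
SIGNALS = {
    "code_patterns": (0.30, re.compile(r"\b(?:def|class|return|import)\b|[{};]")),
    "structured_data": (0.25, re.compile(r'"\w+"\s*:')),
    "identifier_density": (0.20, re.compile(r"\b[a-z]+_[a-z_]+\b")),
    "numeric_density": (0.15, re.compile(r"\d")),
    "line_structure": (0.10, re.compile(r"\n[ \t]{2,}\S")),
}
DENSE_THRESHOLD = 0.5  # hypothetical cut-off


def density_score(text: str) -> float:
    """Sum the weights of every signal whose pattern appears in the text."""
    return sum(weight for weight, pattern in SIGNALS.values() if pattern.search(text))


def is_dense(text: str) -> bool:
    return density_score(text) >= DENSE_THRESHOLD


def archival_mode_for_group(texts: list[str]) -> str:
    """Majority vote: extractive only when more than half the entries are dense."""
    dense_votes = sum(1 for text in texts if is_dense(text))
    return "EXTRACTIVE" if dense_votes * 2 > len(texts) else "ABSTRACTIVE"
```

Because the classification uses only regular expressions, it runs without any LLM call; the model is needed only later, if the group is routed to abstractive summarization.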

Deterministic restore: When entries are archived, the service builds an archival_index (mapping original_id -> archival_id) on ConsolidationResult. Agents can use this index to call ArchivalStore.restore(agent_id, entry_id) directly by ID, bypassing semantic search.

Model	Purpose
`ArchivalMode`	Enum: `ABSTRACTIVE` or `EXTRACTIVE`
`ArchivalModeAssignment`	Maps a removed entry ID to its archival mode (set by strategy)
`ArchivalIndexEntry`	Maps original entry ID to archival store ID (built by service)

Per-Agent Retention Overrides¶

Individual agents can override company-level retention rules via MemoryConfig.retention_overrides (per-category) and MemoryConfig.retention_days (agent-level default).

Resolution order per category:

Agent per-category rule
Company per-category rule
Agent global default
Company global default
Keep forever (no expiry)
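The resolution order above can be sketched as a first-match lookup. `resolve_retention_days` and its plain-dict arguments are hypothetical simplifications of `MemoryConfig.retention_overrides` and the company-level `RetentionRule` tuples; `None` represents "keep forever".

```python
from typing import Mapping, Optional


def resolve_retention_days(
    category: str,
    agent_overrides: Mapping[str, int],
    company_rules: Mapping[str, int],
    agent_default: Optional[int],
    company_default: Optional[int],
) -> Optional[int]:
    """Return the first configured value in the documented order, else None."""
    candidates = (
        agent_overrides.get(category),  # 1. agent per-category rule
        company_rules.get(category),    # 2. company per-category rule
        agent_default,                  # 3. agent global default
        company_default,                # 4. company global default
    )
    for candidate in candidates:
        if candidate is not None:
            return candidate
    return None  # 5. keep forever (no expiry)
```

Note that a company per-category rule takes precedence over the agent's global default, so an agent that wants to change retention for one category has to set a per-category override rather than rely on its default.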