Memory & Persistence¶
The SynthOrg framework separates two distinct storage concerns:
- Agent memory -- what agents know, remember, and learn (working, episodic, semantic, procedural, social)
- Operational data -- tasks, cost records, messages, and audit logs generated during execution
Both are implemented behind pluggable protocol interfaces, making storage backends swappable via configuration without modifying application code.
Memory Architecture¶
+-------------------------------------------------+
| Agent Memory System |
+----------+----------+-----------+---------------+
| Working | Episodic | Semantic | Procedural |
| Memory | Memory | Memory | Memory |
| | | | |
| Current | Past | Knowledge | Skills & |
| task | events & | & facts | how-to |
| context | decisions| learned | |
+----------+----------+-----------+---------------+
| Storage Backend |
| Mem0 (initial, implemented) / Custom (future) |
| Qdrant (embedded) + SQLite history |
| See Decision Log |
+-------------------------------------------------+
Each agent maintains its own memory store. The storage backend is selected via configuration, and all access flows through the `MemoryBackend` protocol.
Memory Types¶
| Type | Scope | Persistence | Example |
|---|---|---|---|
| Working | Current task | None (in-context) | "I'm implementing the auth endpoint" |
| Episodic | Past events | Configurable | "Last sprint the team chose JWT over sessions" |
| Semantic | Knowledge | Long-term | "This project uses Litestar with aiosqlite" |
| Procedural | Skills/patterns | Long-term | "Code reviews require 2 approvals here" |
| Social | Relationships | Long-term | "The QA lead prefers detailed test plans" |
Memory Levels¶
Memory persistence is configurable per agent, from no persistence to fully persistent storage.
Memory Level Configuration
memory:
level: "persistent" # none | session | project | persistent (default: session)
backend: "mem0" # mem0 | custom | cognee | graphiti (future)
storage:
data_dir: "/data/memory" # mounted Docker volume path
vector_store: "qdrant" # hardcoded to embedded qdrant in Mem0 backend
history_store: "sqlite" # hardcoded to sqlite in Mem0 backend
options:
retention_days: null # null = forever
max_memories_per_agent: 10000
consolidation_interval: "daily" # compress old memories
shared_knowledge_base: true # agents can access shared facts
Shared Organizational Memory¶
Beyond individual agent memory, the framework provides organizational memory -- company-wide knowledge that all agents can access: policies, conventions, architecture decision records (ADRs), coding standards, and operational procedures. This is not personal episodic memory ("what I did last Tuesday") but institutional knowledge ("the team always uses Litestar, not Flask").
Shared organizational memory is implemented behind an OrgMemoryBackend protocol, making the
system highly modular and extensible. New backends can be added without modifying existing ones.
Backend 1: Hybrid Prompt + Retrieval (Default)¶
Critical rules (5--10 items, e.g., "no commits to main," "all PRs need 2 approvals") are injected into every agent's system prompt. Extended knowledge (ADRs, detailed procedures, style guides) is stored in a queryable store and retrieved on demand at task start.
org_memory:
backend: "hybrid_prompt_retrieval" # hybrid_prompt_retrieval, graph_rag, temporal_kg
core_policies: # always in system prompt
- "All code must have 80%+ test coverage"
- "Use Litestar, not Flask"
- "PRs require 2 approvals"
extended_store:
backend: "sqlite" # sqlite, postgresql
max_retrieved_per_query: 5
write_access:
policies: ["human"] # only humans write core policies
adrs: ["human", "senior", "lead", "c_suite"]
procedures: ["human", "senior", "lead", "c_suite"]
Strengths: Simple to implement. Core rules are always present. Extended knowledge scales with the organization.
Limitations: Basic retrieval may miss relational connections between policies.
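The hybrid split can be sketched as two functions: one that always injects the core rules into the system prompt, and one that queries the extended store on demand. This is an illustrative sketch, not the framework's implementation; the function names, the naive keyword matching, and the in-memory store are all assumptions standing in for the real retrieval backend.

```python
# Illustrative sketch of the hybrid approach (names and retrieval logic
# are hypothetical; the real extended store is SQLite/PostgreSQL-backed).

CORE_POLICIES = [
    "All code must have 80%+ test coverage",
    "Use Litestar, not Flask",
    "PRs require 2 approvals",
]

EXTENDED_STORE = {  # stand-in for the queryable extended-knowledge store
    "adr-007": "ADR-007: JWT chosen over server-side sessions for auth",
    "style-01": "Docstrings follow the Google style guide",
}

def build_system_prompt(base_prompt: str) -> str:
    """Core rules (5-10 items) are injected into EVERY system prompt."""
    rules = "\n".join(f"- {p}" for p in CORE_POLICIES)
    return f"{base_prompt}\n\nCore organizational policies (always apply):\n{rules}"

def retrieve_extended(query: str, max_retrieved_per_query: int = 5) -> list[str]:
    """Naive keyword match standing in for the store's real search."""
    hits = [text for text in EXTENDED_STORE.values()
            if any(word.lower() in text.lower() for word in query.split())]
    return hits[:max_retrieved_per_query]
```

The asymmetry is the point: core rules cost prompt tokens on every call but can never be missed, while extended knowledge costs a retrieval call but scales without inflating the prompt.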
Research Directions¶
The following backends illustrate why OrgMemoryBackend is a protocol -- the architecture
supports future upgrades without modifying existing code. These are research directions that
may inform future work if organizational memory needs outgrow the Hybrid Prompt + Retrieval
approach.
Research Direction: GraphRAG Knowledge Graph
Organizational knowledge stored as entities + relationships in a knowledge graph. Agents query via graph traversal, enabling multi-hop reasoning: "Litestar is the standard" is linked to "don't use Flask," which is linked to "exception: data team uses Django for admin."
org_memory:
backend: "graph_rag"
graph:
store: "sqlite" # graph stored in relational DB, or dedicated graph DB
entity_extraction: "auto" # auto-extract entities from ADRs and policies
Strengths: Significant accuracy improvement over vector-only retrieval (some benchmarks report 3--4x gains). Multi-hop reasoning captures policy relationships.
Limitations: More complex infrastructure. Entity extraction can be noisy. Heavier setup.
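The multi-hop example above can be made concrete with a tiny traversal sketch. The graph structure, relation names, and BFS helper below are illustrative assumptions, not the framework's API; they only show how linked facts become reachable in one query.

```python
from collections import deque

# Toy knowledge graph: entity -> list of (relation, target) edges.
# The facts mirror the example in the text; the structure is hypothetical.
GRAPH = {
    "Litestar": [("is", "team standard")],
    "team standard": [("implies", "don't use Flask")],
    "don't use Flask": [("exception", "data team uses Django for admin")],
}

def multi_hop(start: str, max_hops: int = 2) -> list[tuple[str, str, str]]:
    """BFS up to max_hops, collecting (source, relation, target) facts."""
    facts: list[tuple[str, str, str]] = []
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop budget
        for relation, target in GRAPH.get(node, []):
            facts.append((node, relation, target))
            if target not in seen:
                seen.add(target)
                frontier.append((target, depth + 1))
    return facts
```

With `max_hops=3`, a query starting at "Litestar" surfaces the Django exception that a flat vector lookup over isolated policy strings would likely miss.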
Research Direction: Temporal Knowledge Graph
Like GraphRAG but tracks how facts change over time. "The team used Flask until March 2026, then switched to Litestar." Agents see current truth but can query history for context.
org_memory:
backend: "temporal_kg"
temporal:
track_changes: true
history_retention_days: null # null = forever
Strengths: Handles policy evolution naturally. Agents understand when and why things changed.
Limitations: Most complex. Potentially overkill for small organizations or local-first use.
OrgMemoryBackend Protocol¶
All backends implement the OrgMemoryBackend protocol:
- `query(OrgMemoryQuery) -> tuple[OrgFact, ...]`
- `write(OrgFactWriteRequest, *, author: OrgFactAuthor) -> NotBlankStr`
- `list_policies() -> tuple[OrgFact, ...]`
- Lifecycle methods: `connect`, `disconnect`, `health_check`, `is_connected`, `backend_name`
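A protocol class consistent with those signatures might look like the sketch below. The framework's `OrgFact`, `OrgMemoryQuery`, `OrgFactWriteRequest`, and `OrgFactAuthor` models are stood in for by `Any` here; this is a shape sketch, not the actual source.

```python
from typing import Any, Protocol, runtime_checkable

# Sketch of OrgMemoryBackend matching the method list above.
# `Any` stands in for the framework's frozen Pydantic models.

@runtime_checkable
class OrgMemoryBackend(Protocol):
    """Shared organizational memory: query, write, and lifecycle."""

    async def query(self, query: Any) -> tuple[Any, ...]: ...
    async def write(self, request: Any, *, author: Any) -> str: ...
    async def list_policies(self) -> tuple[Any, ...]: ...

    async def connect(self) -> None: ...
    async def disconnect(self) -> None: ...
    async def health_check(self) -> bool: ...

    @property
    def is_connected(self) -> bool: ...

    @property
    def backend_name(self) -> str: ...
```

Because the protocol is `runtime_checkable`, a backend only needs to provide these members structurally; it never has to inherit from a framework base class.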
The MVP ships with Backend 1 (Hybrid Prompt + Retrieval). The selected memory-layer backend, Mem0 (see Decision Log), provides optional graph memory via Neo4j/FalkorDB, which could reduce implementation effort for the research-direction backends.
Write Access Control
Core policies are human-only. ADRs and procedures can be written by senior+ agents. All writes are append-only and auditable. This prevents agents from corrupting shared organizational knowledge while allowing senior agents to document decisions.
Memory Backend Protocol¶
Agent memory is implemented behind a pluggable MemoryBackend protocol (Mem0 initial, custom
stack future -- see Decision Log). Application code depends only on the protocol; the storage engine is an
implementation detail swappable via config.
Enums¶
| Enum | Values | Purpose |
|---|---|---|
| `MemoryCategory` | `WORKING`, `EPISODIC`, `SEMANTIC`, `PROCEDURAL`, `SOCIAL` | Memory type categories |
| `MemoryLevel` | `PERSISTENT`, `PROJECT`, `SESSION`, `NONE` | Persistence level per agent |
| `ConsolidationInterval` | `HOURLY`, `DAILY`, `WEEKLY`, `NEVER` | How often old memories are compressed |
MemoryBackend Protocol¶
@runtime_checkable
class MemoryBackend(Protocol):
"""Lifecycle + CRUD for agent memory storage."""
async def connect(self) -> None: ...
async def disconnect(self) -> None: ...
async def health_check(self) -> bool: ...
@property
def is_connected(self) -> bool: ...
@property
def backend_name(self) -> NotBlankStr: ...
async def store(self, agent_id: NotBlankStr, request: MemoryStoreRequest) -> NotBlankStr:
"""Raises: MemoryConnectionError, MemoryStoreError."""
...
async def retrieve(self, agent_id: NotBlankStr, query: MemoryQuery) -> tuple[MemoryEntry, ...]:
"""Raises: MemoryConnectionError, MemoryRetrievalError."""
...
async def get(self, agent_id: NotBlankStr, memory_id: NotBlankStr) -> MemoryEntry | None:
"""Raises: MemoryConnectionError, MemoryRetrievalError."""
...
async def delete(self, agent_id: NotBlankStr, memory_id: NotBlankStr) -> bool:
"""Raises: MemoryConnectionError, MemoryStoreError."""
...
async def count(self, agent_id: NotBlankStr, *, category: MemoryCategory | None = None) -> int:
"""Raises: MemoryConnectionError, MemoryRetrievalError."""
...
MemoryCapabilities Protocol¶
Backends that implement MemoryCapabilities expose what features they support, enabling
runtime capability checks before attempting operations.
@runtime_checkable
class MemoryCapabilities(Protocol):
"""Capability discovery for memory backends."""
@property
def supported_categories(self) -> frozenset[MemoryCategory]: ...
@property
def supports_graph(self) -> bool: ...
@property
def supports_temporal(self) -> bool: ...
@property
def supports_vector_search(self) -> bool: ...
@property
def supports_shared_access(self) -> bool: ...
@property
def max_memories_per_agent(self) -> int | None: ...
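A typical use of capability discovery is to gate an operation before attempting it. The helper and stand-in backend below are hypothetical sketches (the real `MemoryCapabilityError` lives in the framework's error hierarchy); they only show the check-before-use pattern.

```python
# Hypothetical capability gate: consult MemoryCapabilities properties
# before attempting an operation the backend may not support.

class MemoryCapabilityError(Exception):
    """Unsupported operation attempted for a backend (mirrors the error table)."""

class VectorOnlyBackend:
    """Stand-in backend advertising its capabilities via properties."""
    @property
    def supports_graph(self) -> bool:
        return False
    @property
    def supports_vector_search(self) -> bool:
        return True

def require_capability(backend, capability: str) -> None:
    """Raise MemoryCapabilityError unless `supports_<capability>` is True."""
    if not getattr(backend, f"supports_{capability}", False):
        raise MemoryCapabilityError(f"backend lacks capability: {capability}")
```

Failing fast with a typed error at the call site is cheaper than letting an unsupported graph query surface as an opaque backend failure deep inside a retrieval pipeline.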
SharedKnowledgeStore Protocol¶
Backends that support cross-agent shared knowledge implement this protocol alongside
MemoryBackend. Not all backends require cross-agent queries -- this keeps the base protocol
clean.
@runtime_checkable
class SharedKnowledgeStore(Protocol):
"""Cross-agent shared knowledge operations."""
async def publish(self, agent_id: NotBlankStr, request: MemoryStoreRequest) -> NotBlankStr:
"""Raises: MemoryConnectionError, MemoryStoreError."""
...
async def search_shared(self, query: MemoryQuery, *, exclude_agent: NotBlankStr | None = None) -> tuple[MemoryEntry, ...]:
"""Raises: MemoryConnectionError, MemoryRetrievalError."""
...
async def retract(self, agent_id: NotBlankStr, memory_id: NotBlankStr) -> bool:
"""Raises: MemoryConnectionError, MemoryStoreError."""
...
Error Hierarchy¶
All memory errors inherit from MemoryError so callers can catch the entire family with a
single except clause.
| Error | When Raised |
|---|---|
| `MemoryError` | Base exception for all memory operations |
| `MemoryConnectionError` | Backend connection cannot be established or is lost |
| `MemoryStoreError` | A store or delete operation fails |
| `MemoryRetrievalError` | A retrieve, search, or count operation fails |
| `MemoryNotFoundError` | A specific memory ID is not found |
| `MemoryConfigError` | Memory configuration is invalid |
| `MemoryCapabilityError` | An unsupported operation is attempted for a backend |
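The single-except pattern looks like this. The class definitions below are a sketch of the hierarchy for illustration; note that the base name shadows Python's builtin `MemoryError`, which is why in practice the framework's exception should be imported under its module namespace.

```python
# Sketch of the error family (shadows the builtin MemoryError; the real
# framework namespaces it, e.g. `from synthorg.memory import errors`).

class MemoryError(Exception): ...
class MemoryConnectionError(MemoryError): ...
class MemoryStoreError(MemoryError): ...
class MemoryRetrievalError(MemoryError): ...
class MemoryNotFoundError(MemoryError): ...
class MemoryConfigError(MemoryError): ...
class MemoryCapabilityError(MemoryError): ...

def safe_retrieve(fetch):
    """One except clause catches the entire family of memory errors."""
    try:
        return fetch()
    except MemoryError as exc:  # matches every subclass above
        return f"memory unavailable: {exc}"
```

Callers that need finer handling (e.g. retrying only connection errors) can still catch the specific subclasses before the base class.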
Configuration¶
memory:
backend: "mem0"
level: "persistent" # none, session, project, persistent (default: session)
storage:
data_dir: "/data/memory"
vector_store: "qdrant" # hardcoded to embedded qdrant in Mem0 backend
history_store: "sqlite" # hardcoded to sqlite in Mem0 backend
options:
retention_days: null # null = forever
max_memories_per_agent: 10000
consolidation_interval: "daily"
shared_knowledge_base: true
# Embedder config is passed programmatically via the factory:
# create_memory_backend(config, embedder=Mem0EmbedderConfig(
# provider="<embedding-provider>",
# model="<embedding-model-id>",
# dims=1536,
# ))
Configuration is modeled by CompanyMemoryConfig (top-level), MemoryStorageConfig
(storage paths/backends), and MemoryOptionsConfig (behaviour tuning). All are frozen
Pydantic models. The create_memory_backend(config, *, embedder=...) factory returns an
isolated MemoryBackend instance per company. The embedder kwarg is required for the
Mem0 backend (must be a Mem0EmbedderConfig).
Embedding Model Selection¶
Embedding model quality directly determines memory retrieval accuracy. The
LMEB benchmark (Zhao et al., March 2026) evaluates embedding
models on long-horizon memory retrieval across four types that map directly to SynthOrg's
MemoryCategory enum:
| SynthOrg Category | LMEB Category | Evaluation Priority |
|---|---|---|
| EPISODIC | Episodic (69 tasks) | High |
| PROCEDURAL | Procedural (67 tasks) | High |
| SEMANTIC | Semantic (15 tasks) | Medium |
| SOCIAL | Dialogue (42 tasks) | Medium |
| WORKING | N/A (in-context) | N/A |
MTEB scores do not predict memory retrieval quality (Pearson: -0.115, Spearman: -0.130). Embedding model selection must be evaluated on LMEB, not MTEB. See Decision Log and the Embedding Evaluation reference page for the full analysis, model rankings, and deployment tier recommendations.
Key findings:
- Larger models do not always outperform smaller ones on memory retrieval
- Dialogue/social memory is the hardest retrieval category for all models
- Instruction sensitivity varies per model -- must be validated per deployment
- Three deployment tiers are recommended: full-resource (7-12B), mid-resource (1-4B), and CPU-only (< 1B)
Research Direction: Domain-Specific Embedding Fine-Tuning
Domain-specific fine-tuning can improve retrieval quality by 10-27% over base models (NVIDIA evaluation). The pipeline requires no manual annotation and runs on a single GPU.
Pipeline stages:
- Synthetic data generation -- LLM generates query-document pairs from org documents (policies, ADRs, procedures, coding standards)
- Hard negative mining -- base model embeds all passages; top-k semantically similar but non-matching passages become hard negatives
- Contrastive fine-tuning -- biencoder training with InfoNCE loss (tau=0.02, 3 epochs, lr=1e-5). Single GPU, 1-2 hours for ~500 documents
- Deploy -- save checkpoint; update `Mem0EmbedderConfig` to point to the fine-tuned model
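The InfoNCE objective from stage 3 can be written out for a single training example. The function below takes query-document similarities directly (rather than embeddings) to keep the sketch self-contained; `tau` matches the stated temperature of 0.02.

```python
import math

# InfoNCE for one (query, positive, hard negatives) example:
# -log softmax probability of the positive among all candidates.
# Inputs are similarity scores; in training these come from the biencoder.

def info_nce_loss(pos_sim: float, neg_sims: list[float], tau: float = 0.02) -> float:
    logits = [pos_sim / tau] + [s / tau for s in neg_sims]
    peak = max(logits)  # log-sum-exp trick for numerical stability
    log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return -(pos_sim / tau - log_denom)
```

A well-separated positive drives the loss toward zero, while negatives whose similarity approaches the positive's dominate the loss; that is exactly why stage 2 mines hard negatives instead of sampling random ones.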
Integration design: fine-tuning is an offline pipeline triggered via
POST /admin/memory/fine-tune (see MemoryAdminController). The optional
EmbeddingFineTuneConfig (disabled by default) stores the checkpoint path. When
enabled=True and checkpoint_path is set, backend initialization uses the
checkpoint path as the model identifier passed to the Mem0 SDK. The embedding
provider must serve the fine-tuned model under this identifier.
class EmbeddingFineTuneConfig(BaseModel):
model_config = ConfigDict(frozen=True, allow_inf_nan=False)
enabled: bool = False
checkpoint_path: NotBlankStr | None = None
base_model: NotBlankStr | None = None
training_data_dir: NotBlankStr | None = None
When enabled=True, both checkpoint_path and base_model are required
(enforced by model validation). Path traversal (..) and Windows-style
paths are rejected to prevent container path escapes.
A future FineTuningPipeline protocol would formalize the offline stages (deployment remains a configuration update):
class FineTuningPipeline(Protocol):
async def generate_training_data(self, source_dir: str) -> Path: ...
async def mine_hard_negatives(self, training_data: Path) -> Path: ...
async def fine_tune(self, training_data: Path, base_model: str) -> Path: ...
See Embedding Evaluation for the full pipeline design and expected improvement metrics.
Consolidation and Retention¶
Memory consolidation, retention enforcement, and archival are configured via frozen Pydantic
models in memory/consolidation/config.py:
| Config | Purpose |
|---|---|
| `ConsolidationConfig` | Top-level: `max_memories_per_agent` limit, nested retention and archival sub-configs |
| `RetentionConfig` | Company-level per-category `RetentionRule` tuples (category + `retention_days`), optional `default_retention_days` fallback; agents can override via `MemoryConfig.retention_overrides` |
| `ArchivalConfig` | Enables/disables archival of consolidated entries to `ArchivalStore`, nested `DualModeConfig` |
| `DualModeConfig` | Density-aware dual-mode archival: threshold, summarization model, anchor/fact limits |
Dual-Mode Archival¶
When ArchivalConfig.dual_mode.enabled is True, consolidation classifies content density before
choosing an archival mode. This prevents catastrophic information loss from naively summarizing
dense content (code, structured data, identifiers). Based on research: Memex
(arXiv:2603.04257) and KV Cache Attention Matching
(arXiv:2602.16284).
| Density | Archival Mode | Method |
|---|---|---|
| Sparse (conversational, narrative) | `ABSTRACTIVE` | LLM-generated summary via `AbstractiveSummarizer` |
| Dense (code, structured data, IDs) | `EXTRACTIVE` | Verbatim key-fact extraction + start/mid/end anchors via `ExtractivePreserver` |
Classification is heuristic-based (DensityClassifier), using five weighted signals: code
patterns, structured data markers, identifier density, numeric density, and line structure. No LLM
is needed for classification -- only for abstractive summarization. Groups are classified by
majority vote: if most entries in a category group are dense, the group uses extractive mode.
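A heavily simplified version of the signal-voting and majority-vote logic is sketched below. The specific regexes, thresholds, and equal signal weights are illustrative assumptions; the real `DensityClassifier` uses five weighted signals.

```python
import re

# Simplified density heuristic: a few unweighted signals vote on whether
# content is "dense" (code, identifiers, numbers). Illustrative only.

def is_dense(text: str) -> bool:
    signals = 0
    if re.search(r"\b(def|class|return|import)\b|[{};]", text):
        signals += 1  # code patterns
    if re.search(r"\b[a-f0-9]{8,}\b|\b\w+_\w+\b", text):
        signals += 1  # identifier / hash density
    digits = sum(c.isdigit() for c in text)
    if text and digits / len(text) > 0.08:
        signals += 1  # numeric density
    return signals >= 2

def group_archival_mode(entries: list[str]) -> str:
    """Majority vote: mostly-dense groups are archived extractively."""
    dense = sum(is_dense(e) for e in entries)
    return "EXTRACTIVE" if dense * 2 > len(entries) else "ABSTRACTIVE"
```

The key property carries over from the real classifier: no LLM call is needed to decide the mode, so classification stays cheap even when consolidation runs daily.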
Deterministic restore: When entries are archived, the service builds an archival_index
(mapping original_id → archival_id) on ConsolidationResult. Agents can use this index to
call ArchivalStore.restore(agent_id, entry_id) directly by ID, bypassing semantic search.
| Model | Purpose |
|---|---|
ArchivalMode |
Enum: ABSTRACTIVE or EXTRACTIVE |
ArchivalModeAssignment |
Maps a removed entry ID to its archival mode (set by strategy) |
ArchivalIndexEntry |
Maps original entry ID to archival store ID (built by service) |
Per-Agent Retention Overrides¶
Individual agents can override company-level retention rules via
MemoryConfig.retention_overrides (per-category) and
MemoryConfig.retention_days (agent-level default).
Resolution order per category:
1. Agent per-category rule
2. Company per-category rule
3. Agent global default
4. Company global default
5. Keep forever (no expiry)
Operational Data Persistence¶
Agent memory is handled by the MemoryBackend protocol (Mem0 initial, custom stack future --
see Decision Log). Operational data -- tasks, cost records, messages, audit logs -- is a separate
concern managed by a pluggable PersistenceBackend protocol. Application code depends only on
repository protocols; the storage engine is an implementation detail swappable via config.
Architecture¶
+------------------------------------------------------------------+
| Application Code |
| engine/ budget/ communication/ security/ |
| | | | | |
| v v v v |
| +------+ +------+ +----------+ +----------+ |
| | Task | | Cost | | Message | | Audit | <-- Repository |
| | Repo | | Repo | | Repo | | Repo | Protocols |
| +--+---+ +--+---+ +----+-----+ +----+-----+ |
| +--------+----------+------------+ |
| | |
| +-------------------+-------------------------------------------+
| | PersistenceBackend (protocol) |
| | connect() . disconnect() . health_check() . migrate() |
| +-------------------+-------------------------------------------+
| | |
| +-------------------+-------------------------------------------+
| | SQLitePersistenceBackend (initial) |
| | PostgresPersistenceBackend (future) |
| | MariaDBPersistenceBackend (future) |
| +---------------------------------------------------------------+
+------------------------------------------------------------------+
Protocol Design¶
@runtime_checkable
class PersistenceBackend(Protocol):
"""Lifecycle management for operational data storage."""
async def connect(self) -> None: ...
async def disconnect(self) -> None: ...
async def health_check(self) -> bool: ...
async def migrate(self) -> None: ...
@property
def is_connected(self) -> bool: ...
@property
def backend_name(self) -> NotBlankStr: ...
@property
def tasks(self) -> TaskRepository: ...
@property
def cost_records(self) -> CostRecordRepository: ...
@property
def messages(self) -> MessageRepository: ...
# ... plus lifecycle_events, task_metrics, collaboration_metrics,
# parked_contexts, audit_entries, users, api_keys, checkpoints,
# heartbeats, agent_states, settings, artifacts, projects,
# custom_presets
Each entity type has its own repository protocol:
@runtime_checkable
class TaskRepository(Protocol):
"""CRUD + query interface for Task persistence."""
async def save(self, task: Task) -> None: ...
async def get(self, task_id: str) -> Task | None: ...
async def list_tasks(self, *, status: TaskStatus | None = None, assigned_to: str | None = None, project: str | None = None) -> tuple[Task, ...]: ...
async def delete(self, task_id: str) -> bool: ...
@runtime_checkable
class CostRecordRepository(Protocol):
"""CRUD + aggregation interface for CostRecord persistence."""
async def save(self, record: CostRecord) -> None: ...
async def query(self, *, agent_id: str | None = None, task_id: str | None = None) -> tuple[CostRecord, ...]: ...
async def aggregate(self, *, agent_id: str | None = None) -> float: ...
@runtime_checkable
class MessageRepository(Protocol):
"""CRUD + query interface for Message persistence."""
async def save(self, message: Message) -> None: ...
async def get_history(self, channel: str, *, limit: int | None = None) -> tuple[Message, ...]: ...
Configuration¶
persistence:
backend: "sqlite" # sqlite, postgresql, mariadb (future)
sqlite:
path: "/data/synthorg.db" # database file path (mounted volume in Docker)
wal_mode: true # WAL for concurrent read performance
journal_size_limit: 67108864 # 64 MB WAL journal limit
# postgresql: # future
# url: "postgresql://user:pass@host:5432/synthorg"
# pool_size: 10
# mariadb: # future
# url: "mariadb://user:pass@host:3306/synthorg"
# pool_size: 10
Entities Persisted¶
| Entity | Source Module | Repository | Key Queries |
|---|---|---|---|
| `Task` | `core/task.py` | `TaskRepository` | by status, by assignee, by project |
| `CostRecord` | `budget/cost_record.py` | `CostRecordRepository` | by agent, by task, aggregations |
| `Message` | `communication/message.py` | `MessageRepository` | by channel |
| `AuditEntry` | `security/models.py` | `AuditRepository` | by agent, by action type, by verdict, by risk level, time range |
| `ParkedContext` | `security/timeout/parked_context.py` | `ParkedContextRepository` | by execution_id, by agent_id, by task_id |
| `AgentRuntimeState` | `engine/agent_state.py` | `AgentStateRepository` | by agent_id, active agents |
| `Setting` | `settings/models.py` | `SettingsRepository` | by namespace+key, by namespace, all |
| `Artifact` | `core/artifact.py` | `ArtifactRepository` | by task_id, by created_by, by artifact_type |
| `Project` | `core/project.py` | `ProjectRepository` | by status, by lead |
| Custom preset | `templates/preset_service.py` | `PersonalityPresetRepository` | by name |
Schema Strategy¶
- Schema is applied at startup via `PersistenceBackend.migrate()`, which calls `apply_schema()`
- The canonical schema lives in `src/synthorg/persistence/sqlite/schema.sql` (single source of truth)
- All DDL uses `IF NOT EXISTS` guards, making application idempotent
- No sequential migrations exist yet -- when data stability is declared, adopt Atlas for declarative migrations (diff `schema.sql` against the live DB)
Key Principles¶
- Application code never imports a concrete backend: only repository protocols are used, ensuring complete decoupling from the storage engine.
- Adding a new backend requires no changes to consumers: implement `PersistenceBackend` plus all repository protocols, and existing application code works unchanged.
- Same entity models everywhere: repositories accept and return the existing frozen Pydantic models (`Task`, `CostRecord`, `Message`). No ORM models or data transfer objects.
- Async throughout: all repository methods are async, matching the framework's concurrency model.
Multi-Tenancy¶
Each company gets its own database. The PersistenceConfig embedded in a company's RootConfig
specifies the backend type and connection details (e.g., a unique SQLite file path or PostgreSQL
database URL). The create_backend(config) factory returns an isolated PersistenceBackend
instance per company -- no shared state, no cross-company data leakage.
# One database per company -- configured in each company's YAML
company_a_backend = create_backend(company_a_config.persistence)
company_b_backend = create_backend(company_b_config.persistence)
# Each backend has independent lifecycle: connect -> migrate -> use -> disconnect
Planned
Runtime backend switching (e.g., migrating a company from SQLite to PostgreSQL during operation) is a planned future capability. The protocol-based design already supports this -- the engine would disconnect the current backend, connect a new one with different config, and migrate. Implementation details (data migration tooling, zero-downtime switchover, connection draining) are deferred to the PostgreSQL backend implementation.
Procedural Memory Auto-Generation¶
When an agent fails a task, the engine's post-execution pipeline can automatically generate a procedural memory entry -- a structured "next time, do X when encountering Y" lesson learned. This follows the EvoSkill three-agent separation principle: the failed agent does not write its own lesson; a separate proposer LLM call analyses the failure.
Pipeline¶
1. Failure analysis payload (`FailureAnalysisPayload`): Built from `RecoveryResult` + `ExecutionResult`. Includes task metadata, sanitized error message, tool calls made, retry count, and turn count. Deliberately excludes raw conversation messages (privacy boundary).
2. Proposer LLM call (`ProceduralMemoryProposer`): A separate completion call with its own system prompt analyses the payload and returns a structured `ProceduralMemoryProposal`.
3. Three-tier progressive disclosure:
   - Discovery (~100 tokens): concise summary for retrieval ranking.
   - Activation (condition + action + rationale): when/what/why.
   - Execution (ordered steps): concrete steps for applying the knowledge.
4. Storage: The proposal is stored via `MemoryBackend.store()` as a `MemoryCategory.PROCEDURAL` entry with a `"non-inferable"` tag for retrieval filtering.
5. SKILL.md materialization (optional): When `ProceduralMemoryConfig.skill_md_directory` is set, the proposal is also written as a portable SKILL.md file following the Agent Skills format for git-native versioning.
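The three-tier shape can be illustrated with a small dataclass. Field names here approximate the described proposal but are not the framework's exact `ProceduralMemoryProposal` model; the `accept` helper mirrors the `min_confidence` gate described in the configuration.

```python
from dataclasses import dataclass

# Hypothetical shape of a three-tier procedural lesson (illustrative,
# not the framework's actual model).

@dataclass(frozen=True)
class ProceduralLesson:
    # Tier 1 -- discovery: short summary used for retrieval ranking
    summary: str
    # Tier 2 -- activation: when / what / why
    condition: str
    action: str
    rationale: str
    # Tier 3 -- execution: ordered concrete steps
    steps: tuple[str, ...]
    confidence: float = 0.0

def accept(lesson: ProceduralLesson, min_confidence: float = 0.5) -> bool:
    """Mirror of the min_confidence gate: discard low-confidence proposals."""
    return lesson.confidence >= min_confidence
```

Progressive disclosure keeps retrieval cheap: ranking only touches tier 1, and the agent pays for tiers 2-3 only when the lesson is actually activated.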
Configuration¶
ProceduralMemoryConfig (nested in CompanyMemoryConfig.procedural) controls:
- `enabled`: Toggle auto-generation on/off (default: `True`).
- `model`: Model identifier for the proposer LLM call (default: `"example-small-001"`).
- `temperature`: Sampling temperature (default: `0.3`).
- `max_tokens`: Token budget for the proposer response (default: `1500`).
- `min_confidence`: Discard proposals below this threshold (default: `0.5`).
- `skill_md_directory`: Optional path for SKILL.md file materialization.
Integration Point¶
AgentEngine._try_procedural_memory() runs after error recovery in
_post_execution_pipeline. It is non-critical: failures are logged at WARNING
and never block the execution result.
Memory Injection Strategies¶
Agent memory reaches agents through pluggable injection strategies behind the
MemoryInjectionStrategy protocol. The strategy determines how memories are surfaced to
the agent during execution.
Context Injection Strategy (Default)¶
Pre-retrieves relevant memories before execution, ranks them by relevance and recency, enforces a token budget, and formats memories as `ChatMessage`(s) injected between the system prompt and the task instruction. The agent passively receives memories.
Pipeline (Linear -- single-source, default):
1. `MemoryBackend.retrieve()` -- fetch candidate memories (dense vector search)
2. Rank by relevance + recency via linear combination
3. Filter by `min_relevance` threshold
4. Apply `MemoryFilterStrategy` (Decision Log D23, optional) -- exclude inferable content
5. Greedy token-budget packing
6. Format as `ChatMessage` (configured role: SYSTEM or USER) with delimiters
Pipeline (RRF hybrid search -- multi-source):
When fusion_strategy: rrf is configured, the pipeline runs both dense and BM25 sparse
search in parallel and fuses results:
1. Dense search: `MemoryBackend.retrieve()` for personal, `SharedKnowledgeStore.search_shared()` for shared (in parallel)
2. Sparse BM25 search: `MemoryBackend.retrieve_sparse()` for personal (shared sparse disabled until `SharedKnowledgeStore` adds the method)
3. Fuse via `fuse_ranked_lists()` with configurable `rrf_k` smoothing constant
4. Post-RRF `min_relevance` filter on `combined_score`
5. Apply `MemoryFilterStrategy` (optional)
6. Greedy token-budget packing
7. Format as `ChatMessage`
BM25 sparse vectors are stored alongside dense vectors in Qdrant using a named sparse
vector field with Modifier.IDF (Qdrant applies IDF server-side). The BM25Tokenizer
uses murmurhash3 for vocabulary-free token-to-index mapping; only term frequencies are
stored. Sparse search is opt-in via Mem0BackendConfig.sparse_search_enabled.
Shared memories (from SharedKnowledgeStore) are fetched in parallel, merged with personal
memories (no personal_boost for shared), and ranked together.
Ranking Algorithm (Linear -- default):
- `relevance = entry.relevance_score ?? config.default_relevance`
- Personal entries: `relevance = min(relevance + personal_boost, 1.0)`
- `recency = exp(-decay_rate * age_hours)`
- `combined = relevance_weight * relevance + recency_weight * recency`
- Filter: `combined >= min_relevance`
- Sort descending by `combined_score`
Alternative: Reciprocal Rank Fusion (RRF)
When fusion_strategy: rrf is configured, multiple pre-ranked lists (e.g., from different
retrieval sources) are merged via RRF: score(doc) = sum(1 / (k + rank_i)) across all
lists containing the document. Scores are min-max normalized to [0.0, 1.0]. The smoothing
constant k (default 60, configurable via rrf_k) controls rank-difference amplification.
RRF is the de facto standard for hybrid search fusion
(Qdrant,
NeMo Retriever). It is
intended for multi-source scenarios (BM25 + vector, multi-round tool-based retrieval); the
linear strategy remains the default for single-source retrieval. Results are truncated to
max_results (default 20) after scoring and sorting.
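The RRF formula is compact enough to show end to end. The function below is a toy stand-in for `fuse_ranked_lists()`, implementing `score(doc) = sum(1 / (k + rank_i))` followed by min-max normalization; the document IDs are invented for the example.

```python
# Reciprocal Rank Fusion over multiple ranked lists (toy stand-in for
# fuse_ranked_lists): score(doc) = sum(1 / (k + rank_i)), min-max normalized.

def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    low, high = min(scores.values()), max(scores.values())
    span = (high - low) or 1.0  # avoid division by zero when all scores tie
    normalized = [(doc, (s - low) / span) for doc, s in scores.items()]
    return sorted(normalized, key=lambda pair: pair[1], reverse=True)

dense = ["adr-7", "adr-2", "adr-9"]   # e.g. dense vector ranking
sparse = ["adr-9", "adr-7", "adr-4"]  # e.g. BM25 ranking
fused = rrf_fuse([dense, sparse])
```

A document ranked well in both lists ("adr-7" at ranks 1 and 2) edges out one with a single top rank ("adr-9" at ranks 3 and 1), which is the consensus behavior that makes RRF robust without score calibration across sources.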
Non-Inferable Filter
Retrieved memories are filtered before injection to exclude content the agent can discover by reading the codebase or environment. Only non-inferable information is injected: prior decisions, learned conventions, interpersonal context, historical outcomes. Research shows generic context increases cost 20%+ with minimal success improvement; LLM-generated context can actually reduce success rates.
Filter strategy (Decision Log D23): Pluggable MemoryFilterStrategy protocol. Initial
implementation uses tag-based filtering at write time. A non-inferable tag convention
with advisory validation at the MemoryBackend.store() boundary warns on missing tags
but never blocks. The system prompt instructs agents what qualifies as non-inferable:
design rationale, team decisions, "why not X," cross-repo knowledge. Uses existing
MemoryMetadata.tags and MemoryQuery.tags -- zero new models needed.
Tool-Based Strategy¶
The agent has `recall_memory` / `search_memory` tools it calls on demand during execution. The agent actively decides when and what to remember. This is more token-efficient (it retrieves only when needed) but consumes tool-call turns and requires agent discipline to invoke.
Implemented via ToolBasedInjectionStrategy. The strategy:
- Injects a brief system instruction about available memory tools
- Exposes `search_memory` and `recall_memory` (by ID) tools
- Delegates `search_memory` requests to `MemoryBackend.retrieve()` (dense-only; hybrid dense+sparse with RRF fusion is not yet wired into the tool-based path)
- Hybrid retrieval and RRF fusion are handled at the `ContextInjectionStrategy` level, not within `ToolBasedInjectionStrategy`
- `QueryReformulator` and `SufficiencyChecker` protocols exist with LLM-based implementations, but iterative reformulation is not yet wired into the tool-based strategy's search handler (reserved via the `query_reformulation_enabled` config field)
ToolRegistry integration: SearchMemoryTool and RecallMemoryTool are BaseTool
subclasses (memory/tools.py) that delegate execution to
ToolBasedInjectionStrategy.handle_tool_call(). The registry_with_memory_tools()
factory augments a ToolRegistry with these tools when the strategy is
ToolBasedInjectionStrategy. AgentEngine accepts an optional
memory_injection_strategy parameter and wires the tools into each agent's registry
at execution time. This ensures memory tools participate in the standard ToolInvoker
dispatch pipeline, including permission checking (ToolCategory.MEMORY), security
interceptors, and invocation tracking.
MCP bridge evaluation: Both context injection and tool-based strategies hold direct
MemoryBackend references and run in-process. The memory hot path already bypasses MCP
by design -- no additional optimization needed.
Self-Editing Strategy¶
The agent has structured memory blocks (core, archival, recall) that it both reads and writes during execution via dedicated tools. Core memory is always in context; archival and recall memory are searched via tools. This is the most sophisticated option (a self-editing memory architecture) but carries the highest complexity and LLM overhead.
MemoryInjectionStrategy Protocol¶
All strategies implement MemoryInjectionStrategy:
class MemoryInjectionStrategy(Protocol):
async def prepare_messages(
self, agent_id: NotBlankStr, query_text: NotBlankStr, token_budget: int
) -> tuple[ChatMessage, ...]: ...
def get_tool_definitions(self) -> tuple[ToolDefinition, ...]: ...
@property
def strategy_name(self) -> str: ...
Strategy selection via config: memory.retrieval.strategy: context | tool_based | self_editing