Memory & Persistence

The SynthOrg framework separates two distinct storage concerns:

  • Agent memory -- what agents know, remember, and learn (working, episodic, semantic, procedural, social)
  • Operational data -- tasks, cost records, messages, and audit logs generated during execution

Both are implemented behind pluggable protocol interfaces, making storage backends swappable via configuration without modifying application code.


Memory Architecture

+-------------------------------------------------+
|              Agent Memory System                |
+----------+----------+-----------+---------------+
| Working  | Episodic | Semantic  | Procedural    |
| Memory   | Memory   | Memory    | Memory        |
|          |          |           |               |
| Current  | Past     | Knowledge | Skills &      |
| task     | events & | & facts   | how-to        |
| context  | decisions| learned   |               |
+----------+----------+-----------+---------------+
|            Storage Backend                      |
|   Mem0 (initial, implemented) / Custom (future) |
|   Qdrant (embedded) + SQLite history            |
|     See Decision Log                            |
+-------------------------------------------------+

Each agent maintains its own memory store. The storage backend is selected via configuration, and all access flows through the MemoryBackend protocol.


Memory Types

Type        Scope            Persistence        Example
Working     Current task     None (in-context)  "I'm implementing the auth endpoint"
Episodic    Past events      Configurable       "Last sprint the team chose JWT over sessions"
Semantic    Knowledge        Long-term          "This project uses Litestar with aiosqlite"
Procedural  Skills/patterns  Long-term          "Code reviews require 2 approvals here"
Social      Relationships    Long-term          "The QA lead prefers detailed test plans"

Memory Levels

Memory persistence is configurable per agent, from no persistence to fully persistent storage.

Memory Level Configuration
memory:
  level: "persistent"            # none | session | project | persistent (default: session)
  backend: "mem0"               # mem0 | custom | cognee | graphiti (future)
  storage:
    data_dir: "/data/memory"    # mounted Docker volume path
    vector_store: "qdrant"      # hardcoded to embedded qdrant in Mem0 backend
    history_store: "sqlite"     # hardcoded to sqlite in Mem0 backend
  options:
    retention_days: null         # null = forever
    max_memories_per_agent: 10000
    consolidation_interval: "daily"  # compress old memories
    shared_knowledge_base: true      # agents can access shared facts

Shared Organizational Memory

Beyond individual agent memory, the framework provides organizational memory -- company-wide knowledge that all agents can access: policies, conventions, architecture decision records (ADRs), coding standards, and operational procedures. This is not personal episodic memory ("what I did last Tuesday") but institutional knowledge ("the team always uses Litestar, not Flask").

Shared organizational memory is implemented behind an OrgMemoryBackend protocol, so new backends can be added without modifying existing ones.

Backend 1: Hybrid Prompt + Retrieval (Default)

Critical rules (5--10 items, e.g., "no commits to main," "all PRs need 2 approvals") are injected into every agent's system prompt. Extended knowledge (ADRs, detailed procedures, style guides) is stored in a queryable store and retrieved on demand at task start.

org_memory:
  backend: "hybrid_prompt_retrieval"    # hybrid_prompt_retrieval, graph_rag, temporal_kg
  core_policies:                        # always in system prompt
    - "All code must have 80%+ test coverage"
    - "Use Litestar, not Flask"
    - "PRs require 2 approvals"
  extended_store:
    backend: "sqlite"                   # sqlite, postgresql
    max_retrieved_per_query: 5
  write_access:
    policies: ["human"]                 # only humans write core policies
    adrs: ["human", "senior", "lead", "c_suite"]
    procedures: ["human", "senior", "lead", "c_suite"]

Strengths: Simple to implement. Core rules are always present. Extended knowledge scales with the organization.

Limitations: Basic retrieval may miss relational connections between policies.

Research Directions

The following backends illustrate why OrgMemoryBackend is a protocol -- the architecture supports future upgrades without modifying existing code. These are research directions that may inform future work if organizational memory needs outgrow the Hybrid Prompt + Retrieval approach.

Research Direction: GraphRAG Knowledge Graph

Organizational knowledge stored as entities + relationships in a knowledge graph. Agents query via graph traversal, enabling multi-hop reasoning: "Litestar is the standard" is linked to "don't use Flask," which is linked to "exception: data team uses Django for admin."

org_memory:
  backend: "graph_rag"
  graph:
    store: "sqlite"                     # graph stored in relational DB, or dedicated graph DB
    entity_extraction: "auto"           # auto-extract entities from ADRs and policies

Strengths: Significant accuracy improvement over vector-only retrieval (some benchmarks report 3--4x gains). Multi-hop reasoning captures policy relationships.

Limitations: More complex infrastructure. Entity extraction can be noisy. Heavier setup.

Research Direction: Temporal Knowledge Graph

Like GraphRAG but tracks how facts change over time. "The team used Flask until March 2026, then switched to Litestar." Agents see current truth but can query history for context.

org_memory:
  backend: "temporal_kg"
  temporal:
    track_changes: true
    history_retention_days: null        # null = forever

Strengths: Handles policy evolution naturally. Agents understand when and why things changed.

Limitations: Most complex. Potentially overkill for small organizations or local-first use.

OrgMemoryBackend Protocol

All backends implement the OrgMemoryBackend protocol:

  • query(OrgMemoryQuery) -> tuple[OrgFact, ...]
  • write(OrgFactWriteRequest, *, author: OrgFactAuthor) -> NotBlankStr
  • list_policies() -> tuple[OrgFact, ...]
  • Lifecycle methods: connect, disconnect, health_check, is_connected, backend_name

The MVP ships with Backend 1 (Hybrid Prompt + Retrieval). The selected memory-layer backend, Mem0 (see Decision Log), provides optional graph memory via Neo4j/FalkorDB, which could reduce the implementation effort for the research-direction backends.

Write Access Control

Core policies are human-only. ADRs and procedures can be written by senior+ agents. All writes are append-only and auditable. This prevents agents from corrupting shared organizational knowledge while allowing senior agents to document decisions.
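As a sketch, the role gate described above could look like the following; WRITE_ACCESS and check_write_access are illustrative names, not part of the framework's API:

```python
# Hypothetical write-access gate mirroring the org_memory.write_access
# config above. Role and fact-kind names come from that config block.
WRITE_ACCESS: dict[str, frozenset[str]] = {
    "policies": frozenset({"human"}),
    "adrs": frozenset({"human", "senior", "lead", "c_suite"}),
    "procedures": frozenset({"human", "senior", "lead", "c_suite"}),
}

def check_write_access(fact_kind: str, author_role: str) -> None:
    """Raise PermissionError unless author_role may write this fact kind."""
    allowed = WRITE_ACCESS.get(fact_kind, frozenset())
    if author_role not in allowed:
        raise PermissionError(f"{author_role!r} may not write {fact_kind!r}")
```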


Memory Backend Protocol

Agent memory is implemented behind a pluggable MemoryBackend protocol (Mem0 initial, custom stack future -- see Decision Log). Application code depends only on the protocol; the storage engine is an implementation detail swappable via config.

Enums

Enum                   Values                                           Purpose
MemoryCategory         WORKING, EPISODIC, SEMANTIC, PROCEDURAL, SOCIAL  Memory type categories
MemoryLevel            PERSISTENT, PROJECT, SESSION, NONE               Persistence level per agent
ConsolidationInterval  HOURLY, DAILY, WEEKLY, NEVER                     How often old memories are compressed

MemoryBackend Protocol

@runtime_checkable
class MemoryBackend(Protocol):
    """Lifecycle + CRUD for agent memory storage."""

    async def connect(self) -> None: ...
    async def disconnect(self) -> None: ...
    async def health_check(self) -> bool: ...

    @property
    def is_connected(self) -> bool: ...
    @property
    def backend_name(self) -> NotBlankStr: ...

    async def store(self, agent_id: NotBlankStr, request: MemoryStoreRequest) -> NotBlankStr:
        """Raises: MemoryConnectionError, MemoryStoreError."""
        ...
    async def retrieve(self, agent_id: NotBlankStr, query: MemoryQuery) -> tuple[MemoryEntry, ...]:
        """Raises: MemoryConnectionError, MemoryRetrievalError."""
        ...
    async def get(self, agent_id: NotBlankStr, memory_id: NotBlankStr) -> MemoryEntry | None:
        """Raises: MemoryConnectionError, MemoryRetrievalError."""
        ...
    async def delete(self, agent_id: NotBlankStr, memory_id: NotBlankStr) -> bool:
        """Raises: MemoryConnectionError, MemoryStoreError."""
        ...
    async def count(self, agent_id: NotBlankStr, *, category: MemoryCategory | None = None) -> int:
        """Raises: MemoryConnectionError, MemoryRetrievalError."""
        ...

MemoryCapabilities Protocol

Backends that implement MemoryCapabilities expose what features they support, enabling runtime capability checks before attempting operations.

@runtime_checkable
class MemoryCapabilities(Protocol):
    """Capability discovery for memory backends."""

    @property
    def supported_categories(self) -> frozenset[MemoryCategory]: ...
    @property
    def supports_graph(self) -> bool: ...
    @property
    def supports_temporal(self) -> bool: ...
    @property
    def supports_vector_search(self) -> bool: ...
    @property
    def supports_shared_access(self) -> bool: ...
    @property
    def max_memories_per_agent(self) -> int | None: ...
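Because the protocol is @runtime_checkable, callers can gate operations with an isinstance check. The sketch below re-declares a one-property subset for illustration; require_graph and SimpleBackend are hypothetical, and a real implementation would raise MemoryCapabilityError rather than RuntimeError:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class SupportsGraphCheck(Protocol):
    """One-property illustration of a capability subset."""

    @property
    def supports_graph(self) -> bool: ...

class SimpleBackend:
    """Hypothetical backend that lacks graph memory."""

    supports_graph = False

def require_graph(backend: object) -> None:
    # isinstance only verifies the attribute exists; the value still
    # decides. A real implementation would raise MemoryCapabilityError.
    if not isinstance(backend, SupportsGraphCheck) or not backend.supports_graph:
        raise RuntimeError("backend does not support graph memory")
```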

SharedKnowledgeStore Protocol

Backends that support cross-agent shared knowledge implement this protocol alongside MemoryBackend. Not all backends require cross-agent queries -- this keeps the base protocol clean.

@runtime_checkable
class SharedKnowledgeStore(Protocol):
    """Cross-agent shared knowledge operations."""

    async def publish(self, agent_id: NotBlankStr, request: MemoryStoreRequest) -> NotBlankStr:
        """Raises: MemoryConnectionError, MemoryStoreError."""
        ...
    async def search_shared(self, query: MemoryQuery, *, exclude_agent: NotBlankStr | None = None) -> tuple[MemoryEntry, ...]:
        """Raises: MemoryConnectionError, MemoryRetrievalError."""
        ...
    async def retract(self, agent_id: NotBlankStr, memory_id: NotBlankStr) -> bool:
        """Raises: MemoryConnectionError, MemoryStoreError."""
        ...

Error Hierarchy

All memory errors inherit from MemoryError so callers can catch the entire family with a single except clause.

Error                  When Raised
MemoryError            Base exception for all memory operations
MemoryConnectionError  Backend connection cannot be established or is lost
MemoryStoreError       A store or delete operation fails
MemoryRetrievalError   A retrieve, search, or count operation fails
MemoryNotFoundError    A specific memory ID is not found
MemoryConfigError      Memory configuration is invalid
MemoryCapabilityError  An unsupported operation is attempted for a backend

Configuration

memory:
  backend: "mem0"
  level: "persistent"              # none, session, project, persistent (default: session)
  storage:
    data_dir: "/data/memory"
    vector_store: "qdrant"          # hardcoded to embedded qdrant in Mem0 backend
    history_store: "sqlite"         # hardcoded to sqlite in Mem0 backend
  options:
    retention_days: null            # null = forever
    max_memories_per_agent: 10000
    consolidation_interval: "daily"
    shared_knowledge_base: true

# Embedder config is passed programmatically via the factory:
#   create_memory_backend(config, embedder=Mem0EmbedderConfig(
#       provider="<embedding-provider>",
#       model="<embedding-model-id>",
#       dims=1536,
#   ))

Configuration is modeled by CompanyMemoryConfig (top-level), MemoryStorageConfig (storage paths/backends), and MemoryOptionsConfig (behaviour tuning). All are frozen Pydantic models. The create_memory_backend(config, *, embedder=...) factory returns an isolated MemoryBackend instance per company. The embedder kwarg is required for the Mem0 backend (must be a Mem0EmbedderConfig).

Embedding Model Selection

Embedding model quality directly determines memory retrieval accuracy. The LMEB benchmark (Zhao et al., March 2026) evaluates embedding models on long-horizon memory retrieval across four types that map directly to SynthOrg's MemoryCategory enum:

SynthOrg Category  LMEB Category          Evaluation Priority
EPISODIC           Episodic (69 tasks)    High
PROCEDURAL         Procedural (67 tasks)  High
SEMANTIC           Semantic (15 tasks)    Medium
SOCIAL             Dialogue (42 tasks)    Medium
WORKING            N/A (in-context)       N/A

MTEB scores do not predict memory retrieval quality (Pearson: -0.115, Spearman: -0.130). Embedding model selection must be evaluated on LMEB, not MTEB. See Decision Log and the Embedding Evaluation reference page for the full analysis, model rankings, and deployment tier recommendations.

Key findings:

  • Larger models do not always outperform smaller ones on memory retrieval
  • Dialogue/social memory is the hardest retrieval category for all models
  • Instruction sensitivity varies per model -- must be validated per deployment
  • Three deployment tiers are recommended: full-resource (7-12B), mid-resource (1-4B), and CPU-only (< 1B)

Research Direction: Domain-Specific Embedding Fine-Tuning

Domain-specific fine-tuning can improve retrieval quality by 10-27% over base models (NVIDIA evaluation). The pipeline requires no manual annotation and runs on a single GPU.

Pipeline stages:

  1. Synthetic data generation -- LLM generates query-document pairs from org documents (policies, ADRs, procedures, coding standards)
  2. Hard negative mining -- base model embeds all passages; top-k semantically similar but non-matching passages become hard negatives
  3. Contrastive fine-tuning -- biencoder training with InfoNCE loss (tau=0.02, 3 epochs, lr=1e-5). Single GPU, 1-2 hours for ~500 documents
  4. Deploy -- save checkpoint; update Mem0EmbedderConfig to point to fine-tuned model
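Stage 3's loss can be illustrated with a toy InfoNCE computation at the tau=0.02 setting above; the similarity values used below are made-up inputs, not real embedding scores:

```python
import math

def info_nce(pos_sim: float, neg_sims: list[float], tau: float = 0.02) -> float:
    """-log softmax of the positive pair among positive + hard negatives."""
    logits = [pos_sim / tau] + [s / tau for s in neg_sims]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return -(logits[0] - log_denom)

# A well-separated positive yields near-zero loss at tau=0.02;
# indistinguishable candidates yield log(num_candidates).
easy = info_nce(0.9, [0.1, 0.2])
hard = info_nce(0.5, [0.5, 0.5])
```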

Integration design: fine-tuning is an offline pipeline triggered via POST /admin/memory/fine-tune (see MemoryAdminController). The optional EmbeddingFineTuneConfig (disabled by default) stores the checkpoint path. When enabled=True and checkpoint_path is set, backend initialization uses the checkpoint path as the model identifier passed to the Mem0 SDK. The embedding provider must serve the fine-tuned model under this identifier.

class EmbeddingFineTuneConfig(BaseModel):
    model_config = ConfigDict(frozen=True, allow_inf_nan=False)

    enabled: bool = False
    checkpoint_path: NotBlankStr | None = None
    base_model: NotBlankStr | None = None
    training_data_dir: NotBlankStr | None = None

When enabled=True, both checkpoint_path and base_model are required (enforced by model validation). Path traversal (..) and Windows-style paths are rejected to prevent container path escapes.

A future FineTuningPipeline protocol would formalize the four stages:

class FineTuningPipeline(Protocol):
    async def generate_training_data(self, source_dir: str) -> Path: ...
    async def mine_hard_negatives(self, training_data: Path) -> Path: ...
    async def fine_tune(self, training_data: Path, base_model: str) -> Path: ...

See Embedding Evaluation for the full pipeline design and expected improvement metrics.

Consolidation and Retention

Memory consolidation, retention enforcement, and archival are configured via frozen Pydantic models in memory/consolidation/config.py:

Config               Purpose
ConsolidationConfig  Top-level: max_memories_per_agent limit, nested retention and archival sub-configs
RetentionConfig      Company-level per-category RetentionRule tuples (category + retention_days), optional default_retention_days fallback; agents can override via MemoryConfig.retention_overrides
ArchivalConfig       Enables/disables archival of consolidated entries to ArchivalStore, nested DualModeConfig
DualModeConfig       Density-aware dual-mode archival: threshold, summarization model, anchor/fact limits

Dual-Mode Archival

When ArchivalConfig.dual_mode.enabled is True, consolidation classifies content density before choosing an archival mode. This prevents catastrophic information loss from naively summarizing dense content (code, structured data, identifiers). Based on research: Memex (arXiv:2603.04257) and KV Cache Attention Matching (arXiv:2602.16284).

Density                             Archival Mode  Method
Sparse (conversational, narrative)  ABSTRACTIVE    LLM-generated summary via AbstractiveSummarizer
Dense (code, structured data, IDs)  EXTRACTIVE     Verbatim key-fact extraction + start/mid/end anchors via ExtractivePreserver

Classification is heuristic-based (DensityClassifier), using five weighted signals: code patterns, structured data markers, identifier density, numeric density, and line structure. No LLM is needed for classification -- only for abstractive summarization. Groups are classified by majority vote: if most entries in a category group are dense, the group uses extractive mode.
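A rough sketch of signal-weighted classification plus the majority vote might look like this; it uses only three of the five signals, and the weights and 0.5 threshold are illustrative, not the real DensityClassifier values:

```python
import re

def is_dense(text: str) -> bool:
    """Heuristic density guess from code patterns, identifiers, line shape."""
    words = text.split() or [""]
    code = len(re.findall(r"[{}();=]|def |class ", text)) / max(len(words), 1)
    idents = sum(1 for w in words if "_" in w or re.search(r"[a-z][A-Z]", w)) / len(words)
    lines = text.splitlines() or [""]
    short_lines = sum(1 for line in lines if len(line) < 40) / len(lines)
    score = 0.4 * min(code, 1.0) + 0.4 * min(idents, 1.0) + 0.2 * short_lines
    return score >= 0.5

def group_mode(entries: list[str]) -> str:
    """Majority vote: mostly-dense groups archive extractively."""
    dense = sum(map(is_dense, entries))
    return "EXTRACTIVE" if dense * 2 > len(entries) else "ABSTRACTIVE"
```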

Deterministic restore: When entries are archived, the service builds an archival_index (mapping original_id -> archival_id) on ConsolidationResult. Agents can use this index to call ArchivalStore.restore(agent_id, entry_id) directly by ID, bypassing semantic search.

Model                   Purpose
ArchivalMode            Enum: ABSTRACTIVE or EXTRACTIVE
ArchivalModeAssignment  Maps a removed entry ID to its archival mode (set by strategy)
ArchivalIndexEntry      Maps original entry ID to archival store ID (built by service)

Per-Agent Retention Overrides

Individual agents can override company-level retention rules via MemoryConfig.retention_overrides (per-category) and MemoryConfig.retention_days (agent-level default).

Resolution order per category:

  1. Agent per-category rule
  2. Company per-category rule
  3. Agent global default
  4. Company global default
  5. Keep forever (no expiry)

Operational Data Persistence

Agent memory is handled by the MemoryBackend protocol (Mem0 initial, custom stack future -- see Decision Log). Operational data -- tasks, cost records, messages, audit logs -- is a separate concern managed by a pluggable PersistenceBackend protocol. Application code depends only on repository protocols; the storage engine is an implementation detail swappable via config.

Architecture

+------------------------------------------------------------------+
|                     Application Code                             |
|  engine/  budget/  communication/  security/                     |
|     |        |           |             |                         |
|     v        v           v             v                         |
|  +------+ +------+ +----------+ +----------+                     |
|  | Task | | Cost | | Message  | |  Audit   |  <-- Repository     |
|  | Repo | | Repo | |  Repo    | |  Repo    |      Protocols      |
|  +--+---+ +--+---+ +----+-----+ +----+-----+                     |
|     +--------+----------+------------+                           |
|                      |                                           |
|  +-------------------+----------------------------------------+  |
|  |              PersistenceBackend (protocol)                 |  |
|  |  connect() . disconnect() . health_check() . migrate()     |  |
|  +-------------------+----------------------------------------+  |
|                      |                                           |
|  +-------------------+----------------------------------------+  |
|  |  SQLitePersistenceBackend (initial)                        |  |
|  |  PostgresPersistenceBackend (future)                       |  |
|  |  MariaDBPersistenceBackend (future)                        |  |
|  +------------------------------------------------------------+  |
+------------------------------------------------------------------+

Protocol Design

@runtime_checkable
class PersistenceBackend(Protocol):
    """Lifecycle management for operational data storage."""

    async def connect(self) -> None: ...
    async def disconnect(self) -> None: ...
    async def health_check(self) -> bool: ...
    async def migrate(self) -> None: ...

    @property
    def is_connected(self) -> bool: ...
    @property
    def backend_name(self) -> NotBlankStr: ...

    @property
    def tasks(self) -> TaskRepository: ...
    @property
    def cost_records(self) -> CostRecordRepository: ...
    @property
    def messages(self) -> MessageRepository: ...
    # ... plus lifecycle_events, task_metrics, collaboration_metrics,
    #     parked_contexts, audit_entries, users, api_keys, checkpoints,
    #     heartbeats, agent_states, settings, artifacts, projects,
    #     custom_presets

Each entity type has its own repository protocol:

@runtime_checkable
class TaskRepository(Protocol):
    """CRUD + query interface for Task persistence."""

    async def save(self, task: Task) -> None: ...
    async def get(self, task_id: str) -> Task | None: ...
    async def list_tasks(self, *, status: TaskStatus | None = None, assigned_to: str | None = None, project: str | None = None) -> tuple[Task, ...]: ...
    async def delete(self, task_id: str) -> bool: ...

@runtime_checkable
class CostRecordRepository(Protocol):
    """CRUD + aggregation interface for CostRecord persistence."""

    async def save(self, record: CostRecord) -> None: ...
    async def query(self, *, agent_id: str | None = None, task_id: str | None = None) -> tuple[CostRecord, ...]: ...
    async def aggregate(self, *, agent_id: str | None = None) -> float: ...

@runtime_checkable
class MessageRepository(Protocol):
    """CRUD + query interface for Message persistence."""

    async def save(self, message: Message) -> None: ...
    async def get_history(self, channel: str, *, limit: int | None = None) -> tuple[Message, ...]: ...

Configuration

persistence:
  backend: "sqlite"                   # sqlite, postgresql, mariadb (future)
  sqlite:
    path: "/data/synthorg.db"         # database file path (mounted volume in Docker)
    wal_mode: true                    # WAL for concurrent read performance
    journal_size_limit: 67108864      # 64 MB WAL journal limit
  # postgresql:                       # future
  #   url: "postgresql://user:pass@host:5432/synthorg"
  #   pool_size: 10
  # mariadb:                          # future
  #   url: "mariadb://user:pass@host:3306/synthorg"
  #   pool_size: 10

Entities Persisted

Entity             Source Module                        Repository                   Key Queries
Task               core/task.py                         TaskRepository               by status, by assignee, by project
CostRecord         budget/cost_record.py                CostRecordRepository         by agent, by task, aggregations
Message            communication/message.py             MessageRepository            by channel
AuditEntry         security/models.py                   AuditRepository              by agent, by action type, by verdict, by risk level, time range
ParkedContext      security/timeout/parked_context.py   ParkedContextRepository      by execution_id, by agent_id, by task_id
AgentRuntimeState  engine/agent_state.py                AgentStateRepository         by agent_id, active agents
Setting            settings/models.py                   SettingsRepository           by namespace+key, by namespace, all
Artifact           core/artifact.py                     ArtifactRepository           by task_id, by created_by, by artifact_type
Project            core/project.py                      ProjectRepository            by status, by lead
Custom preset      templates/preset_service.py          PersonalityPresetRepository  by name

Schema Strategy

  • Schema is applied at startup via PersistenceBackend.migrate() which calls apply_schema()
  • The canonical schema lives in src/synthorg/persistence/sqlite/schema.sql (single source of truth)
  • All DDL uses IF NOT EXISTS guards, making schema application idempotent
  • No sequential migrations exist yet -- once the schema stabilizes, adopt Atlas for declarative migrations (diff schema.sql against the live DB)
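The idempotency guarantee can be demonstrated with a toy schema; the table names here are illustrative, not the canonical schema.sql:

```python
import sqlite3

# Every statement carries an IF NOT EXISTS guard, so applying the schema
# is safe to repeat at every startup.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    task_id TEXT PRIMARY KEY,
    status  TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_tasks_status ON tasks(status);
"""

def apply_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)

conn = sqlite3.connect(":memory:")
apply_schema(conn)
apply_schema(conn)  # re-running is a no-op thanks to the guards
```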

Key Principles

Application code never imports a concrete backend
Only repository protocols are used. This ensures complete decoupling from the storage engine.
Adding a new backend requires no changes to consumers
Implement PersistenceBackend + all repository protocols. Existing application code works unchanged.
Same entity models everywhere
Repositories accept and return the existing frozen Pydantic models (Task, CostRecord, Message). No ORM models or data transfer objects.
Async throughout
All repository methods are async, matching the framework's concurrency model.

Multi-Tenancy

Each company gets its own database. The PersistenceConfig embedded in a company's RootConfig specifies the backend type and connection details (e.g., a unique SQLite file path or PostgreSQL database URL). The create_backend(config) factory returns an isolated PersistenceBackend instance per company -- no shared state, no cross-company data leakage.

# One database per company -- configured in each company's YAML
company_a_backend = create_backend(company_a_config.persistence)
company_b_backend = create_backend(company_b_config.persistence)
# Each backend has independent lifecycle: connect -> migrate -> use -> disconnect

Planned

Runtime backend switching (e.g., migrating a company from SQLite to PostgreSQL during operation) is a planned future capability. The protocol-based design already supports this -- the engine would disconnect the current backend, connect a new one with different config, and migrate. Implementation details (data migration tooling, zero-downtime switchover, connection draining) are deferred to the PostgreSQL backend implementation.


Procedural Memory Auto-Generation

When an agent fails a task, the engine's post-execution pipeline can automatically generate a procedural memory entry -- a structured "next time, do X when encountering Y" lesson learned. This follows the EvoSkill three-agent separation principle: the failed agent does not write its own lesson; a separate proposer LLM call analyses the failure.

Pipeline

  1. Failure analysis payload (FailureAnalysisPayload): Built from RecoveryResult + ExecutionResult. Includes task metadata, sanitized error message, tool calls made, retry count, and turn count. Deliberately excludes raw conversation messages (privacy boundary).

  2. Proposer LLM call (ProceduralMemoryProposer): A separate completion call with its own system prompt analyses the payload and returns a structured ProceduralMemoryProposal.

  3. Three-tier progressive disclosure:

    • Discovery (~100 tokens): concise summary for retrieval ranking.
    • Activation (condition + action + rationale): when/what/why.
    • Execution (ordered steps): concrete steps for applying the knowledge.
  4. Storage: The proposal is stored via MemoryBackend.store() as a MemoryCategory.PROCEDURAL entry with a "non-inferable" tag for retrieval filtering.

  5. SKILL.md materialization (optional): When ProceduralMemoryConfig.skill_md_directory is set, the proposal is also written as a portable SKILL.md file following the Agent Skills format for git-native versioning.

Configuration

ProceduralMemoryConfig (nested in CompanyMemoryConfig.procedural) controls:

  • enabled: Toggle auto-generation on/off (default: True).
  • model: Model identifier for the proposer LLM call (default: "example-small-001").
  • temperature: Sampling temperature (default: 0.3).
  • max_tokens: Token budget for the proposer response (default: 1500).
  • min_confidence: Discard proposals below this threshold (default: 0.5).
  • skill_md_directory: Optional path for SKILL.md file materialization.

Integration Point

AgentEngine._try_procedural_memory() runs after error recovery in _post_execution_pipeline. It is non-critical: failures are logged at WARNING and never block the execution result.


Memory Injection Strategies

Agent memory reaches agents through pluggable injection strategies behind the MemoryInjectionStrategy protocol. The strategy determines how memories are surfaced to the agent during execution.

Context Injection (Default)

Pre-retrieves relevant memories before execution, ranks them by relevance and recency, enforces a token budget, and formats the result as ChatMessage(s) injected between the system prompt and the task instruction. The agent passively receives memories.

Pipeline (Linear -- single-source, default):

  1. MemoryBackend.retrieve() -- fetch candidate memories (dense vector search)
  2. Rank by relevance + recency via linear combination
  3. Filter by min_relevance threshold
  4. Apply MemoryFilterStrategy (Decision Log D23, optional) -- exclude inferable content
  5. Greedy token-budget packing
  6. Format as ChatMessage (configured role: SYSTEM or USER) with delimiters

Pipeline (RRF hybrid search -- multi-source):

When fusion_strategy: rrf is configured, the pipeline runs both dense and BM25 sparse search in parallel and fuses results:

  1. Dense search: MemoryBackend.retrieve() for personal, SharedKnowledgeStore.search_shared() for shared (in parallel)
  2. Sparse BM25 search: MemoryBackend.retrieve_sparse() for personal (shared sparse disabled until SharedKnowledgeStore adds the method)
  3. Fuse via fuse_ranked_lists() with configurable rrf_k smoothing constant
  4. Post-RRF min_relevance filter on combined_score
  5. Apply MemoryFilterStrategy (optional)
  6. Greedy token-budget packing
  7. Format as ChatMessage

BM25 sparse vectors are stored alongside dense vectors in Qdrant using a named sparse vector field with Modifier.IDF (Qdrant applies IDF server-side). The BM25Tokenizer uses murmurhash3 for vocabulary-free token-to-index mapping; only term frequencies are stored. Sparse search is opt-in via Mem0BackendConfig.sparse_search_enabled.

Shared memories (from SharedKnowledgeStore) are fetched in parallel, merged with personal memories (no personal_boost for shared), and ranked together.

Ranking Algorithm (Linear -- default):

  1. relevance = entry.relevance_score ?? config.default_relevance
  2. Personal entries: relevance = min(relevance + personal_boost, 1.0)
  3. recency = exp(-decay_rate * age_hours)
  4. combined = relevance_weight * relevance + recency_weight * recency
  5. Filter: combined >= min_relevance
  6. Sort descending by combined_score
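As a sketch, the linear combination might be computed as follows; the weight, boost, and decay values are example numbers, not framework defaults:

```python
import math

def combined_score(
    relevance: float,
    age_hours: float,
    *,
    personal: bool = False,
    personal_boost: float = 0.1,
    relevance_weight: float = 0.7,
    recency_weight: float = 0.3,
    decay_rate: float = 0.01,
) -> float:
    if personal:
        relevance = min(relevance + personal_boost, 1.0)  # step 2 clamp
    recency = math.exp(-decay_rate * age_hours)           # step 3 decay
    return relevance_weight * relevance + recency_weight * recency
```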

Alternative: Reciprocal Rank Fusion (RRF)

When fusion_strategy: rrf is configured, multiple pre-ranked lists (e.g., from different retrieval sources) are merged via RRF: score(doc) = sum(1 / (k + rank_i)) across all lists containing the document. Scores are min-max normalized to [0.0, 1.0]. The smoothing constant k (default 60, configurable via rrf_k) controls rank-difference amplification. RRF is the de facto standard for hybrid search fusion (Qdrant, NeMo Retriever). It is intended for multi-source scenarios (BM25 + vector, multi-round tool-based retrieval); the linear strategy remains the default for single-source retrieval. Results are truncated to max_results (default 20) after scoring and sorting.
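The fusion formula can be sketched directly; the framework's fuse_ranked_lists may differ, so rrf_fuse below is illustrative:

```python
def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> dict[str, float]:
    """score(doc) = sum over lists of 1 / (k + rank), min-max normalized."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores tie
    return {doc: (score - lo) / span for doc, score in scores.items()}

# "a" ranks first in both lists; "b" appears in only one.
fused = rrf_fuse([["a", "b", "c"], ["a", "c"]])
```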

Non-Inferable Filter

Retrieved memories are filtered before injection to exclude content the agent can discover by reading the codebase or environment. Only non-inferable information is injected: prior decisions, learned conventions, interpersonal context, historical outcomes. Research shows generic context increases cost 20%+ with minimal success improvement; LLM-generated context can actually reduce success rates.

Filter strategy (Decision Log D23): Pluggable MemoryFilterStrategy protocol. Initial implementation uses tag-based filtering at write time. A non-inferable tag convention with advisory validation at the MemoryBackend.store() boundary warns on missing tags but never blocks. The system prompt instructs agents what qualifies as non-inferable: design rationale, team decisions, "why not X," cross-repo knowledge. Uses existing MemoryMetadata.tags and MemoryQuery.tags -- zero new models needed.
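Under the tag convention, the injection-time filter reduces to a membership check; the dict-based entry shape below is a simplified stand-in for MemoryEntry with MemoryMetadata.tags:

```python
def filter_non_inferable(entries: list[dict]) -> list[dict]:
    """Keep only entries tagged non-inferable at write time."""
    return [entry for entry in entries if "non-inferable" in entry.get("tags", ())]

kept = filter_non_inferable([
    {"text": "team chose JWT over sessions", "tags": ("non-inferable",)},
    {"text": "project uses Litestar", "tags": ()},  # inferable from the codebase
])
```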

Tool-Based Injection

The agent has recall_memory / search_memory tools it calls on demand during execution, actively deciding when and what to remember. This is more token-efficient (it retrieves only when needed) but consumes tool-call turns and requires agent discipline to invoke the tools.

Implemented via ToolBasedInjectionStrategy. The strategy:

  • Injects a brief system instruction about available memory tools
  • Exposes search_memory and recall_memory (by ID) tools
  • Delegates search_memory requests to MemoryBackend.retrieve() (dense-only; hybrid dense+sparse with RRF fusion is not yet wired into the tool-based path)
  • Hybrid retrieval and RRF fusion are handled at the ContextInjectionStrategy level, not within ToolBasedInjectionStrategy
  • QueryReformulator and SufficiencyChecker protocols exist with LLM-based implementations, but iterative reformulation is not yet wired into the tool-based strategy's search handler (reserved via query_reformulation_enabled config field)

ToolRegistry integration: SearchMemoryTool and RecallMemoryTool are BaseTool subclasses (memory/tools.py) that delegate execution to ToolBasedInjectionStrategy.handle_tool_call(). The registry_with_memory_tools() factory augments a ToolRegistry with these tools when the strategy is ToolBasedInjectionStrategy. AgentEngine accepts an optional memory_injection_strategy parameter and wires the tools into each agent's registry at execution time. This ensures memory tools participate in the standard ToolInvoker dispatch pipeline, including permission checking (ToolCategory.MEMORY), security interceptors, and invocation tracking.

MCP bridge evaluation: Both context injection and tool-based strategies hold direct MemoryBackend references and run in-process. The memory hot path already bypasses MCP by design -- no additional optimization needed.

Self-Editing Memory

The agent has structured memory blocks (core, archival, recall) that it both reads and writes during execution via dedicated tools. Core memory is always in context; archival and recall memory are searched via tools. This is the most sophisticated approach (a self-editing memory architecture) but carries the highest complexity and LLM overhead.

MemoryInjectionStrategy Protocol

All strategies implement MemoryInjectionStrategy:

class MemoryInjectionStrategy(Protocol):

    async def prepare_messages(
        self, agent_id: NotBlankStr, query_text: NotBlankStr, token_budget: int
    ) -> tuple[ChatMessage, ...]: ...

    def get_tool_definitions(self) -> tuple[ToolDefinition, ...]: ...

    @property
    def strategy_name(self) -> str: ...

Strategy selection via config: memory.retrieval.strategy: context | tool_based | self_editing