Memory & Persistence

The SynthOrg framework separates two distinct storage concerns:

  • Agent memory -- what agents know, remember, and learn (working, episodic, semantic, procedural, social)
  • Operational data -- tasks, cost records, messages, and audit logs generated during execution

Both are implemented behind pluggable protocol interfaces, making storage backends swappable via configuration without modifying application code.


Memory Architecture

+-------------------------------------------------+
|              Agent Memory System                |
+----------+----------+-----------+---------------+
| Working  | Episodic | Semantic  | Procedural    |
| Memory   | Memory   | Memory    | Memory        |
|          |          |           |               |
| Current  | Past     | Knowledge | Skills &      |
| task     | events & | & facts   | how-to        |
| context  | decisions| learned   |               |
+----------+----------+-----------+---------------+
|            Storage Backend                      |
|   Mem0 (initial, implemented) / Custom (future) |
|   Qdrant (embedded) + SQLite history            |
|     See Decision Log                            |
+-------------------------------------------------+

Each agent maintains its own memory store. The storage backend is selected via configuration, and all access flows through the MemoryBackend protocol.


Memory Types

Type        Scope            Persistence        Example
Working     Current task     None (in-context)  "I'm implementing the auth endpoint"
Episodic    Past events      Configurable       "Last sprint the team chose JWT over sessions"
Semantic    Knowledge        Long-term          "This project uses Litestar with aiosqlite"
Procedural  Skills/patterns  Long-term          "Code reviews require 2 approvals here"
Social      Relationships    Long-term          "The QA lead prefers detailed test plans"

Memory Levels

Memory persistence is configurable per agent, from no persistence to fully persistent storage.

Memory Level Configuration
memory:
  level: "persistent"            # none | session | project | persistent (default: session)
  backend: "mem0"               # mem0 | custom | cognee | graphiti (future)
  storage:
    data_dir: "/data/memory"    # mounted Docker volume path
    vector_store: "qdrant"      # hardcoded to embedded qdrant in Mem0 backend
    history_store: "sqlite"     # hardcoded to sqlite in Mem0 backend
  options:
    retention_days: null         # null = forever
    max_memories_per_agent: 10000
    consolidation_interval: "daily"  # compress old memories
    shared_knowledge_base: true      # agents can access shared facts

Shared Organizational Memory

Beyond individual agent memory, the framework provides organizational memory -- company-wide knowledge that all agents can access: policies, conventions, architecture decision records (ADRs), coding standards, and operational procedures. This is not personal episodic memory ("what I did last Tuesday") but institutional knowledge ("the team always uses Litestar, not Flask").

Shared organizational memory is implemented behind an OrgMemoryBackend protocol, so new backends can be added without modifying existing ones.

Backend 1: Hybrid Prompt + Retrieval (Default)

Critical rules (5--10 items, e.g., "no commits to main," "all PRs need 2 approvals") are injected into every agent's system prompt. Extended knowledge (ADRs, detailed procedures, style guides) is stored in a queryable store and retrieved on demand at task start.

org_memory:
  backend: "hybrid_prompt_retrieval"    # hybrid_prompt_retrieval, graph_rag, temporal_kg
  core_policies:                        # always in system prompt
    - "All code must have 80%+ test coverage"
    - "Use Litestar, not Flask"
    - "PRs require 2 approvals"
  extended_store:
    backend: "sqlite"                   # sqlite, postgresql
    max_retrieved_per_query: 5
  write_access:
    policies: ["human"]                 # only humans write core policies
    adrs: ["human", "senior", "lead", "c_suite"]
    procedures: ["human", "senior", "lead", "c_suite"]

Strengths: Simple to implement. Core rules are always present. Extended knowledge scales with the organization.

Limitations: Basic retrieval may miss relational connections between policies.

Research Directions

The following backends illustrate why OrgMemoryBackend is a protocol -- the architecture supports future upgrades without modifying existing code. These are research directions that may inform future work if organizational memory needs outgrow the Hybrid Prompt + Retrieval approach.

Research Direction: GraphRAG Knowledge Graph

Organizational knowledge stored as entities + relationships in a knowledge graph. Agents query via graph traversal, enabling multi-hop reasoning: "Litestar is the standard" is linked to "don't use Flask," which is linked to "exception: data team uses Django for admin."

org_memory:
  backend: "graph_rag"
  graph:
    store: "sqlite"                     # graph stored in relational DB, or dedicated graph DB
    entity_extraction: "auto"           # auto-extract entities from ADRs and policies

Strengths: Significant accuracy improvement over vector-only retrieval (some benchmarks report 3--4x gains). Multi-hop reasoning captures policy relationships.

Limitations: More complex infrastructure. Entity extraction can be noisy. Heavier setup.

Research Direction: Temporal Knowledge Graph

Like GraphRAG but tracks how facts change over time. "The team used Flask until March 2026, then switched to Litestar." Agents see current truth but can query history for context.

org_memory:
  backend: "temporal_kg"
  temporal:
    track_changes: true
    history_retention_days: null        # null = forever

Strengths: Handles policy evolution naturally. Agents understand when and why things changed.

Limitations: Most complex. Potentially overkill for small organizations or local-first use.

OrgMemoryBackend Protocol

All backends implement the OrgMemoryBackend protocol:

  • query(OrgMemoryQuery) -> tuple[OrgFact, ...]
  • write(OrgFactWriteRequest, *, author: OrgFactAuthor) -> NotBlankStr
  • list_policies() -> tuple[OrgFact, ...]
  • Lifecycle methods: connect, disconnect, health_check, is_connected, backend_name

The MVP ships with Backend 1 (Hybrid Prompt + Retrieval). The selected memory-layer backend, Mem0 (see Decision Log), provides optional graph memory via Neo4j/FalkorDB, which could reduce the implementation effort for the research-direction backends.

Write Access Control

Core policies are human-only. ADRs and procedures can be written by senior+ agents. All writes are append-only and auditable. This prevents agents from corrupting shared organizational knowledge while allowing senior agents to document decisions.
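As a sketch, the role gate described above could look like the following; WRITE_ACCESS and check_write_access are illustrative names, not part of the framework's API:

```python
# Hypothetical write-access gate mirroring the org_memory.write_access
# config above. Role and fact-kind names come from that config block.
WRITE_ACCESS: dict[str, frozenset[str]] = {
    "policies": frozenset({"human"}),
    "adrs": frozenset({"human", "senior", "lead", "c_suite"}),
    "procedures": frozenset({"human", "senior", "lead", "c_suite"}),
}

def check_write_access(fact_kind: str, author_role: str) -> None:
    """Raise PermissionError unless author_role may write this fact kind."""
    allowed = WRITE_ACCESS.get(fact_kind, frozenset())
    if author_role not in allowed:
        raise PermissionError(f"{author_role!r} may not write {fact_kind!r}")
```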


Memory Backend Protocol

Agent memory is implemented behind a pluggable MemoryBackend protocol (Mem0 initial, custom stack future -- see Decision Log). Application code depends only on the protocol; the storage engine is an implementation detail swappable via config.

Enums

Enum                   Values                                           Purpose
MemoryCategory         WORKING, EPISODIC, SEMANTIC, PROCEDURAL, SOCIAL  Memory type categories
MemoryLevel            PERSISTENT, PROJECT, SESSION, NONE               Persistence level per agent
ConsolidationInterval  HOURLY, DAILY, WEEKLY, NEVER                     How often old memories are compressed

MemoryBackend Protocol

@runtime_checkable
class MemoryBackend(Protocol):
    """Lifecycle + CRUD for agent memory storage."""

    async def connect(self) -> None: ...
    async def disconnect(self) -> None: ...
    async def health_check(self) -> bool: ...

    @property
    def is_connected(self) -> bool: ...
    @property
    def backend_name(self) -> NotBlankStr: ...

    async def store(self, agent_id: NotBlankStr, request: MemoryStoreRequest) -> NotBlankStr:
        """Raises: MemoryConnectionError, MemoryStoreError."""
        ...
    async def retrieve(self, agent_id: NotBlankStr, query: MemoryQuery) -> tuple[MemoryEntry, ...]:
        """Raises: MemoryConnectionError, MemoryRetrievalError."""
        ...
    async def get(self, agent_id: NotBlankStr, memory_id: NotBlankStr) -> MemoryEntry | None:
        """Raises: MemoryConnectionError, MemoryRetrievalError."""
        ...
    async def delete(self, agent_id: NotBlankStr, memory_id: NotBlankStr) -> bool:
        """Raises: MemoryConnectionError, MemoryStoreError."""
        ...
    async def count(self, agent_id: NotBlankStr, *, category: MemoryCategory | None = None) -> int:
        """Raises: MemoryConnectionError, MemoryRetrievalError."""
        ...

MemoryCapabilities Protocol

Backends that implement MemoryCapabilities expose what features they support, enabling runtime capability checks before attempting operations.

@runtime_checkable
class MemoryCapabilities(Protocol):
    """Capability discovery for memory backends."""

    @property
    def supported_categories(self) -> frozenset[MemoryCategory]: ...
    @property
    def supports_graph(self) -> bool: ...
    @property
    def supports_temporal(self) -> bool: ...
    @property
    def supports_vector_search(self) -> bool: ...
    @property
    def supports_shared_access(self) -> bool: ...
    @property
    def max_memories_per_agent(self) -> int | None: ...
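Because the protocol is @runtime_checkable, callers can gate operations with an isinstance check. The sketch below re-declares a one-property subset for illustration; require_graph and SimpleBackend are hypothetical, and a real implementation would raise MemoryCapabilityError rather than RuntimeError:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class SupportsGraphCheck(Protocol):
    """One-property illustration of a capability subset."""

    @property
    def supports_graph(self) -> bool: ...

class SimpleBackend:
    """Hypothetical backend that lacks graph memory."""

    supports_graph = False

def require_graph(backend: object) -> None:
    # isinstance only verifies the attribute exists; the value still
    # decides. A real implementation would raise MemoryCapabilityError.
    if not isinstance(backend, SupportsGraphCheck) or not backend.supports_graph:
        raise RuntimeError("backend does not support graph memory")
```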

SharedKnowledgeStore Protocol

Backends that support cross-agent shared knowledge implement this protocol alongside MemoryBackend. Not all backends require cross-agent queries -- this keeps the base protocol clean.

@runtime_checkable
class SharedKnowledgeStore(Protocol):
    """Cross-agent shared knowledge operations."""

    async def publish(self, agent_id: NotBlankStr, request: MemoryStoreRequest) -> NotBlankStr:
        """Raises: MemoryConnectionError, MemoryStoreError."""
        ...
    async def search_shared(self, query: MemoryQuery, *, exclude_agent: NotBlankStr | None = None) -> tuple[MemoryEntry, ...]:
        """Raises: MemoryConnectionError, MemoryRetrievalError."""
        ...
    async def retract(self, agent_id: NotBlankStr, memory_id: NotBlankStr) -> bool:
        """Raises: MemoryConnectionError, MemoryStoreError."""
        ...

Error Hierarchy

All memory errors inherit from MemoryError so callers can catch the entire family with a single except clause.

Error                  When Raised
MemoryError            Base exception for all memory operations
MemoryConnectionError  Backend connection cannot be established or is lost
MemoryStoreError       A store or delete operation fails
MemoryRetrievalError   A retrieve, search, or count operation fails
MemoryNotFoundError    A specific memory ID is not found
MemoryConfigError      Memory configuration is invalid
MemoryCapabilityError  An unsupported operation is attempted for a backend

Configuration

memory:
  backend: "mem0"
  level: "persistent"              # none, session, project, persistent (default: session)
  storage:
    data_dir: "/data/memory"
    vector_store: "qdrant"          # hardcoded to embedded qdrant in Mem0 backend
    history_store: "sqlite"         # hardcoded to sqlite in Mem0 backend
  options:
    retention_days: null            # null = forever
    max_memories_per_agent: 10000
    consolidation_interval: "daily"
    shared_knowledge_base: true

# Embedder config is passed programmatically via the factory:
#   create_memory_backend(config, embedder=Mem0EmbedderConfig(
#       provider="<embedding-provider>",
#       model="<embedding-model-id>",
#       dims=1536,
#   ))

Configuration is modeled by CompanyMemoryConfig (top-level), MemoryStorageConfig (storage paths/backends), and MemoryOptionsConfig (behaviour tuning). All are frozen Pydantic models. The create_memory_backend(config, *, embedder=...) factory returns an isolated MemoryBackend instance per company. The embedder kwarg is required for the Mem0 backend (must be a Mem0EmbedderConfig).

Embedding Model Selection

Embedding model quality directly determines memory retrieval accuracy. The LMEB benchmark (Zhao et al., March 2026) evaluates embedding models on long-horizon memory retrieval across four types that map directly to SynthOrg's MemoryCategory enum:

SynthOrg Category  LMEB Category          Evaluation Priority
EPISODIC           Episodic (69 tasks)    High
PROCEDURAL         Procedural (67 tasks)  High
SEMANTIC           Semantic (15 tasks)    Medium
SOCIAL             Dialogue (42 tasks)    Medium
WORKING            N/A (in-context)       N/A

MTEB scores do not predict memory retrieval quality (Pearson: -0.115, Spearman: -0.130). Embedding model selection must be evaluated on LMEB, not MTEB. See Decision Log and the Embedding Evaluation reference page for the full analysis, model rankings, and deployment tier recommendations.

Key findings:

  • Larger models do not always outperform smaller ones on memory retrieval
  • Dialogue/social memory is the hardest retrieval category for all models
  • Instruction sensitivity varies per model -- must be validated per deployment
  • Three deployment tiers are recommended: full-resource (7-12B), mid-resource (1-4B), and CPU-only (< 1B)

Research Direction: Domain-Specific Embedding Fine-Tuning

Domain-specific fine-tuning can improve retrieval quality by 10-27% over base models (NVIDIA evaluation). The pipeline requires no manual annotation and runs on a single GPU.

Pipeline stages:

  1. Synthetic data generation -- LLM generates query-document pairs from org documents (policies, ADRs, procedures, coding standards)
  2. Hard negative mining -- base model embeds all passages; top-k semantically similar but non-matching passages become hard negatives
  3. Contrastive fine-tuning -- biencoder training with InfoNCE loss (tau=0.02, 3 epochs, lr=1e-5). Single GPU, 1-2 hours for ~500 documents
  4. Deploy -- save checkpoint; update Mem0EmbedderConfig to point to fine-tuned model
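Stage 3's loss can be illustrated with a toy InfoNCE computation at the tau=0.02 setting above; the similarity values used below are made-up inputs, not real embedding scores:

```python
import math

def info_nce(pos_sim: float, neg_sims: list[float], tau: float = 0.02) -> float:
    """-log softmax of the positive pair among positive + hard negatives."""
    logits = [pos_sim / tau] + [s / tau for s in neg_sims]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return -(logits[0] - log_denom)

# A well-separated positive yields near-zero loss at tau=0.02;
# indistinguishable candidates yield log(num_candidates).
easy = info_nce(0.9, [0.1, 0.2])
hard = info_nce(0.5, [0.5, 0.5])
```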

Integration design: fine-tuning is an offline pipeline triggered via POST /admin/memory/fine-tune (see MemoryAdminController). The optional EmbeddingFineTuneConfig (disabled by default) stores the checkpoint path. When enabled=True and checkpoint_path is set, backend initialization uses the checkpoint path as the model identifier passed to the Mem0 SDK. The embedding provider must serve the fine-tuned model under this identifier.

class EmbeddingFineTuneConfig(BaseModel):
    model_config = ConfigDict(frozen=True, allow_inf_nan=False)

    enabled: bool = False
    checkpoint_path: NotBlankStr | None = None
    base_model: NotBlankStr | None = None
    training_data_dir: NotBlankStr | None = None

When enabled=True, both checkpoint_path and base_model are required (enforced by model validation). Path traversal (..) and Windows-style paths are rejected to prevent container path escapes.

A future FineTuningPipeline protocol would formalize the four stages:

class FineTuningPipeline(Protocol):
    async def generate_training_data(self, source_dir: str) -> Path: ...
    async def mine_hard_negatives(self, training_data: Path) -> Path: ...
    async def fine_tune(self, training_data: Path, base_model: str) -> Path: ...

See Embedding Evaluation for the full pipeline design and expected improvement metrics.

Consolidation and Retention

Memory consolidation, retention enforcement, and archival are configured via frozen Pydantic models in memory/consolidation/config.py:

Config               Purpose
ConsolidationConfig  Top-level: max_memories_per_agent limit, nested retention and archival sub-configs
RetentionConfig      Company-level per-category RetentionRule tuples (category + retention_days), optional default_retention_days fallback; agents can override via MemoryConfig.retention_overrides
ArchivalConfig       Enables/disables archival of consolidated entries to ArchivalStore, nested DualModeConfig
DualModeConfig       Density-aware dual-mode archival: threshold, summarization model, anchor/fact limits

Dual-Mode Archival

When ArchivalConfig.dual_mode.enabled is True, consolidation classifies content density before choosing an archival mode. This prevents catastrophic information loss from naively summarizing dense content (code, structured data, identifiers). Based on research: Memex (arXiv:2603.04257) and KV Cache Attention Matching (arXiv:2602.16284).

Density                             Archival Mode  Method
Sparse (conversational, narrative)  ABSTRACTIVE    LLM-generated summary via AbstractiveSummarizer
Dense (code, structured data, IDs)  EXTRACTIVE     Verbatim key-fact extraction + start/mid/end anchors via ExtractivePreserver

Classification is heuristic-based (DensityClassifier), using five weighted signals: code patterns, structured data markers, identifier density, numeric density, and line structure. No LLM is needed for classification -- only for abstractive summarization. Groups are classified by majority vote: if most entries in a category group are dense, the group uses extractive mode.
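A rough sketch of signal-weighted classification plus the majority vote might look like this; it uses only three of the five signals, and the weights and 0.5 threshold are illustrative, not the real DensityClassifier values:

```python
import re

def is_dense(text: str) -> bool:
    """Heuristic density guess from code patterns, identifiers, line shape."""
    words = text.split() or [""]
    code = len(re.findall(r"[{}();=]|def |class ", text)) / max(len(words), 1)
    idents = sum(1 for w in words if "_" in w or re.search(r"[a-z][A-Z]", w)) / len(words)
    lines = text.splitlines() or [""]
    short_lines = sum(1 for line in lines if len(line) < 40) / len(lines)
    score = 0.4 * min(code, 1.0) + 0.4 * min(idents, 1.0) + 0.2 * short_lines
    return score >= 0.5

def group_mode(entries: list[str]) -> str:
    """Majority vote: mostly-dense groups archive extractively."""
    dense = sum(map(is_dense, entries))
    return "EXTRACTIVE" if dense * 2 > len(entries) else "ABSTRACTIVE"
```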

Deterministic restore: When entries are archived, the service builds an archival_index (mapping original_id -> archival_id) on ConsolidationResult. Agents can use this index to call ArchivalStore.restore(agent_id, entry_id) directly by ID, bypassing semantic search.

Model                   Purpose
ArchivalMode            Enum: ABSTRACTIVE or EXTRACTIVE
ArchivalModeAssignment  Maps a removed entry ID to its archival mode (set by strategy)
ArchivalIndexEntry      Maps original entry ID to archival store ID (built by service)

Per-Agent Retention Overrides

Individual agents can override company-level retention rules via MemoryConfig.retention_overrides (per-category) and MemoryConfig.retention_days (agent-level default).

Resolution order per category:

  1. Agent per-category rule
  2. Company per-category rule
  3. Agent global default
  4. Company global default
  5. Keep forever (no expiry)

Operational Data Persistence

Agent memory is handled by the MemoryBackend protocol (Mem0 initial, custom stack future -- see Decision Log). Operational data -- tasks, cost records, messages, audit logs -- is a separate concern managed by a pluggable PersistenceBackend protocol. Application code depends only on repository protocols; the storage engine is an implementation detail swappable via config.

Architecture

+------------------------------------------------------------------+
|                     Application Code                             |
|  engine/  budget/  communication/  security/                     |
|     |        |           |             |                         |
|     v        v           v             v                         |
|  +------+ +------+ +----------+ +----------+                     |
|  | Task | | Cost | | Message  | |  Audit   |  <-- Repository     |
|  | Repo | | Repo | |  Repo    | |  Repo    |      Protocols      |
|  +--+---+ +--+---+ +----+-----+ +----+-----+                     |
|     +--------+----------+------------+                           |
|                      |                                           |
|  +-------------------+----------------------------------------+  |
|  |              PersistenceBackend (protocol)                 |  |
|  |  connect() . disconnect() . health_check() . migrate()     |  |
|  +-------------------+----------------------------------------+  |
|                      |                                           |
|  +-------------------+----------------------------------------+  |
|  |  SQLitePersistenceBackend (initial)                        |  |
|  |  PostgresPersistenceBackend (future)                       |  |
|  |  MariaDBPersistenceBackend (future)                        |  |
|  +------------------------------------------------------------+  |
+------------------------------------------------------------------+

Protocol Design

@runtime_checkable
class PersistenceBackend(Protocol):
    """Lifecycle management for operational data storage."""

    async def connect(self) -> None: ...
    async def disconnect(self) -> None: ...
    async def health_check(self) -> bool: ...
    async def migrate(self) -> None: ...

    @property
    def is_connected(self) -> bool: ...
    @property
    def backend_name(self) -> NotBlankStr: ...

    @property
    def tasks(self) -> TaskRepository: ...
    @property
    def cost_records(self) -> CostRecordRepository: ...
    @property
    def messages(self) -> MessageRepository: ...
    # ... plus lifecycle_events, task_metrics, collaboration_metrics,
    #     parked_contexts, audit_entries, users, api_keys, checkpoints,
    #     heartbeats, agent_states, settings, artifacts, projects,
    #     custom_presets

Each entity type has its own repository protocol:

@runtime_checkable
class TaskRepository(Protocol):
    """CRUD + query interface for Task persistence."""

    async def save(self, task: Task) -> None: ...
    async def get(self, task_id: str) -> Task | None: ...
    async def list_tasks(self, *, status: TaskStatus | None = None, assigned_to: str | None = None, project: str | None = None) -> tuple[Task, ...]: ...
    async def delete(self, task_id: str) -> bool: ...

@runtime_checkable
class CostRecordRepository(Protocol):
    """CRUD + aggregation interface for CostRecord persistence."""

    async def save(self, record: CostRecord) -> None: ...
    async def query(self, *, agent_id: str | None = None, task_id: str | None = None) -> tuple[CostRecord, ...]: ...
    async def aggregate(self, *, agent_id: str | None = None) -> float: ...

@runtime_checkable
class MessageRepository(Protocol):
    """CRUD + query interface for Message persistence."""

    async def save(self, message: Message) -> None: ...
    async def get_history(self, channel: str, *, limit: int | None = None) -> tuple[Message, ...]: ...

Configuration

persistence:
  backend: "sqlite"                   # sqlite, postgresql, mariadb (future)
  sqlite:
    path: "/data/synthorg.db"         # database file path (mounted volume in Docker)
    wal_mode: true                    # WAL for concurrent read performance
    journal_size_limit: 67108864      # 64 MB WAL journal limit
  # postgresql:                       # future
  #   url: "postgresql://user:pass@host:5432/synthorg"
  #   pool_size: 10
  # mariadb:                          # future
  #   url: "mariadb://user:pass@host:3306/synthorg"
  #   pool_size: 10

Entities Persisted

Entity             Source Module                        Repository                   Key Queries
Task               core/task.py                         TaskRepository               by status, by assignee, by project
CostRecord         budget/cost_record.py                CostRecordRepository         by agent, by task, aggregations
Message            communication/message.py             MessageRepository            by channel
AuditEntry         security/models.py                   AuditRepository              by agent, by action type, by verdict, by risk level, time range
ParkedContext      security/timeout/parked_context.py   ParkedContextRepository      by execution_id, by agent_id, by task_id
AgentRuntimeState  engine/agent_state.py                AgentStateRepository         by agent_id, active agents
Setting            settings/models.py                   SettingsRepository           by namespace+key, by namespace, all
Artifact           core/artifact.py                     ArtifactRepository           by task_id, by created_by, by artifact_type
Project            core/project.py                      ProjectRepository            by status, by lead
Custom preset      templates/preset_service.py          PersonalityPresetRepository  by name

Schema Strategy

  • Schema is applied at startup via PersistenceBackend.migrate() which calls apply_schema()
  • The canonical schema lives in src/synthorg/persistence/sqlite/schema.sql (single source of truth)
  • All DDL uses IF NOT EXISTS guards, making schema application idempotent
  • No sequential migrations exist yet -- once the schema stabilizes, adopt Atlas for declarative migrations (diff schema.sql against the live DB)
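The idempotency guarantee can be demonstrated with a toy schema; the table names here are illustrative, not the canonical schema.sql:

```python
import sqlite3

# Every statement carries an IF NOT EXISTS guard, so applying the schema
# is safe to repeat at every startup.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    task_id TEXT PRIMARY KEY,
    status  TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_tasks_status ON tasks(status);
"""

def apply_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)

conn = sqlite3.connect(":memory:")
apply_schema(conn)
apply_schema(conn)  # re-running is a no-op thanks to the guards
```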

Key Principles

Application code never imports a concrete backend
Only repository protocols are used. This ensures complete decoupling from the storage engine.
Adding a new backend requires no changes to consumers
Implement PersistenceBackend + all repository protocols. Existing application code works unchanged.
Same entity models everywhere
Repositories accept and return the existing frozen Pydantic models (Task, CostRecord, Message). No ORM models or data transfer objects.
Async throughout
All repository methods are async, matching the framework's concurrency model.

Multi-Tenancy

Each company gets its own database. The PersistenceConfig embedded in a company's RootConfig specifies the backend type and connection details (e.g., a unique SQLite file path or PostgreSQL database URL). The create_backend(config) factory returns an isolated PersistenceBackend instance per company -- no shared state, no cross-company data leakage.

# One database per company -- configured in each company's YAML
company_a_backend = create_backend(company_a_config.persistence)
company_b_backend = create_backend(company_b_config.persistence)
# Each backend has independent lifecycle: connect -> migrate -> use -> disconnect

Planned

Runtime backend switching (e.g., migrating a company from SQLite to PostgreSQL during operation) is a planned future capability. The protocol-based design already supports this -- the engine would disconnect the current backend, connect a new one with different config, and migrate. Implementation details (data migration tooling, zero-downtime switchover, connection draining) are deferred to the PostgreSQL backend implementation.


Procedural Memory Auto-Generation

When an agent fails a task, the engine's post-execution pipeline can automatically generate a procedural memory entry -- a structured "next time, do X when encountering Y" lesson learned. This follows the EvoSkill three-agent separation principle: the failed agent does not write its own lesson; a separate proposer LLM call analyses the failure.

Pipeline

  1. Failure analysis payload (FailureAnalysisPayload): Built from RecoveryResult + ExecutionResult. Includes task metadata, sanitized error message, tool calls made, retry count, and turn count. Deliberately excludes raw conversation messages (privacy boundary).

  2. Proposer LLM call (ProceduralMemoryProposer): A separate completion call with its own system prompt analyses the payload and returns a structured ProceduralMemoryProposal.

  3. Three-tier progressive disclosure:

    • Discovery (~100 tokens): concise summary for retrieval ranking.
    • Activation (condition + action + rationale): when/what/why.
    • Execution (ordered steps): concrete steps for applying the knowledge.
  4. Storage: The proposal is stored via MemoryBackend.store() as a MemoryCategory.PROCEDURAL entry with a "non-inferable" tag for retrieval filtering.

  5. SKILL.md materialization (optional): When ProceduralMemoryConfig.skill_md_directory is set, the proposal is also written as a portable SKILL.md file following the Agent Skills format for git-native versioning.

Configuration

ProceduralMemoryConfig (nested in CompanyMemoryConfig.procedural) controls:

  • enabled: Toggle auto-generation on/off (default: True).
  • model: Model identifier for the proposer LLM call (default: "example-small-001").
  • temperature: Sampling temperature (default: 0.3).
  • max_tokens: Token budget for the proposer response (default: 1500).
  • min_confidence: Discard proposals below this threshold (default: 0.5).
  • skill_md_directory: Optional path for SKILL.md file materialization.

Integration Point

AgentEngine._try_procedural_memory() runs after error recovery in _post_execution_pipeline. It is non-critical: failures are logged at WARNING and never block the execution result.


Memory Injection Strategies

Agent memory reaches agents through pluggable injection strategies behind the MemoryInjectionStrategy protocol. The strategy determines how memories are surfaced to the agent during execution.

Context Injection (Default)

Pre-retrieves relevant memories before execution, ranks them by relevance and recency, enforces a token budget, and formats the result as ChatMessage(s) injected between the system prompt and the task instruction. The agent passively receives memories.

Pipeline (Linear -- single-source, default):

  1. MemoryBackend.retrieve() -- fetch candidate memories (dense vector search)
  2. Rank by relevance + recency via linear combination
  3. Filter by min_relevance threshold
  4. Apply MemoryFilterStrategy (Decision Log D23, optional) -- exclude inferable content
  5. Greedy token-budget packing
  6. Format as ChatMessage (configured role: SYSTEM or USER) with delimiters

Pipeline (RRF hybrid search -- multi-source):

When fusion_strategy: rrf is configured, the pipeline runs both dense and BM25 sparse search in parallel and fuses results:

  1. Dense search: MemoryBackend.retrieve() for personal, SharedKnowledgeStore.search_shared() for shared (in parallel)
  2. Sparse BM25 search: MemoryBackend.retrieve_sparse() for personal (shared sparse disabled until SharedKnowledgeStore adds the method)
  3. Fuse via fuse_ranked_lists() with configurable rrf_k smoothing constant
  4. Post-RRF min_relevance filter on combined_score
  5. Apply MemoryFilterStrategy (optional)
  6. Greedy token-budget packing
  7. Format as ChatMessage

BM25 sparse vectors are stored alongside dense vectors in Qdrant using a named sparse vector field with Modifier.IDF (Qdrant applies IDF server-side). The BM25Tokenizer uses murmurhash3 for vocabulary-free token-to-index mapping; only term frequencies are stored. Sparse search is opt-in via Mem0BackendConfig.sparse_search_enabled.

Shared memories (from SharedKnowledgeStore) are fetched in parallel, merged with personal memories (no personal_boost for shared), and ranked together.

Ranking Algorithm (Linear -- default):

  1. relevance = entry.relevance_score ?? config.default_relevance
  2. Personal entries: relevance = min(relevance + personal_boost, 1.0)
  3. recency = exp(-decay_rate * age_hours)
  4. combined = relevance_weight * relevance + recency_weight * recency
  5. Filter: combined >= min_relevance
  6. Sort descending by combined_score
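As a sketch, the linear combination might be computed as follows; the weight, boost, and decay values are example numbers, not framework defaults:

```python
import math

def combined_score(
    relevance: float,
    age_hours: float,
    *,
    personal: bool = False,
    personal_boost: float = 0.1,
    relevance_weight: float = 0.7,
    recency_weight: float = 0.3,
    decay_rate: float = 0.01,
) -> float:
    if personal:
        relevance = min(relevance + personal_boost, 1.0)  # step 2 clamp
    recency = math.exp(-decay_rate * age_hours)           # step 3 decay
    return relevance_weight * relevance + recency_weight * recency
```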

Alternative: Reciprocal Rank Fusion (RRF)

When fusion_strategy: rrf is configured, multiple pre-ranked lists (e.g., from different retrieval sources) are merged via RRF: score(doc) = sum(1 / (k + rank_i)) across all lists containing the document. Scores are min-max normalized to [0.0, 1.0]. The smoothing constant k (default 60, configurable via rrf_k) controls rank-difference amplification. RRF is the de facto standard for hybrid search fusion (Qdrant, NeMo Retriever). It is intended for multi-source scenarios (BM25 + vector, multi-round tool-based retrieval); the linear strategy remains the default for single-source retrieval. Results are truncated to max_results (default 20) after scoring and sorting.
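The fusion formula can be sketched directly; the framework's fuse_ranked_lists may differ, so rrf_fuse below is illustrative:

```python
def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> dict[str, float]:
    """score(doc) = sum over lists of 1 / (k + rank), min-max normalized."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores tie
    return {doc: (score - lo) / span for doc, score in scores.items()}

# "a" ranks first in both lists; "b" appears in only one.
fused = rrf_fuse([["a", "b", "c"], ["a", "c"]])
```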

Non-Inferable Filter

Retrieved memories are filtered before injection to exclude content the agent can discover by reading the codebase or environment. Only non-inferable information is injected: prior decisions, learned conventions, interpersonal context, historical outcomes. Research shows generic context increases cost 20%+ with minimal success improvement; LLM-generated context can actually reduce success rates.

Filter strategy (Decision Log D23): Pluggable MemoryFilterStrategy protocol. Initial implementation uses tag-based filtering at write time. A non-inferable tag convention with advisory validation at the MemoryBackend.store() boundary warns on missing tags but never blocks. The system prompt instructs agents what qualifies as non-inferable: design rationale, team decisions, "why not X," cross-repo knowledge. Uses existing MemoryMetadata.tags and MemoryQuery.tags -- zero new models needed.
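Under the tag convention, the injection-time filter reduces to a membership check; the dict-based entry shape below is a simplified stand-in for MemoryEntry with MemoryMetadata.tags:

```python
def filter_non_inferable(entries: list[dict]) -> list[dict]:
    """Keep only entries tagged non-inferable at write time."""
    return [entry for entry in entries if "non-inferable" in entry.get("tags", ())]

kept = filter_non_inferable([
    {"text": "team chose JWT over sessions", "tags": ("non-inferable",)},
    {"text": "project uses Litestar", "tags": ()},  # inferable from the codebase
])
```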

Tool-Based Injection

The agent has recall_memory / search_memory tools it calls on demand during execution, actively deciding when and what to remember. This is more token-efficient (it retrieves only when needed) but consumes tool-call turns and requires agent discipline to invoke the tools.

Implemented via ToolBasedInjectionStrategy. The strategy:

  • Injects a brief system instruction about available memory tools
  • Exposes search_memory and recall_memory (by ID) tools
  • Delegates search_memory requests to MemoryBackend.retrieve() (dense-only; hybrid dense+sparse with RRF fusion is not yet wired into the tool-based path)
  • Hybrid retrieval and RRF fusion are handled at the ContextInjectionStrategy level, not within ToolBasedInjectionStrategy
  • QueryReformulator and SufficiencyChecker protocols exist with LLM-based implementations, but iterative reformulation is not yet wired into the tool-based strategy's search handler (reserved via query_reformulation_enabled config field)

ToolRegistry integration: SearchMemoryTool and RecallMemoryTool are BaseTool subclasses (memory/tools.py) that delegate execution to ToolBasedInjectionStrategy.handle_tool_call(). The registry_with_memory_tools() factory augments a ToolRegistry with these tools when the strategy is ToolBasedInjectionStrategy. AgentEngine accepts an optional memory_injection_strategy parameter and wires the tools into each agent's registry at execution time. This ensures memory tools participate in the standard ToolInvoker dispatch pipeline, including permission checking (ToolCategory.MEMORY), security interceptors, and invocation tracking.

MCP bridge evaluation: Both context injection and tool-based strategies hold direct MemoryBackend references and run in-process. The memory hot path already bypasses MCP by design -- no additional optimization needed.

Self-Editing Memory

The agent has structured memory blocks (core, archival, recall) that it both reads and writes during execution via dedicated tools. Core memory is always in context; archival and recall memory are searched via tools. This is the most sophisticated approach (a self-editing memory architecture) but carries the highest complexity and LLM overhead.

MemoryInjectionStrategy Protocol

All strategies implement MemoryInjectionStrategy:

class MemoryInjectionStrategy(Protocol):

    async def prepare_messages(
        self, agent_id: NotBlankStr, query_text: NotBlankStr, token_budget: int
    ) -> tuple[ChatMessage, ...]: ...

    def get_tool_definitions(self) -> tuple[ToolDefinition, ...]: ...

    @property
    def strategy_name(self) -> str: ...

Strategy selection via config: memory.retrieval.strategy: context | tool_based | self_editing