Memory Learning and Injection¶
How memory enters and leaves the agent execution loop: procedural memory is
auto-generated from failed and successful executions, surfaced through one
of three injection strategies, and managed through `MemoryService`, the
single entry point for REST and MCP callers.
See also: Memory and Persistence (storage + retrieval pipeline), Operational Data Persistence, Shared Organizational Memory.
Procedural Memory Auto-Generation¶
When an agent fails a task, the engine's post-execution pipeline can automatically generate a procedural memory entry: a structured "next time, do X when encountering Y" lesson learned. This follows the EvoSkill three-agent separation principle: the failed agent does not write its own lesson; a separate proposer LLM call analyses the failure.
Pipeline¶
1. **Failure analysis payload** (`FailureAnalysisPayload`): built from `RecoveryResult` + `ExecutionResult`. Includes task metadata, sanitized error message, tool calls made, retry count, and turn count. Deliberately excludes raw conversation messages (privacy boundary).
2. **Proposer LLM call** (`ProceduralMemoryProposer`): a separate completion call with its own system prompt analyses the payload and returns a structured `ProceduralMemoryProposal`.
3. **Three-tier progressive disclosure**:
    - Discovery (~100 tokens): concise summary for retrieval ranking.
    - Activation (condition + action + rationale): when/what/why.
    - Execution (ordered steps): concrete steps for applying the knowledge.
4. **Storage**: the proposal is stored via `MemoryBackend.store()` as a `MemoryCategory.PROCEDURAL` entry with a `"non-inferable"` tag for retrieval filtering.
5. **SKILL.md materialization (optional)**: when `ProceduralMemoryConfig.skill_md_directory` is set, the proposal is also written as a portable SKILL.md file following the Agent Skills format for git-native versioning.
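The three-tier shape and the confidence gate can be sketched as follows. This is a minimal illustration: the field names, the `ProposalSketch` class, and the `passes_min_confidence` helper are assumptions for illustration, not the real `ProceduralMemoryProposal` model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposalSketch:
    """Illustrative three-tier procedural memory proposal (names are assumptions)."""

    # Tier 1 -- Discovery (~100 tokens): concise summary for retrieval ranking
    discovery_summary: str
    # Tier 2 -- Activation: when / what / why
    condition: str
    action: str
    rationale: str
    # Tier 3 -- Execution: ordered concrete steps
    steps: tuple[str, ...] = ()
    confidence: float = 0.0


def passes_min_confidence(proposal: ProposalSketch, min_confidence: float = 0.5) -> bool:
    """Mirror the min_confidence gate: proposals below the threshold are discarded."""
    return proposal.confidence >= min_confidence


proposal = ProposalSketch(
    discovery_summary="Back off and retry on billing-API rate limits",
    condition="HTTP 429 responses from the billing API",
    action="Retry with exponential backoff before failing the task",
    rationale="Transient rate limits caused repeated task failures",
    steps=("Catch the 429", "Sleep 2**attempt seconds", "Retry up to 3 times"),
    confidence=0.8,
)
```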
Configuration¶
ProceduralMemoryConfig (nested in CompanyMemoryConfig.procedural) controls:
- `enabled`: Toggle auto-generation on/off (default: `True`).
- `model`: Model identifier for the proposer LLM call (default: `"example-small-001"`).
- `temperature`: Sampling temperature (default: `0.3`).
- `max_tokens`: Token budget for the proposer response (default: `1500`).
- `min_confidence`: Discard proposals below this threshold (default: `0.5`).
- `skill_md_directory`: Optional path for SKILL.md file materialization.
Integration Point¶
AgentEngine._try_procedural_memory() runs after error recovery in
_post_execution_pipeline. It is non-critical: failures are logged at WARNING
and never block the execution result.
Capture Strategies¶
The capture system is extended beyond failure-only via pluggable CaptureStrategy
implementations in memory/procedural/capture/:
| Strategy | When it fires | Output |
|---|---|---|
| `FailureCaptureStrategy` | `recovery_result is not None` | Wraps existing proposer pipeline |
| `SuccessCaptureStrategy` | Successful completion with quality above threshold | "success-derived" tagged memory |
| `HybridCaptureStrategy` | Both failure and success paths | Delegates based on outcome |
SuccessMemoryProposer (memory/procedural/success_proposer.py) provides a lighter
LLM analysis for successful executions, focusing on reusable strategies rather than
failure lessons.
Configuration via CaptureConfig: type discriminator ("failure"/"success"/
"hybrid"), min_quality_score (default 8.0), success_quality_percentile (default
75.0).
Pruning Strategies¶
Procedural memory pruning is handled by pluggable PruningStrategy implementations
in memory/procedural/pruning/:
| Strategy | Method |
|---|---|
| `TtlPruningStrategy` | Remove entries older than `max_age_days` (default 90) |
| `ParetoPruningStrategy` | Multi-dimensional Pareto frontier (relevance + recency) down to `max_entries` |
| `HybridPruningStrategy` | TTL first (remove expired), then Pareto on remaining |
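The hybrid combination above can be sketched as a TTL pass followed by a Pareto-frontier pass. This is a simplified illustration under assumed field names (`PruneCandidate`, `hybrid_prune`); the real `HybridPruningStrategy` differs in detail, notably in how it fills back up to `max_entries` when the frontier is small.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PruneCandidate:
    """Illustrative entry shape; field names are assumptions."""

    entry_id: str
    age_days: float
    relevance: float  # higher is better


def dominates(a: PruneCandidate, b: PruneCandidate) -> bool:
    """a Pareto-dominates b: no worse on both axes, strictly better on at least one."""
    no_worse = a.relevance >= b.relevance and a.age_days <= b.age_days
    strictly_better = a.relevance > b.relevance or a.age_days < b.age_days
    return no_worse and strictly_better


def hybrid_prune(
    entries: list[PruneCandidate], max_age_days: float = 90.0, max_entries: int = 10
) -> list[PruneCandidate]:
    """TTL pass first, then keep the Pareto frontier on (relevance, recency)."""
    alive = [e for e in entries if e.age_days <= max_age_days]
    frontier = [e for e in alive if not any(dominates(o, e) for o in alive)]
    # If the frontier still exceeds the budget, trim by relevance (simplification).
    frontier.sort(key=lambda e: e.relevance, reverse=True)
    return frontier[:max_entries]
```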
Cross-Agent Propagation¶
Procedural memories can be propagated across agents via pluggable PropagationStrategy
implementations in memory/procedural/propagation/:
| Strategy | Scope | Tag |
|---|---|---|
| `NoPropagation` | Agent-local only (safe default) | - |
| `RoleScopedPropagation` | Agents with same role | `"propagated:{source_agent_id}"` |
| `DepartmentScopedPropagation` | Agents in same department | `"propagated:{source_agent_id}"` |
All propagation strategies respect max_propagation_targets (default 10) and exclude
the source agent.
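Role-scoped target selection and the tag format can be sketched as below. The function signatures (a flat `agent_id -> role` mapping) are assumptions for illustration; the real `PropagationStrategy` protocol operates on richer agent models.

```python
def propagation_targets(
    source_agent_id: str,
    source_role: str,
    agents_by_role: dict[str, str],  # agent_id -> role (illustrative shape)
    max_propagation_targets: int = 10,
) -> list[str]:
    """Role-scoped sketch: same role, source agent excluded, capped at the limit."""
    targets = [
        agent_id
        for agent_id, role in sorted(agents_by_role.items())
        if role == source_role and agent_id != source_agent_id
    ]
    return targets[:max_propagation_targets]


def propagation_tag(source_agent_id: str) -> str:
    """Tag format from the table above."""
    return f"propagated:{source_agent_id}"
```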
Cross-Agent Skill Pool¶
Organization-wide shared skills extend procedural memory with an ORG scope.
ProceduralMemoryScope enum: AGENT (per-agent private), ROLE,
DEPARTMENT, ORG (organization-wide shared pool).
Extended ProceduralMemoryProposal adds fields for org-scope lifecycle:
- `scope: ProceduralMemoryScope`: distribution scope
- `supersedes: tuple[NotBlankStr, ...]`: IDs of entries this supersedes
- `superseded_by: NotBlankStr | None`: tombstone marker (filtered from retrieval)
- `application_count: int`: how many times applied
- `last_applied_at: AwareDatetime | None`: last application timestamp
AutonomousSkillEvolver runs on the consolidation schedule:
- Collects trajectories across all agents in a window via `TrajectoryAggregator`
- Groups by error category or tool call sequence
- Filters patterns seen by >= `min_agents_for_pattern` distinct agents
- Builds org-scope proposals with confidence proportional to failure rate
- Checks supersession against existing org entries (FULL/PARTIAL/CONFLICT)
- Emits proposals as `ApprovalItem` entries for human review
Proposal-only, structurally enforced: EvolverConfig.requires_human_approval
is Literal[True] and cannot be set to False. The evolver has no write access
to org memory. Proposals land in the existing ApprovalItem queue.
Supersession rules (checked before proposal emission):
| Verdict | Condition | Action |
|---|---|---|
| CONFLICT | High condition overlap + low action similarity | Skipped, escalated to human |
| FULL | Condition superset + compatible action + higher confidence | Supersedes existing (post-approval) |
| PARTIAL | Everything else | Both coexist |
CONFLICT is checked before FULL to prevent contradictory actions from being accepted as supersessions.
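The check ordering can be made concrete with a small sketch. The thresholds (0.8 overlap, 0.3 action similarity) and the boolean-flag signature are assumptions; the real supersession checker derives these from entry contents.

```python
def supersession_verdict(
    condition_overlap: float,
    action_similarity: float,
    *,
    condition_superset: bool,
    action_compatible: bool,
    higher_confidence: bool,
    conflict_overlap: float = 0.8,      # assumed threshold
    conflict_action_sim: float = 0.3,   # assumed threshold
) -> str:
    """Verdict ordering sketch per the table above.

    CONFLICT is evaluated first so a high-overlap / contradictory-action pair can
    never be misread as a FULL supersession.
    """
    if condition_overlap >= conflict_overlap and action_similarity < conflict_action_sim:
        return "CONFLICT"  # skipped, escalated to human
    if condition_superset and action_compatible and higher_confidence:
        return "FULL"  # supersedes existing (post-approval)
    return "PARTIAL"  # both coexist
```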
EvolverConfig safety rails: enabled (default False, opt-in),
min_confidence_for_org_promotion (0.8), min_agents_seen_pattern (3),
max_proposals_per_cycle (10), max_org_entries (10000, reserved for
future pruning).
Observability: SKILL_EVOLVER_CYCLE_START, SKILL_EVOLVER_CYCLE_COMPLETE,
SKILL_EVOLVER_CYCLE_FAILED, SKILL_EVOLVER_PROPOSAL_EMITTED,
SKILL_EVOLVER_CONFLICT_DETECTED, ORG_SKILL_SUPERSEDED, SKILL_EVOLVER_DISABLED.
EvolverReport is consumed by R3 #1265 eval loop.
Memory Injection Strategies¶
Agent memory reaches agents through pluggable injection strategies behind the
MemoryInjectionStrategy protocol. The strategy determines how memories are surfaced to
the agent during execution.
Context Injection Strategy¶
Pre-retrieves relevant memories before execution, ranks them by relevance and recency, enforces
a token budget, and formats them as `ChatMessage`(s) injected between the system prompt
and the task instruction. The agent passively receives memories.
Pipeline (Linear, single-source, default):
- `MemoryBackend.retrieve()`: fetch candidate memories (dense vector search)
- Rank by relevance + recency via linear combination
- Filter by `min_relevance` threshold
- Apply `MemoryFilterStrategy` (Decision Log D23, optional): exclude inferable content (fails closed on filter exceptions: returns empty to avoid bypassing privacy filters)
- Optional MMR diversity re-ranking when `diversity_penalty_enabled: true`, balancing relevance vs redundancy via Maximal Marginal Relevance with word-bigram Jaccard similarity (see Diversity Re-ranking below). Filtering runs first so excluded entries do not act as MMR anchors and suppress diverse-but-visible candidates.
- Greedy token-budget packing
- Format as `ChatMessage` (configured role: SYSTEM or USER) with delimiters
Pipeline (RRF hybrid search, multi-source):
When fusion_strategy: rrf is configured, the pipeline runs both dense and BM25 sparse
search in parallel and fuses results:
- Dense search: `MemoryBackend.retrieve()` for personal, `SharedKnowledgeStore.search_shared()` for shared (in parallel)
- Sparse BM25 search: `MemoryBackend.retrieve_sparse()` for personal (shared sparse disabled until `SharedKnowledgeStore` adds the method)
- Fuse via `fuse_ranked_lists()` with configurable `rrf_k` smoothing constant
- Post-RRF `min_relevance` filter on `combined_score`
- Apply `MemoryFilterStrategy` (optional, fails closed)
- Optional MMR diversity re-ranking when `diversity_penalty_enabled: true`
- Greedy token-budget packing
- Format as `ChatMessage`
BM25 sparse vectors are stored alongside dense vectors in Qdrant using a named sparse
vector field with Modifier.IDF (Qdrant applies IDF server-side). The BM25Tokenizer
uses murmurhash3 for vocabulary-free token-to-index mapping; only term frequencies are
stored. Sparse search is opt-in via Mem0BackendConfig.sparse_search_enabled.
Shared memories (from SharedKnowledgeStore) are fetched in parallel, merged with personal
memories (no personal_boost for shared), and ranked together.
Ranking Algorithm (Linear, default):
- `relevance = entry.relevance_score ?? config.default_relevance`
- Personal entries: `relevance = min(relevance + personal_boost, 1.0)`
- `recency = exp(-decay_rate * age_hours)`
- `combined = relevance_weight * relevance + recency_weight * recency`
- Filter: `combined >= min_relevance`
- Sort descending by `combined_score`
Alternative: Reciprocal Rank Fusion (RRF)
When fusion_strategy: rrf is configured, multiple pre-ranked lists (e.g., from different
retrieval sources) are merged via RRF: score(doc) = sum(1 / (k + rank_i)) across all
lists containing the document. Scores are min-max normalized to [0.0, 1.0]. The smoothing
constant k (default 60, configurable via rrf_k) controls rank-difference amplification.
RRF is the de facto standard for hybrid search fusion
(Qdrant,
NeMo Retriever). It is
intended for multi-source scenarios (BM25 + vector, multi-round tool-based retrieval); the
linear strategy remains the default for single-source retrieval. Results are truncated to
max_results (default 20) after scoring and sorting.
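The fusion formula above can be sketched in a few lines. The helper name `rrf_fuse` is illustrative (the real entry point is `fuse_ranked_lists()`), and the min-max normalization matches the description.

```python
def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion sketch: score(doc) = sum(1 / (k + rank_i)).

    Ranks are 1-based per list; fused scores are min-max normalized to [0.0, 1.0]
    and returned sorted descending.
    """
    scores: dict[str, float] = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # degenerate case: all raw scores equal
    return sorted(
        ((doc, (s - lo) / span) for doc, s in scores.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
```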
Diversity Re-ranking (MMR)
When diversity_penalty_enabled: true is set on the config, the
ContextInjectionStrategy pipeline runs apply_diversity_penalty() after
filtering and before token-budget packing. Running the filter first ensures
that privacy-excluded entries are not used as MMR anchors (which could
otherwise suppress visible candidates that happen to be textually similar to
excluded ones). The re-ranker uses Maximal Marginal Relevance:
MMR(candidate) = lambda * combined_score - (1 - lambda) * max_sim_to_selected
where diversity_lambda (default 0.7, range [0.0, 1.0]) controls the
trade-off: 1.0 = pure relevance (no diversity penalty), 0.0 = maximum
diversity. The default similarity function is word-bigram Jaccard; callers
can inject a custom similarity_fn (e.g., cosine on embeddings) for
domain-specific redundancy measures. Bigram sets are pre-computed once per
entry to keep complexity at O(n**2) rather than O(n**2 * k). When
diversity is enabled, the backend over-fetches by a configurable
candidate_pool_multiplier (default 3x, range 1--10) so MMR can promote
diverse candidates that would otherwise fall below the top-K cutoff. This
feature applies only to ContextInjectionStrategy; a model_validator
warns when diversity_penalty_enabled=True is combined with a strategy
that ignores it (e.g. TOOL_BASED).
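The MMR formula with word-bigram Jaccard similarity can be sketched end-to-end. This is a standalone illustration: `mmr_rerank` operates on bare `(text, combined_score)` pairs, whereas the real `apply_diversity_penalty()` signature and entry types differ. Bigram sets are pre-computed once per entry, as described above.

```python
def word_bigrams(text: str) -> set[tuple[str, str]]:
    """Adjacent-word pairs, the default similarity basis."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def jaccard(a: set, b: set) -> float:
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def mmr_rerank(
    entries: list[tuple[str, float]],  # (text, combined_score)
    diversity_lambda: float = 0.7,
) -> list[tuple[str, float]]:
    """MMR sketch: lambda * combined_score - (1 - lambda) * max_sim_to_selected."""
    grams = [word_bigrams(text) for text, _ in entries]  # pre-computed once per entry
    remaining = list(range(len(entries)))
    selected: list[int] = []
    while remaining:
        def mmr(i: int) -> float:
            sim = max((jaccard(grams[i], grams[j]) for j in selected), default=0.0)
            return diversity_lambda * entries[i][1] - (1 - diversity_lambda) * sim

        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return [entries[i] for i in selected]
```

With `diversity_lambda=0.7`, a near-duplicate of the top entry is pushed below an unrelated but moderately relevant entry, which is exactly the redundancy suppression the penalty exists for.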
Non-Inferable Filter
Retrieved memories are filtered before injection to exclude content the agent can discover by reading the codebase or environment. Only non-inferable information is injected: prior decisions, learned conventions, interpersonal context, historical outcomes. Research shows generic context increases cost 20%+ with minimal success improvement; LLM-generated context can actually reduce success rates.
Filter strategy (Decision Log D23): Pluggable MemoryFilterStrategy protocol. Initial
implementation uses tag-based filtering at write time. A non-inferable tag convention
with advisory validation at the MemoryBackend.store() boundary warns on missing tags
but never blocks. The system prompt instructs agents what qualifies as non-inferable:
design rationale, team decisions, "why not X," cross-repo knowledge. Uses existing
MemoryMetadata.tags and MemoryQuery.tags; zero new models needed.
Tool-Based Strategy¶
The agent has `recall_memory` / `search_memory` tools it calls on demand during execution.
The agent actively decides when and what to remember. More token-efficient (retrieves only
when needed) but consumes tool-call turns and requires agent discipline to invoke.
Implemented via ToolBasedInjectionStrategy. The strategy:
- Injects a brief system instruction about available memory tools
- Exposes `search_memory` and `recall_memory` (by ID) tools
- Delegates `search_memory` requests to `MemoryBackend.retrieve()` (dense-only; hybrid dense+sparse with RRF fusion is not yet wired into the tool-based path)
- Hybrid retrieval and RRF fusion are handled at the `ContextInjectionStrategy` level, not within `ToolBasedInjectionStrategy`
- When `query_reformulation_enabled: true` is set on the config and both a `QueryReformulator` and a `SufficiencyChecker` are provided at construction, `search_memory` runs an iterative Search-and-Ask loop: retrieve -> check sufficiency -> reformulate query -> re-retrieve, up to `max_reformulation_rounds` rounds (default 2, max 5). Results from all rounds are merged by entry ID, keeping the highest-relevance version of any duplicate. Sufficiency-checker and reformulator failures degrade gracefully to the current cumulative entries rather than propagating. Diversity (MMR) re-ranking is applied only in the `ContextInjectionStrategy` pipeline, not in the tool-based handler.
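The Search-and-Ask loop, including the merge-by-ID rule and graceful degradation, can be sketched with bare callables. The signature is illustrative: the real strategy wires `QueryReformulator` and `SufficiencyChecker` objects, not functions.

```python
from typing import Callable

Entries = list[tuple[str, float]]  # (entry_id, relevance)


def search_and_ask(
    query: str,
    retrieve: Callable[[str], Entries],
    is_sufficient: Callable[[Entries], bool],
    reformulate: Callable[[str, Entries], str],
    max_reformulation_rounds: int = 2,
) -> Entries:
    """Sketch of retrieve -> check sufficiency -> reformulate -> re-retrieve."""
    merged: dict[str, float] = {}
    for entry_id, relevance in retrieve(query):
        merged[entry_id] = max(merged.get(entry_id, 0.0), relevance)
    for _ in range(max_reformulation_rounds):
        current = sorted(merged.items(), key=lambda kv: -kv[1])
        try:
            if is_sufficient(current):
                break
            query = reformulate(query, current)
            for entry_id, relevance in retrieve(query):
                # Merge by entry ID, keeping the highest-relevance duplicate.
                merged[entry_id] = max(merged.get(entry_id, 0.0), relevance)
        except Exception:
            break  # degrade gracefully to the cumulative entries so far
    return sorted(merged.items(), key=lambda kv: -kv[1])
```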
ToolRegistry integration: SearchMemoryTool and RecallMemoryTool are BaseTool
subclasses (defined in the memory/tools/ package) that delegate execution to
ToolBasedInjectionStrategy.handle_tool_call(). The registry_with_memory_tools()
factory augments a ToolRegistry with these tools when the strategy is
ToolBasedInjectionStrategy. AgentEngine accepts an optional
memory_injection_strategy parameter and wires the tools into each agent's registry
at execution time. This ensures memory tools participate in the standard ToolInvoker
dispatch pipeline, including permission checking (ToolCategory.MEMORY), security
interceptors, and invocation tracking.
MCP bridge evaluation: Both context injection and tool-based strategies hold direct
MemoryBackend references and run in-process. The memory hot path bypasses MCP by design;
no additional optimization needed.
Self-Editing Memory Strategy¶
The agent has three structured memory blocks (core, archival, and recall) it reads AND
writes during execution via dedicated tools. Core memory (SEMANTIC category, tagged "core")
is always injected into the system prompt. Archival and recall memories are tool-searched on
demand. Six tools are provided: core_memory_read, core_memory_write,
archival_memory_search, archival_memory_write, recall_memory_read,
recall_memory_write.
Implemented via SelfEditingMemoryStrategy. Token overhead is ~250--650 tokens per session
(2--10 writes + 5--15 searches). Best suited for long-running, high-autonomy agents (>20 turns)
where explicit memory management reduces "forgotten context" errors. SelfEditingMemoryConfig
controls core token budget, archival search limit, per-category write access, and a safety
valve (allow_core_writes: bool) for restricting core memory edits on locked-down agents.
MemoryInjectionStrategy Protocol¶
All strategies implement MemoryInjectionStrategy:
```python
class MemoryInjectionStrategy(Protocol):
    async def prepare_messages(
        self, agent_id: NotBlankStr, query_text: NotBlankStr, token_budget: int
    ) -> tuple[ChatMessage, ...]: ...

    def get_tool_definitions(self) -> tuple[ToolDefinition, ...]: ...

    @property
    def strategy_name(self) -> str: ...
```
Strategy selection via config: memory.retrieval.strategy: context | tool_based | self_editing
Memory Service Layer¶
MemoryService (at src/synthorg/memory/service.py) is the single entry point for /memory/fine-tune/* REST endpoints and the MCP memory tools. Controllers and handlers never reach into app_state.persistence.* directly; the service owns the repository handle, audit logging, and typed error routing.
Fine-tune lifecycle¶
MemoryService exposes the full fine-tune lifecycle as typed async methods:
- `start_fine_tune(plan: FineTunePlan) -> FineTuneRun`: starts a new pipeline from a `FineTunePlan`.
- `resume_fine_tune(run_id: NotBlankStr) -> FineTuneRun`: resume a previously failed or cancelled run.
- `get_fine_tune_status(run_id: NotBlankStr | None = None) -> FineTuneStatus`: snapshot of the active (or a specific) run.
- `cancel_fine_tune() -> str | None`: cancel the active run (destructive). Returns the cancelled run id (captured before cancel so the audit log can attribute it) or `None` if no run was active.
- `run_preflight(plan: FineTunePlan) -> PreflightResult`: local-env sanity check (source dir, output dir writability, override bounds).
- `list_runs(*, limit: int, offset: int) -> tuple[tuple[FineTuneRun, ...], int]`: paged historical runs + total count.
- `get_active_embedder() -> ActiveEmbedderSnapshot`: frozen snapshot of the active provider / model / checkpoint id from settings.
- `rollback_checkpoint(checkpoint_id: NotBlankStr) -> CheckpointRecord`: atomic swap of the active embedder back to `checkpoint_id` (destructive). The rollback-step helper logs a distinct `MEMORY_CHECKPOINT_ROLLBACK_FAILED` event if any intermediate step fails so operators can distinguish partial-rollback from the primary deploy failure.
Destructive entries (cancel_fine_tune, rollback_checkpoint, and delete_checkpoint at the handler layer) are gated by the standard MCP guardrail triple (actor, literal confirm=True, non-blank reason) and emit MCP_ADMIN_OP_EXECUTED with the resolved actor, reason, and target_id (the cancelled run id or the rolled-back / deleted checkpoint id).
FineTunePlan is an MCP-facing Pydantic model (src/synthorg/memory/fine_tune_plan.py) that mirrors the runner's internal FineTuneRequest field-for-field but isolates the public contract from runner internals. A @model_validator rejects parent-directory traversal, backslashes, and Windows drive letters on source_dir / output_dir before the runner's subprocess or container mount could expose the host filesystem.
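The three path rejections can be sketched as a predicate like the one below. This is an illustrative heuristic, not the real `@model_validator`: the function name and the exact detection rules (e.g. using `PureWindowsPath.drive` for drive letters) are assumptions.

```python
from pathlib import PurePosixPath, PureWindowsPath


def is_unsafe_mount_path(path: str) -> bool:
    """Sketch of the FineTunePlan path checks: traversal, backslashes, drive letters.

    Rejecting these before the runner's subprocess or container mount sees the
    path keeps the host filesystem from being exposed.
    """
    if "\\" in path:
        return True  # backslashes
    if PureWindowsPath(path).drive:
        return True  # Windows drive letters, e.g. "C:"
    if ".." in PurePosixPath(path).parts:
        return True  # parent-directory traversal
    return False
```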
BackendUnsupportedError routing¶
Fine-tune orchestration is SQLite-backed. On a persistence backend that does not expose fine_tune_runs / fine_tune_checkpoints, the service raises a typed BackendUnsupportedError (domain_code = "not_supported", frozen with __slots__ = ("reason",)) instead of a generic NotImplementedError. MCP handlers catch it and forward through the standard not_supported() envelope, which emits MCP_HANDLER_NOT_IMPLEMENTED at WARNING; distinct from MCP_HANDLER_CAPABILITY_GAP (handler wired, primitive method missing) and MCP_HANDLER_SERVICE_FALLBACK (legacy helper, zero call sites). REST controllers map it to HTTP 501 with the same domain code.
The typed error keeps the "which gap" question resolvable without string-matching exception messages: backend-unsupported is always exactly one error class and one emitted event.