Memory governance for AI agents.
Most agent memory layers are passive storage. Memory Guardian is an active governance layer — it controls what gets stored, what gets retrieved, how memories age, and what contradicts what — with full explainability at every step.
The problem
AI agents with persistent memory break in predictable ways:
- Duplicate facts accumulate until retrieval becomes noise
- Conflicting preferences coexist with no resolution path
- Stale memories from months ago outrank fresh ones
- Retrieval is a black box — no way to know why a memory was selected
- Quality degrades the longer an agent runs in production
In consumer apps, getting this wrong is a bad experience. In regulated environments — wealth management, KYC, claims, credit — getting this wrong is a suitability breach, a compliance failure, or a mis-sold product.
What Memory Guardian does differently
| Capability | Mem0 | Zep | Letta | agentmemory | Memory Guardian |
|---|---|---|---|---|---|
| Conflict detection | Yes | Yes | Partial | Yes | Yes |
| Explainable retrieval (why_selected) | No | No | No | No | Yes |
| Retrieval decision trace per stage | No | No | No | No | Yes |
| Score decay + archival lifecycle | No | Partial | Partial | Yes (local) | Yes |
| Consolidation of redundant memories | Partial | Partial | No | No | Yes |
| Pinned memory (survives decay) | No | No | No | No | Yes |
| NLI conflict detection | No | No | No | No | Yes |
| Strict multi-tenant isolation at API layer | Yes | Yes | No | No | Yes |
| Framework-agnostic production API | Yes | Yes | No | No | Yes |
Core features
Intelligent ingestion
Every memory write goes through enrichment, deduplication, and type normalisation before persistence. Two dedup paths run in sequence — exact match on normalised content, then vector-similarity match against nearby tenant memories. Noise does not enter the store.
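The two dedup paths described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names, the dictionary shape of a stored memory, and the 0.95 similarity threshold are all assumptions made for the example.

```python
import math
import re


def normalise(text: str) -> str:
    """Lowercase, trim, and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text.strip().lower())


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_duplicate(content, embedding, existing, threshold=0.95):
    """Return (memory_id, path) for the first duplicate found, else None.

    `existing` is a list of dicts with 'id', 'content', 'embedding' that all
    belong to the same tenant (hypothetical shape). `threshold` is an assumed value.
    """
    key = normalise(content)
    # Path 1: exact match on normalised content.
    for memory in existing:
        if normalise(memory["content"]) == key:
            return memory["id"], "exact"
    # Path 2: vector-similarity match, only between same-dimension vectors.
    for memory in existing:
        same_dim = len(memory["embedding"]) == len(embedding)
        if same_dim and cosine(memory["embedding"], embedding) >= threshold:
            return memory["id"], "vector"
    return None
```

Running the exact path first keeps the cheap check ahead of the vector comparison, which matches the sequence described above.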
Supported canonical memory types: fact, preference, episodic, instruction, consolidated. Aliases are normalised automatically (pref → preference, episode → episodic, etc.). Unknown types are rejected with HTTP 422.
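A minimal sketch of the type normalisation step, assuming only the two aliases named above. The function name is hypothetical, and treating input as case-insensitive is an assumption; in the API an unknown type would surface as HTTP 422, which is represented here by a ValueError.

```python
CANONICAL_TYPES = {"fact", "preference", "episodic", "instruction", "consolidated"}

# Only the aliases named in this README; the real alias table may be larger.
ALIASES = {"pref": "preference", "episode": "episodic"}


def normalise_memory_type(raw: str) -> str:
    """Map an alias to its canonical type, rejecting anything unknown."""
    value = raw.strip().lower()  # case-insensitivity is an assumption
    value = ALIASES.get(value, value)
    if value not in CANONICAL_TYPES:
        # The API layer would translate this into an HTTP 422 response.
        raise ValueError(f"unknown memory_type: {raw!r}")
    return value
```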
Task-aware retrieval
Retrieval combines four signals into a final score:
- Semantic similarity: vector distance from query embedding
- Importance: priority-weighted score from metadata
- Recency: exponential decay with 30-day half-life; uses last_accessed_at where available
- Frequency: logarithmic reuse score, capped at 20 accesses with a 21-day half-life to prevent runaway dominance
Retrieval returns ranked, deduplicated, relevant memories. Archived memories are excluded. The reranker runs after base scoring, with a deterministic fallback if the provider fails.
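To make the four signals concrete, here is a sketch of how they might combine. The half-lives and the access cap come from the list above; the weights, the function names, and the exact way the frequency signal mixes the logarithm with its half-life are assumptions for illustration only.

```python
import math

RECENCY_HALF_LIFE_DAYS = 30.0
FREQUENCY_HALF_LIFE_DAYS = 21.0
FREQUENCY_CAP = 20

# Hypothetical weights; the actual weighting is not specified in this README.
WEIGHTS = {"similarity": 0.5, "importance": 0.2, "recency": 0.2, "frequency": 0.1}


def recency_score(days_since_access: float) -> float:
    """Exponential decay: the score halves every 30 days."""
    return 0.5 ** (max(days_since_access, 0.0) / RECENCY_HALF_LIFE_DAYS)


def frequency_score(access_count: int, days_since_access: float) -> float:
    """Log-scaled reuse, capped at 20 accesses, decayed with a 21-day half-life."""
    capped = min(max(access_count, 0), FREQUENCY_CAP)
    reuse = math.log1p(capped) / math.log1p(FREQUENCY_CAP)
    decay = 0.5 ** (max(days_since_access, 0.0) / FREQUENCY_HALF_LIFE_DAYS)
    return reuse * decay


def final_score(similarity, importance, days_since_access, access_count):
    """Return (total, per-signal breakdown) as a weighted sum of the four signals."""
    signals = {
        "similarity": similarity,
        "importance": importance,
        "recency": recency_score(days_since_access),
        "frequency": frequency_score(access_count, days_since_access),
    }
    breakdown = {name: WEIGHTS[name] * value for name, value in signals.items()}
    return sum(breakdown.values()), breakdown
```

The cap means a memory accessed 500 times scores no higher on frequency than one accessed 20 times, which is what stops heavily reused memories from dominating.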
Explainable ranking
Every retrieved memory includes:
- score_breakdown: per-signal contributions (similarity, importance, recency, frequency)
- decision_trace: stage-by-stage trace (candidate generation → scoring → rerank → final order)
- why_selected: human-readable reasons generated from the actual executed path
This is not post-hoc explanation. The trace reflects what the system actually did.
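One way to keep an explanation tied to the executed path is to build it from the same values the ranking used. The sketch below assumes a hypothetical `explain` helper and hypothetical stage names; only the three field names come from this README.

```python
def explain(breakdown: dict[str, float], stages: list[str]) -> dict:
    """Build an explanation payload from the score breakdown and executed stages.

    `breakdown` holds the per-signal contributions actually used for ranking;
    `stages` lists the pipeline stages that actually ran, in order.
    """
    ranked = sorted(breakdown.items(), key=lambda item: item[1], reverse=True)
    why = [
        f"{name} contributed {value:.2f} to the final score"
        for name, value in ranked
        if value > 0
    ]
    # Only mention the reranker if that stage really ran.
    if "rerank" in stages:
        why.append("order confirmed by reranker")
    return {
        "score_breakdown": breakdown,
        "decision_trace": stages,
        "why_selected": why,
    }
```

Because the reasons are derived from recorded contributions and stages, a skipped reranker never appears in the explanation.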
Conflict detection — three providers
Heuristic (default, zero dependencies)
Polarity analysis over shared topic tokens. Detects opposite-sentiment statements about the same subjects, e.g. "user likes X" vs "user hates X".
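A rough sketch of polarity analysis over shared topic tokens follows. The word lists, stopwords, and function names are invented for the example and are far smaller than anything a real heuristic would need.

```python
import re

# Tiny illustrative lexicons; a real heuristic would use larger lists.
POSITIVE = {"likes", "loves", "prefers", "wants"}
NEGATIVE = {"hates", "dislikes", "avoids", "rejects"}
STOPWORDS = {"the", "a", "an", "user", "to", "of", "and"}


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def polarity(toks: set[str]) -> int:
    """+1 if positive words dominate, -1 if negative words dominate, else 0."""
    pos, neg = len(toks & POSITIVE), len(toks & NEGATIVE)
    return (pos > neg) - (pos < neg)


def heuristic_conflict(a: str, b: str) -> dict:
    """Flag a conflict when two statements share a topic but have opposite polarity."""
    ta, tb = tokens(a), tokens(b)
    shared = (ta & tb) - STOPWORDS - POSITIVE - NEGATIVE
    opposite = polarity(ta) * polarity(tb) == -1
    return {"is_conflict": bool(shared) and opposite, "shared_topics": sorted(shared)}
```

For example, `heuristic_conflict("user likes equities", "user hates equities")` flags a conflict on the shared topic `equities`, while two statements about different topics do not.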
NLI — Natural Language Inference (optional, local)
Uses microsoft/deberta-v3-small-mnli via HuggingFace transformers. Runs a text-classification pipeline on each candidate memory pair. Flags pairs where the contradiction label score exceeds a configurable threshold (default 0.8). Requires requirements-ml.txt. Model downloads on first use.
LLM (optional, OpenAI-compatible)
Sends each candidate pair to an LLM with a structured prompt. Returns is_conflict, reason, conflict_type, and shared_topics. Most semantically capable. Works with any OpenAI-compatible endpoint including HuggingFace inference, Ollama, and others.
All providers fall back to heuristic automatically on failure. Memory writes are never rolled back if conflict analysis fails — the write path is failure-safe by design. Conflicts are persisted as first-class objects with canonical pair handling — no duplicate conflict records per pair.
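The fallback and canonical pair behaviour can be illustrated with a small in-memory sketch. The class and function names are hypothetical, and ordering the pair by string comparison of IDs is an assumed way to make a pair canonical.

```python
from typing import Callable


def canonical_pair(id_a: str, id_b: str) -> tuple[str, str]:
    """Order the two IDs so (a, b) and (b, a) map to the same key."""
    return (id_a, id_b) if id_a <= id_b else (id_b, id_a)


def detect_with_fallback(
    primary: Callable[[str, str], bool],
    fallback: Callable[[str, str], bool],
    a: str,
    b: str,
) -> tuple[bool, str]:
    """Try the configured provider; on any failure use the heuristic instead."""
    try:
        return primary(a, b), "primary"
    except Exception:
        return fallback(a, b), "heuristic_fallback"


class ConflictStore:
    """In-memory stand-in for the conflict table."""

    def __init__(self) -> None:
        self._pairs: dict[tuple[str, str], str] = {}

    def record(self, id_a: str, id_b: str, reason: str) -> bool:
        """Persist a conflict once per pair; return False if it already exists."""
        key = canonical_pair(id_a, id_b)
        if key in self._pairs:
            return False
        self._pairs[key] = reason
        return True
```

In this sketch the memory write itself happens before conflict analysis and is never touched by the fallback path, which mirrors the failure-safe behaviour described above.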
Memory lifecycle
- Decay: importance scores reduce over time using a configurable heuristic or LLM policy. Pinned memories are always preserved regardless of score.
- Archival: inactive low-importance memories are archived automatically. Archived memories are excluded from all retrieval.
- Recency signals: last_accessed_at is updated atomically on retrieval. Lifecycle decisions use real access history, not just creation time.
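A heuristic decay pass along the lines above might look like this. The dataclass, the 90-day half-life, the archive threshold, and the inactivity window are all assumed values chosen for the example, not the project's defaults.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    importance: float
    days_since_access: float
    pinned: bool = False
    archived: bool = False


def decay_pass(
    memories: list[Memory],
    elapsed_days: float,
    half_life_days: float = 90.0,        # assumed
    archive_below: float = 0.1,          # assumed
    inactive_after_days: float = 60.0,   # assumed
) -> list[Memory]:
    """Decay importance for the time elapsed since the last pass, then archive.

    Pinned memories are skipped entirely, so they are never decayed or archived.
    """
    factor = 0.5 ** (elapsed_days / half_life_days)
    for memory in memories:
        if memory.pinned or memory.archived:
            continue
        memory.importance *= factor
        inactive = memory.days_since_access >= inactive_after_days
        if memory.importance < archive_below and inactive:
            memory.archived = True
    return memories
```

Archival here requires both low importance and inactivity, so a recently accessed memory survives even when its score is low.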
Consolidation
- Clusters similar active memories within strict scope boundaries (tenant_id + user_id + agent_id + session_id)
- Generates a canonical consolidated memory via configurable summariser (heuristic or LLM)
- Archives originals with consolidated_into trace metadata
- Prevents recursive consolidation: memories with memory_type=consolidated are excluded from future consolidation input by default
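The scope boundaries and the recursion guard can be sketched as below. This example groups by scope only and skips the similarity clustering step; the dictionary shape, function names, and the join-based summariser are assumptions.

```python
from collections import defaultdict


def consolidation_groups(memories: list[dict]) -> dict[tuple, list[dict]]:
    """Group active, non-consolidated memories by their full scope key."""
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for memory in memories:
        # Archived and already-consolidated memories never re-enter consolidation.
        if memory.get("archived") or memory["memory_type"] == "consolidated":
            continue
        scope = (
            memory["tenant_id"],
            memory["user_id"],
            memory.get("agent_id"),
            memory.get("session_id"),
        )
        groups[scope].append(memory)
    # A group needs at least two members to be worth consolidating.
    return {scope: items for scope, items in groups.items() if len(items) >= 2}


def consolidate(group: list[dict], new_id: str) -> dict:
    """Create one consolidated memory and archive the originals with a trace."""
    # Stand-in summariser: join distinct contents in their original order.
    summary = "; ".join(dict.fromkeys(m["content"] for m in group))
    for memory in group:
        memory["archived"] = True
        memory.setdefault("metadata", {})["consolidated_into"] = new_id
    first = group[0]
    return {
        "id": new_id,
        "tenant_id": first["tenant_id"],
        "user_id": first["user_id"],
        "agent_id": first.get("agent_id"),
        "session_id": first.get("session_id"),
        "memory_type": "consolidated",
        "content": summary,
    }
```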
Tenant isolation
- X-Tenant-Id header enforced at API boundary; a missing or blank header returns 401/400
- All repository queries scoped by tenant_id
- No cross-tenant retrieval, conflict reads, or admin access possible by design
- Admin endpoints require X-Is-Admin: true separately
FSI and regulated environment use cases
Memory Guardian's architecture maps directly to the memory governance requirements of regulated industries.
Wealth management suitability: A client's stated risk appetite changes over time. Without conflict detection, the old and new preferences coexist with no governed resolution. Memory Guardian persists the contradiction as a first-class conflict object, flags it for review, and surfaces the most recent preference on retrieval with a full decision trace, which serves as your explainability record under MiFID II suitability requirements.
KYC / AML profiling: Resolved sanctions flags and stale risk classifications should not carry the same weight as fresh transaction signals. Memory Guardian decays soft signals automatically, pins formal risk decisions permanently, and detects when new evidence contradicts a stored classification, triggering review rather than silent coexistence.
Claims handling: Contradicting customer statements across multiple touchpoints are stored as conflict pairs with canonical traces, providing the audit trail needed if a case goes to dispute.
Credit underwriting: Stale financial snapshots decay automatically. Covenant breaches and credit committee decisions are pinned permanently. Quarterly updates consolidate into a clean current-state picture.
Trade surveillance: Investigated and cleared flags do not dominate current risk profiles. Lifecycle archival retires resolved low-value memories. Every retrieval decision is traceable.
Memory Guardian is early-stage open source software. These use cases reflect the architectural design intent, not production deployments. If you are building agent systems in financial services and recognise these problems, we want to hear from you.
Quickstart
With Docker Compose
git clone https://github.com/your-org/memory-guardian
cd memory-guardian
cp .env.example .env
docker compose up --build -d db
docker compose run --rm migrate
docker compose up --build -d api
Without Docker
# Python 3.12 required
pip install -r requirements-dev.txt
# Optional: NLI conflict detection (downloads DeBERTa model on first use ~500MB)
pip install -r requirements-ml.txt
cp .env.example .env
# Start PostgreSQL with pgvector, then:
alembic upgrade head
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
Configuration
Copy .env.example to .env. Key variables:
# Environment
ENVIRONMENT=local
ALLOW_LOCAL_STUB_AUTH=true

# Database
DATABASE_URL=postgresql+asyncpg://user:pass@localhost:5432/memory_guardian

# Embedding
# Set EMBEDDING_DIMENSION to match your model's actual output size exactly.
# HuggingFace free models typically output 64–384 dimensions.
# OpenAI text-embedding-3-small outputs up to 1536.
EMBEDDING_PROVIDER=llm
EMBEDDING_DIMENSION=64
LLM_EMBEDDING_API_TOKEN=...
LLM_EMBEDDING_BASE_URL=...   # HuggingFace inference endpoint or OpenAI-compatible URL
LLM_EMBEDDING_MODEL=...

# Conflict detection
CONFLICT_DETECTION_PROVIDER=llm   # llm | nli | heuristic
# NLI requires: pip install -r requirements-ml.txt

# Other providers
RERANKER_PROVIDER=llm
SUMMARIZER_PROVIDER=llm
INGESTION_ENRICHER_PROVIDER=llm
DEDUPE_PROVIDER=heuristic
MAINTENANCE_POLICY_PROVIDER=heuristic
CONSOLIDATION_QUALITY_PROVIDER=llm

# LLM API (OpenAI-compatible; works with HuggingFace, Ollama, OpenAI, etc.)
LLM_API_KEY=...
LLM_BASE_URL=...
Provider options and fallback
Every provider falls back to heuristic automatically on startup or request failure. The system never crashes on provider failure.
| Variable | Default | Options |
|---|---|---|
| EMBEDDING_PROVIDER | llm | llm, heuristic (tests only) |
| CONFLICT_DETECTION_PROVIDER | heuristic | heuristic, nli, llm |
| RERANKER_PROVIDER | heuristic | heuristic, llm |
| INGESTION_ENRICHER_PROVIDER | heuristic | heuristic, llm |
| DEDUPE_PROVIDER | off | off, heuristic, llm |
| MAINTENANCE_POLICY_PROVIDER | heuristic | heuristic, llm |
| SUMMARIZER_PROVIDER | heuristic | heuristic, llm |
| CONSOLIDATION_QUALITY_PROVIDER | heuristic | heuristic, llm |
Embedding dimension note
EMBEDDING_DIMENSION must match your active model's output size exactly. Retrieval and dedup only compare vectors with matching dimensions — dimension mismatch is enforced at query time. You can switch models and dimensions without breaking existing rows; old vectors simply won't be compared against new-dimension queries.
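The dimension rule can be illustrated with two small helpers. Both names and the row shape are hypothetical; the sketch only shows the idea of rejecting a wrong-size vector on write and skipping mismatched rows on query.

```python
def validate_dimension(vector: list[float], expected_dim: int) -> list[float]:
    """Reject an embedding whose size differs from EMBEDDING_DIMENSION."""
    if len(vector) != expected_dim:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, expected {expected_dim}"
        )
    return vector


def comparable_rows(query_vector: list[float], rows: list[dict]) -> list[dict]:
    """Keep only stored rows whose vector size matches the query vector.

    Rows written under an older model with a different dimension stay in the
    store but are simply not compared.
    """
    return [row for row in rows if len(row["embedding"]) == len(query_vector)]
```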
API surface
GET /health
POST /api/v1/memories Store a memory
POST /api/v1/retrieve Retrieve memories for a query
GET /api/v1/conflicts List conflicts for a user
GET /api/v1/memories/{memory_id}/explain Full explain payload
POST /api/v1/maintenance/decay Run decay pass (admin)
POST /api/v1/maintenance/consolidate Run consolidation pass (admin)
GET /api/v1/admin/memories Memory explorer (admin, filterable, paginated)
GET /api/v1/admin/memories/stats Aggregate stats (admin)
GET /api/v1/admin/memories/{memory_id} Single memory with explain (admin)
GET /api/v1/admin/conflicts Conflict list (admin, filterable, paginated)
Full OpenAPI spec at /docs (Swagger UI) or /openapi.json when running. Export committed spec:
python scripts/export_swagger_yaml.py
Example: store a memory
curl -X POST http://localhost:8000/api/v1/memories \
  -H "Content-Type: application/json" \
  -H "X-Subject: user-001" \
  -H "X-Tenant-Id: tenant-finance" \
  -d '{
    "user_id": "user-001",
    "content": "Client stated conservative risk appetite, no equities",
    "memory_type": "preference",
    "metadata": { "priority": "high", "pinned": true }
  }'
Example: retrieve with explanation
curl -X POST http://localhost:8000/api/v1/retrieve \
  -H "Content-Type: application/json" \
  -H "X-Subject: user-001" \
  -H "X-Tenant-Id: tenant-finance" \
  -d '{
    "user_id": "user-001",
    "query": "What is this client'\''s risk profile?",
    "top_k": 5
  }'
Every returned memory includes why_selected and score_breakdown.
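The same retrieve call can be made from Python using only the standard library. This is a sketch that mirrors the curl example: the helper names are hypothetical, and the response is returned as parsed JSON without assuming its exact shape.

```python
import json
import urllib.request


def build_retrieve_request(base_url, tenant_id, subject, user_id, query, top_k=5):
    """Build a POST request for /api/v1/retrieve with the local stub-auth headers."""
    body = json.dumps({"user_id": user_id, "query": query, "top_k": top_k})
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/retrieve",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Subject": subject,
            "X-Tenant-Id": tenant_id,
        },
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request to a running API and return the parsed JSON body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the API running locally, `send(build_retrieve_request("http://localhost:8000", "tenant-finance", "user-001", "user-001", "What is this client's risk profile?"))` returns the retrieval payload for inspection.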
Auth
Auth is fail-closed:
- ENVIRONMENT defaults to production
- Stub auth only active when ENVIRONMENT=local and ALLOW_LOCAL_STUB_AUTH=true
- In local stub mode: X-Subject and X-Tenant-Id are both required
- Admin endpoints require X-Is-Admin: true
- Production use requires integrating a real JWT/OIDC provider
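The fail-closed rule reduces to a single predicate. The function name is hypothetical and the parsing of the flag value (case-insensitive "true") is an assumption; the two conditions and the production default come from the list above.

```python
def stub_auth_enabled(env: dict[str, str]) -> bool:
    """Stub auth is on only when both settings are set explicitly.

    A missing ENVIRONMENT is treated as production, so an unconfigured
    deployment never accepts stub headers.
    """
    environment = env.get("ENVIRONMENT", "production").strip().lower()
    allow_stub = env.get("ALLOW_LOCAL_STUB_AUTH", "false").strip().lower()
    return environment == "local" and allow_stub == "true"
```

In practice this would be called with `os.environ`; passing a plain dictionary keeps the check easy to test.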
Architecture
app/api HTTP routing, request/response, dependency injection
app/schemas Pydantic validation
app/services Domain workflows — ingestion, retrieval, conflict, lifecycle, consolidation
app/repositories DB query layer, tenant-scoped
app/models SQLAlchemy ORM models
app/core Config, logging, exception contracts
alembic/ Schema migrations
tests/ Unit + integration coverage
admin_ui/ Streamlit memory inspector (optional)
Stack: FastAPI · SQLAlchemy 2.x async · PostgreSQL + pgvector · Alembic · Pydantic v2 · Uvicorn · Docker
Data model
| Table | Purpose |
|---|---|
| memories | Tenant-scoped records with embeddings, importance score, reuse/access metadata |
| memory_conflicts | Contradiction links between pairs; canonical pair uniqueness enforced |
| retrieval_logs | Per-query retrieval outcomes and explanation payloads |
Testing
pytest -q                   # all tests
pytest -q tests/bdd         # BDD acceptance scenarios
pytest -q tests/technical   # API contract tests
pytest -q -m bdd
pytest -q -m technical
Integration tests use ephemeral PostgreSQL + pgvector containers. Docker daemon required.
E2E simulation
python scripts/run_agent_e2e_simulation.py \
  --base-url http://127.0.0.1:8000 \
  --api-prefix /api/v1
Exercises ingestion, dedupe, retrieval explainability, tenant isolation, conflicts, maintenance, consolidation, and explain. Fails on any 5xx response.
Finance advisor test agent
# Deterministic mode
python scripts/run_finance_test_agent.py \
  --base-url http://127.0.0.1:8000 \
  --tenant-id finance-demo \
  --subject finance-test-agent \
  --user-message "I want to save £30,000 for a house deposit by next year."

# LLM mode
LLM_API_KEY=... python scripts/run_finance_test_agent.py \
  --mode llm \
  --llm-model gpt-4.1-mini \
  --user-message "I filed my tax return and signed a £12,000 car loan."

# Persist extracted memories back to Memory Guardian
python scripts/run_finance_test_agent.py \
  --user-message "I prefer low-risk index funds and avoid credit card debt." \
  --persist
Finance scenario matrix
python scripts/run_finance_test_agent_scenarios.py \
  --base-url http://127.0.0.1:8000 \
  --output-json /tmp/finance_scenario_report.json
Covers: health and auth contracts, memory persistence, contradiction detection, dedupe, pinned memory, tenant isolation, admin access control, invalid payload rejection.
Memory Inspector (Admin UI)
A Streamlit console for retrieval debugging, score trace inspection, conflict review, and lifecycle monitoring.
pip install -r admin_ui/requirements.txt

# .env
ADMIN_UI_ENABLED=true
ADMIN_UI_PASSWORD=<your_password>
MEMORY_GUARDIAN_API_BASE_URL=http://127.0.0.1:8000

streamlit run admin_ui/app.py
# Open http://127.0.0.1:8501
Status
MVP backend — ready for integration and pilot use.
- Memory ingestion with enrichment and deduplication
- Task-aware retrieval with multi-signal scoring
- Explainable retrieval: why_selected and per-stage decision trace
- Conflict detection: heuristic, NLI (DeBERTa via HuggingFace), and LLM providers
- Failure-safe write path: memory writes never rolled back on conflict analysis failure
- Score decay and archival lifecycle
- Pinned memory: preserved through decay and archival
- Consolidation with recursive prevention and trace metadata
- Strict multi-tenant isolation at API and repository layers
- Provider resilience: all providers fall back to heuristic on failure
- Admin UI: Memory Inspector (Streamlit)
- BDD and integration test coverage
- CI: lint, tests, migration check, secret scan
Not yet implemented:
- Production JWT/OIDC auth provider
- Distributed scheduling for maintenance jobs
- End-user product frontend
Contributing
See CONTRIBUTING.md, SECURITY.md, CODE_OF_CONDUCT.md.
Before tagging a release:
bash scripts/release_preflight.sh
License
Copyright 2026 Memory Guardian Contributors. Licensed under the Apache License, Version 2.0. You may use, modify, and distribute this software freely. Any modifications must carry prominent notices. Patent rights are granted — and terminate automatically if you initiate patent litigation against the project.