GitHub - rishipratap10/memory-guardian

Memory governance for AI agents.

Most agent memory layers are passive storage. Memory Guardian is an active governance layer — it controls what gets stored, what gets retrieved, how memories age, and what contradicts what — with full explainability at every step.

The problem

AI agents with persistent memory break in predictable ways:

Duplicate facts accumulate until retrieval becomes noise
Conflicting preferences coexist with no resolution path
Stale memories from months ago outrank fresh ones
Retrieval is a black box — no way to know why a memory was selected
Quality degrades the longer an agent runs in production

In consumer apps, getting this wrong is a bad experience. In regulated environments — wealth management, KYC, claims, credit — getting this wrong is a suitability breach, a compliance failure, or a mis-sold product.

What Memory Guardian does differently

Capability	Mem0	Zep	Letta	agentmemory	Memory Guardian
Conflict detection	Yes	Yes	Partial	Yes	Yes
Explainable retrieval (`why_selected`)	No	No	No	No	Yes
Retrieval decision trace per stage	No	No	No	No	Yes
Score decay + archival lifecycle	No	Partial	Partial	Yes (local)	Yes
Consolidation of redundant memories	Partial	Partial	No	No	Yes
Pinned memory (survives decay)	No	No	No	No	Yes
NLI conflict detection	No	No	No	No	Yes
Strict multi-tenant isolation at API layer	Yes	Yes	No	No	Yes
Framework-agnostic production API	Yes	Yes	No	No	Yes

Core features

Intelligent ingestion

Every memory write goes through enrichment, deduplication, and type normalisation before persistence. Two dedup paths run in sequence — exact match on normalised content, then vector-similarity match against nearby tenant memories. Noise does not enter the store.

Supported canonical memory types: fact, preference, episodic, instruction, consolidated. Aliases are normalised automatically (pref → preference, episode → episodic, etc.). Unknown types are rejected with HTTP 422.
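As a rough illustration of the type normalisation and exact-match dedup described above, the sketch below maps aliases to canonical types and fingerprints normalised content. The function names, the alias table (limited to the two aliases named here), and the choice of SHA-256 over case-folded, whitespace-collapsed text are assumptions for illustration, not the project's actual code.

```python
import hashlib
import re

CANONICAL_TYPES = {"fact", "preference", "episodic", "instruction", "consolidated"}
# Only the two aliases named in the README; the real alias table may be larger.
ALIASES = {"pref": "preference", "episode": "episodic"}


def normalise_type(raw: str) -> str:
    """Map an alias to its canonical type, or raise for an unknown type."""
    key = raw.strip().lower()
    key = ALIASES.get(key, key)
    if key not in CANONICAL_TYPES:
        # The API is described as answering HTTP 422 in this case.
        raise ValueError(f"unknown memory type: {raw!r}")
    return key


def content_fingerprint(content: str) -> str:
    """Hash of whitespace-collapsed, case-folded content (assumed normalisation)."""
    normalised = re.sub(r"\s+", " ", content).strip().casefold()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()
```

Two writes whose fingerprints match would be treated as exact duplicates; the vector-similarity pass would then handle near-duplicates that differ in wording.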

Task-aware retrieval

Retrieval combines four signals into a final score:

Semantic similarity — vector distance from query embedding
Importance — priority-weighted score from metadata
Recency — exponential decay with 30-day half-life; uses last_accessed_at where available
Frequency — logarithmic reuse score, capped at 20 accesses with a 21-day half-life to prevent runaway dominance

Retrieval returns ranked, deduplicated memories and excludes archived ones. The reranker runs after base scoring and falls back to a deterministic ordering if its provider fails.
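The four signals can be sketched as a weighted sum. The 30-day recency half-life and the cap of 20 accesses come from the description above; the weights, the function names, and the omission of the 21-day frequency half-life are simplifying assumptions, since the actual combination formula is not documented here.

```python
import math
from datetime import datetime, timedelta, timezone

RECENCY_HALF_LIFE_DAYS = 30.0  # stated in the README
FREQUENCY_CAP = 20             # stated in the README
# Hypothetical weights; the README does not say how the signals are combined.
WEIGHTS = {"similarity": 0.5, "importance": 0.2, "recency": 0.2, "frequency": 0.1}


def recency_score(last_seen: datetime, now: datetime) -> float:
    """Exponential decay: 1.0 when just accessed, 0.5 after one half-life."""
    age_days = max((now - last_seen).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / RECENCY_HALF_LIFE_DAYS)


def frequency_score(access_count: int) -> float:
    """Logarithmic reuse score in [0, 1]; counts above the cap add nothing."""
    capped = min(max(access_count, 0), FREQUENCY_CAP)
    return math.log1p(capped) / math.log1p(FREQUENCY_CAP)


def score(similarity: float, importance: float, last_seen: datetime,
          access_count: int, now: datetime) -> tuple[float, dict[str, float]]:
    """Return the final score and the per-signal contributions."""
    signals = {
        "similarity": similarity,
        "importance": importance,
        "recency": recency_score(last_seen, now),
        "frequency": frequency_score(access_count),
    }
    breakdown = {name: WEIGHTS[name] * value for name, value in signals.items()}
    return sum(breakdown.values()), breakdown


# Example: a memory last accessed 30 days ago and reused 50 times.
now = datetime(2025, 1, 31, tzinfo=timezone.utc)
total, breakdown = score(0.8, 0.6, now - timedelta(days=30), 50, now)
```

Returning the breakdown alongside the total is what makes a per-signal explanation possible without recomputing anything after the fact.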

Explainable ranking

Every retrieved memory includes:

score_breakdown — per-signal contributions (similarity, importance, recency, frequency)
decision_trace — stage-by-stage trace: candidate generation → scoring → rerank → final order
why_selected — human-readable reasons generated from the actual executed path

This is not post-hoc explanation. The trace reflects what the system actually did.
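One way to derive reasons from the executed path, rather than after the fact, is to build them from the same score breakdown and stage list that produced the ranking. The sketch below assumes a breakdown dictionary and a list of stage names; the stage labels and the wording of the reasons are hypothetical, not the project's actual payload.

```python
def why_selected(score_breakdown: dict[str, float], trace: list[str]) -> list[str]:
    """Build human-readable reasons from what was actually computed."""
    reasons = []
    # Name the signal that contributed most to the final score.
    top_signal, top_value = max(score_breakdown.items(), key=lambda kv: kv[1])
    reasons.append(f"strongest signal: {top_signal} ({top_value:.2f})")
    # Report whether the rerank stage ran, based on the recorded trace.
    if "rerank" in trace:
        reasons.append("order adjusted by reranker")
    else:
        reasons.append("reranker not applied; base score order used")
    return reasons


example = why_selected(
    {"similarity": 0.40, "importance": 0.12, "recency": 0.10, "frequency": 0.02},
    ["candidate_generation", "scoring", "rerank", "final_order"],
)
```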

Conflict detection — three providers

Heuristic (default, zero dependencies): Polarity analysis over shared topic tokens. Detects opposite-sentiment statements about the same subjects, for example "user likes X" vs "user hates X".
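A minimal sketch of polarity analysis over shared topic tokens is shown below. The word lists, stopwords, and function names are small hypothetical stand-ins; the heuristic provider's real lexicon and tokenisation are not shown in this README.

```python
import re

# Tiny hypothetical lexicons for illustration only.
POSITIVE = {"likes", "loves", "prefers", "wants"}
NEGATIVE = {"hates", "dislikes", "avoids", "refuses"}
STOPWORDS = {"the", "a", "an", "user", "client", "to", "of"}


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def _polarity(tokens: set[str]) -> int:
    """+1 for positive wording, -1 for negative, 0 for neither or both."""
    return int(bool(tokens & POSITIVE)) - int(bool(tokens & NEGATIVE))


def heuristic_conflict(a: str, b: str) -> tuple[bool, set[str]]:
    """Flag a pair that shares topic tokens but carries opposite polarity."""
    ta, tb = _tokens(a), _tokens(b)
    shared_topics = (ta & tb) - STOPWORDS - POSITIVE - NEGATIVE
    opposite = _polarity(ta) * _polarity(tb) < 0
    return bool(shared_topics) and opposite, shared_topics
```

A pair such as "user likes equities" and "user hates equities" shares the topic token "equities" with opposite polarity, so it is flagged; statements about different topics are not.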

NLI, Natural Language Inference (optional, local): Uses microsoft/deberta-v3-small-mnli via HuggingFace transformers. Runs a text-classification pipeline on each candidate memory pair. Flags pairs where the contradiction label score exceeds a configurable threshold (default 0.8). Requires requirements-ml.txt. Model downloads on first use.

LLM (optional, OpenAI-compatible): Sends each candidate pair to an LLM with a structured prompt. Returns is_conflict, reason, conflict_type, and shared_topics. Most semantically capable. Works with any OpenAI-compatible endpoint including HuggingFace inference, Ollama, and others.

All providers fall back to heuristic automatically on failure. Memory writes are never rolled back if conflict analysis fails — the write path is failure-safe by design. Conflicts are persisted as first-class objects with canonical pair handling — no duplicate conflict records per pair.
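The two guarantees above, a write that survives analysis failure and one conflict record per pair, can be illustrated with in-memory structures. The dictionary store, the set of conflicts, and the `detect` callback are assumptions made for the sketch; the project persists these in PostgreSQL.

```python
from typing import Callable, Iterable


def canonical_pair(id_a: str, id_b: str) -> tuple[str, str]:
    """Order the two ids so (a, b) and (b, a) map to the same record."""
    return (id_a, id_b) if id_a <= id_b else (id_b, id_a)


def write_memory(
    store: dict[str, str],
    conflicts: set[tuple[str, str]],
    memory_id: str,
    content: str,
    detect: Callable[[str, str, dict[str, str]], Iterable[str]],
) -> None:
    """Persist first, then analyse; a failing detector never undoes the write."""
    store[memory_id] = content
    try:
        for other_id in detect(memory_id, content, store):
            conflicts.add(canonical_pair(memory_id, other_id))
    except Exception:
        # Conflict analysis failed; the memory stays stored.
        pass
```

Using the ordered pair as the key means that detecting the same contradiction from either side adds nothing new.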

Memory lifecycle

Decay — importance scores reduce over time using configurable heuristic or LLM policy. Pinned memories are always preserved regardless of score.
Archival — inactive low-importance memories are archived automatically. Archived memories are excluded from all retrieval.
Recency signals — last_accessed_at is updated atomically on retrieval. Lifecycle decisions use real access history, not just creation time.
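The lifecycle rules can be sketched as two pure functions: one that decays importance by time since last access and one that decides on archival. The half-life, archive threshold, and inactivity window below are hypothetical values chosen for the example; only the rule that pinned memories are always preserved comes from the text above.

```python
from dataclasses import dataclass

# Hypothetical policy parameters; the real ones are configurable.
DECAY_HALF_LIFE_DAYS = 60.0
ARCHIVE_BELOW_IMPORTANCE = 0.1
INACTIVE_AFTER_DAYS = 90.0


@dataclass
class Memory:
    importance: float
    days_since_access: float
    pinned: bool = False


def decayed_importance(m: Memory) -> float:
    """Pinned memories keep their score; others decay exponentially."""
    if m.pinned:
        return m.importance
    return m.importance * 0.5 ** (m.days_since_access / DECAY_HALF_LIFE_DAYS)


def should_archive(m: Memory) -> bool:
    """Archive only memories that are unpinned, inactive, and low in importance."""
    if m.pinned:
        return False
    return (
        m.days_since_access >= INACTIVE_AFTER_DAYS
        and decayed_importance(m) < ARCHIVE_BELOW_IMPORTANCE
    )
```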

Consolidation

Clusters similar active memories within strict scope boundaries (tenant_id + user_id + agent_id + session_id)
Generates a canonical consolidated memory via configurable summariser (heuristic or LLM)
Archives originals with consolidated_into trace metadata
Prevents recursive consolidation — memories with memory_type=consolidated are excluded from future consolidation input by default
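The scope boundary and the recursion guard can be sketched as a grouping step that runs before any similarity clustering. The dictionary-shaped rows and the function name are assumptions; similarity clustering within each group and the summariser are left out.

```python
from collections import defaultdict


def consolidation_candidates(memories: list[dict]) -> dict[tuple, list[str]]:
    """Group active, non-consolidated memories by their full scope key."""
    groups: dict[tuple, list[str]] = defaultdict(list)
    for m in memories:
        # Archived memories and earlier consolidation outputs are not inputs.
        if m.get("archived") or m["memory_type"] == "consolidated":
            continue
        scope = (m["tenant_id"], m["user_id"], m["agent_id"], m["session_id"])
        groups[scope].append(m["id"])
    # A group needs at least two members to be worth consolidating.
    return {scope: ids for scope, ids in groups.items() if len(ids) > 1}
```

Because the scope key includes all four identifiers, memories from different tenants, users, agents, or sessions can never end up in the same cluster.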

Tenant isolation

X-Tenant-Id header is enforced at the API boundary: a missing or blank value is rejected with 401 or 400
All repository queries scoped by tenant_id
No cross-tenant retrieval, conflict reads, or admin access possible by design
Admin endpoints require X-Is-Admin: true separately
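In framework-neutral terms, the isolation rules amount to resolving the tenant once at the boundary and filtering every query by it. The sketch below uses plain dictionaries for headers and rows and a hypothetical `PermissionError` in place of an HTTP error response; real HTTP header lookup is case-insensitive, which this simplification ignores.

```python
def resolve_tenant(headers: dict[str, str]) -> str:
    """Reject requests whose X-Tenant-Id header is missing or blank."""
    tenant = headers.get("X-Tenant-Id", "").strip()
    if not tenant:
        raise PermissionError("X-Tenant-Id is required")
    return tenant


def is_admin(headers: dict[str, str]) -> bool:
    """Admin access is a separate, explicit header."""
    return headers.get("X-Is-Admin", "").strip().lower() == "true"


def list_memories(rows: list[dict], headers: dict[str, str]) -> list[dict]:
    """Every read is scoped to the caller's tenant before anything else."""
    tenant = resolve_tenant(headers)
    return [r for r in rows if r["tenant_id"] == tenant]
```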

FSI and regulated environment use cases

Memory Guardian's architecture maps directly to the memory governance requirements of regulated industries.

Wealth management suitability: A client's stated risk appetite changes over time. Without conflict detection, both the old and new preference coexist with no governed resolution. Memory Guardian persists the contradiction as a first-class conflict object, flags it for review, and surfaces the most recent preference on retrieval with a full decision trace: your explainability record under MiFID II suitability requirements.

KYC / AML profiling: Resolved sanctions flags and stale risk classifications should not carry the same weight as fresh transaction signals. Memory Guardian decays soft signals automatically, pins formal risk decisions permanently, and detects when new evidence contradicts a stored classification, triggering review rather than silent coexistence.

Claims handling: Contradicting customer statements across multiple touchpoints are stored as conflict pairs with canonical traces, the audit trail needed if a case goes to dispute.

Credit underwriting: Stale financial snapshots decay automatically. Covenant breaches and credit committee decisions are pinned permanently. Quarterly updates consolidate into a clean current-state picture.

Trade surveillance: Investigated and cleared flags do not dominate current risk profiles. Lifecycle archival retires resolved low-value memories. Every retrieval decision is traceable.

Memory Guardian is early-stage open source software. These use cases reflect the architectural design intent, not production deployments. If you are building agent systems in financial services and recognise these problems, we want to hear from you.

Quickstart

With Docker Compose

git clone https://github.com/your-org/memory-guardian
cd memory-guardian
cp .env.example .env
docker compose up --build -d db
docker compose run --rm migrate
docker compose up --build -d api

Without Docker

# Python 3.12 required
pip install -r requirements-dev.txt

# Optional: NLI conflict detection (downloads DeBERTa model on first use ~500MB)
pip install -r requirements-ml.txt

cp .env.example .env
# Start PostgreSQL with pgvector, then:
alembic upgrade head
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Configuration

Copy .env.example to .env. Key variables:

# Environment
ENVIRONMENT=local
ALLOW_LOCAL_STUB_AUTH=true

# Database
DATABASE_URL=postgresql+asyncpg://user:pass@localhost:5432/memory_guardian

# Embedding
# Set EMBEDDING_DIMENSION to match your model's actual output size exactly.
# HuggingFace free models typically output 64–384 dimensions.
# OpenAI text-embedding-3-small outputs up to 1536.
EMBEDDING_PROVIDER=llm
EMBEDDING_DIMENSION=64
LLM_EMBEDDING_API_TOKEN=...
LLM_EMBEDDING_BASE_URL=...      # HuggingFace inference endpoint or OpenAI-compatible URL
LLM_EMBEDDING_MODEL=...

# Conflict detection
CONFLICT_DETECTION_PROVIDER=llm  # llm | nli | heuristic
# NLI requires: pip install -r requirements-ml.txt

# Other providers
RERANKER_PROVIDER=llm
SUMMARIZER_PROVIDER=llm
INGESTION_ENRICHER_PROVIDER=llm
DEDUPE_PROVIDER=heuristic
MAINTENANCE_POLICY_PROVIDER=heuristic
CONSOLIDATION_QUALITY_PROVIDER=llm

# LLM API (OpenAI-compatible — works with HuggingFace, Ollama, OpenAI, etc.)
LLM_API_KEY=...
LLM_BASE_URL=...

Provider options and fallback

Every provider falls back to heuristic automatically on startup or request failure. The system never crashes on provider failure.

Variable	Default	Options
`EMBEDDING_PROVIDER`	`llm`	`llm`, `heuristic` (tests only)
`CONFLICT_DETECTION_PROVIDER`	`heuristic`	`heuristic`, `nli`, `llm`
`RERANKER_PROVIDER`	`heuristic`	`heuristic`, `llm`
`INGESTION_ENRICHER_PROVIDER`	`heuristic`	`heuristic`, `llm`
`DEDUPE_PROVIDER`	`off`	`off`, `heuristic`, `llm`
`MAINTENANCE_POLICY_PROVIDER`	`heuristic`	`heuristic`, `llm`
`SUMMARIZER_PROVIDER`	`heuristic`	`heuristic`, `llm`
`CONSOLIDATION_QUALITY_PROVIDER`	`heuristic`	`heuristic`, `llm`
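The fallback behaviour can be pictured as a thin wrapper around any provider call. This is a sketch of the pattern only; `with_fallback`, `llm_rerank`, and `heuristic_rerank` are hypothetical names, and the real providers are selected through the environment variables in the table above.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def with_fallback(primary: Callable[..., T], heuristic: Callable[..., T]) -> Callable[..., T]:
    """Return a callable that uses the heuristic when the primary raises."""
    def call(*args, **kwargs) -> T:
        try:
            return primary(*args, **kwargs)
        except Exception:
            # Provider failure must not surface as a crash.
            return heuristic(*args, **kwargs)
    return call


def llm_rerank(items: list[dict]) -> list[dict]:
    # Stand-in for a remote call that is currently failing.
    raise ConnectionError("reranker endpoint unreachable")


def heuristic_rerank(items: list[dict]) -> list[dict]:
    # Deterministic ordering by base score, highest first.
    return sorted(items, key=lambda item: item["score"], reverse=True)


rerank = with_fallback(llm_rerank, heuristic_rerank)
ordered = rerank([{"id": "a", "score": 0.2}, {"id": "b", "score": 0.9}])
```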

Embedding dimension note

EMBEDDING_DIMENSION must match your active model's output size exactly. Retrieval and dedup only compare vectors with matching dimensions, and this check is enforced at query time. You can switch models and dimensions without breaking existing rows; old vectors simply won't be compared against new-dimension queries, so they drop out of vector-based retrieval and dedup until they are re-embedded.
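The query-time dimension check can be sketched as a filter applied before any similarity is computed. The row layout and function names are assumptions for illustration; in the project the comparison runs in PostgreSQL with pgvector rather than in Python.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_same_dimension(query: list[float], rows: list[dict]) -> list[dict]:
    """Skip rows whose stored vector has a different length than the query."""
    comparable = [r for r in rows if len(r["embedding"]) == len(query)]
    return sorted(
        comparable,
        key=lambda r: cosine_similarity(query, r["embedding"]),
        reverse=True,
    )
```

Rows embedded with an older model stay in the table untouched; they are simply never candidates for a query of a different dimension.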

API surface

GET  /health

POST /api/v1/memories                         Store a memory
POST /api/v1/retrieve                         Retrieve memories for a query
GET  /api/v1/conflicts                        List conflicts for a user
GET  /api/v1/memories/{memory_id}/explain     Full explain payload

POST /api/v1/maintenance/decay                Run decay pass (admin)
POST /api/v1/maintenance/consolidate          Run consolidation pass (admin)

GET  /api/v1/admin/memories                   Memory explorer (admin, filterable, paginated)
GET  /api/v1/admin/memories/stats             Aggregate stats (admin)
GET  /api/v1/admin/memories/{memory_id}       Single memory with explain (admin)
GET  /api/v1/admin/conflicts                  Conflict list (admin, filterable, paginated)

Full OpenAPI spec at /docs (Swagger UI) or /openapi.json when running. Export committed spec:

python scripts/export_swagger_yaml.py

Example: store a memory

curl -X POST http://localhost:8000/api/v1/memories \
  -H "Content-Type: application/json" \
  -H "X-Subject: user-001" \
  -H "X-Tenant-Id: tenant-finance" \
  -d '{
    "user_id": "user-001",
    "content": "Client stated conservative risk appetite, no equities",
    "memory_type": "preference",
    "metadata": { "priority": "high", "pinned": true }
  }'

Example: retrieve with explanation

curl -X POST http://localhost:8000/api/v1/retrieve \
  -H "Content-Type: application/json" \
  -H "X-Subject: user-001" \
  -H "X-Tenant-Id: tenant-finance" \
  -d '{
    "user_id": "user-001",
    "query": "What is this client'\''s risk profile?",
    "top_k": 5
  }'

Every returned memory includes score_breakdown, decision_trace, and why_selected.
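For agents written in Python, the same retrieve call can be built with the standard library. The endpoint path, headers, and body fields mirror the curl example above; the helper name is hypothetical, and the response is left unparsed because its exact layout is not shown here.

```python
import json
import urllib.request


def build_retrieve_request(
    base_url: str, tenant_id: str, subject: str, user_id: str, query: str, top_k: int = 5
) -> urllib.request.Request:
    """Build the POST /api/v1/retrieve request without sending it."""
    body = json.dumps({"user_id": user_id, "query": query, "top_k": top_k})
    return urllib.request.Request(
        url=f"{base_url}/api/v1/retrieve",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Subject": subject,
            "X-Tenant-Id": tenant_id,
        },
    )


request = build_retrieve_request(
    "http://localhost:8000",
    tenant_id="tenant-finance",
    subject="user-001",
    user_id="user-001",
    query="What is this client's risk profile?",
)
```

With the API running, `urllib.request.urlopen(request)` sends the request, and `json.load` on the response object decodes the body.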

Auth

Auth is fail-closed:

ENVIRONMENT defaults to production
Stub auth only active when ENVIRONMENT=local and ALLOW_LOCAL_STUB_AUTH=true
In local stub mode: X-Subject and X-Tenant-Id are both required
Admin endpoints require X-Is-Admin: true
Production use requires integrating a real JWT/OIDC provider
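The fail-closed rule can be expressed as a check that only enables stub auth when both settings are explicitly present. The function names and the use of `PermissionError` are assumptions for the sketch; the variable and header names come from the list above.

```python
def stub_auth_enabled(env: dict[str, str]) -> bool:
    """Stub auth is off unless both settings opt in explicitly."""
    environment = env.get("ENVIRONMENT", "production")  # default is production
    allow = env.get("ALLOW_LOCAL_STUB_AUTH", "false").strip().lower() == "true"
    return environment == "local" and allow


def authenticate(headers: dict[str, str], env: dict[str, str]) -> dict:
    """Resolve the caller from stub headers, or refuse."""
    if not stub_auth_enabled(env):
        raise PermissionError("stub auth disabled; a real JWT/OIDC provider is required")
    subject = headers.get("X-Subject", "").strip()
    tenant = headers.get("X-Tenant-Id", "").strip()
    if not subject or not tenant:
        raise PermissionError("X-Subject and X-Tenant-Id are both required")
    return {
        "subject": subject,
        "tenant_id": tenant,
        "is_admin": headers.get("X-Is-Admin", "").strip().lower() == "true",
    }
```

An empty environment therefore behaves like production: no header combination is accepted until a real identity provider is wired in.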

Architecture

app/api           HTTP routing, request/response, dependency injection
app/schemas       Pydantic validation
app/services      Domain workflows — ingestion, retrieval, conflict, lifecycle, consolidation
app/repositories  DB query layer, tenant-scoped
app/models        SQLAlchemy ORM models
app/core          Config, logging, exception contracts
alembic/          Schema migrations
tests/            Unit + integration coverage
admin_ui/         Streamlit memory inspector (optional)

Stack: FastAPI · SQLAlchemy 2.x async · PostgreSQL + pgvector · Alembic · Pydantic v2 · Uvicorn · Docker

Data model

Table	Purpose
`memories`	Tenant-scoped records with embeddings, importance score, reuse/access metadata
`memory_conflicts`	Contradiction links between pairs — canonical pair uniqueness enforced
`retrieval_logs`	Per-query retrieval outcomes and explanation payloads

Testing

pytest -q                    # all tests
pytest -q tests/bdd          # BDD acceptance scenarios
pytest -q tests/technical    # API contract tests
pytest -q -m bdd
pytest -q -m technical

Integration tests use ephemeral PostgreSQL + pgvector containers. Docker daemon required.

E2E simulation

python scripts/run_agent_e2e_simulation.py \
  --base-url http://127.0.0.1:8000 \
  --api-prefix /api/v1

Exercises ingestion, dedupe, retrieval explainability, tenant isolation, conflicts, maintenance, consolidation, and explain. Fails on any 5xx response.

Finance advisor test agent

# Deterministic mode
python scripts/run_finance_test_agent.py \
  --base-url http://127.0.0.1:8000 \
  --tenant-id finance-demo \
  --subject finance-test-agent \
  --user-message "I want to save £30,000 for a house deposit by next year."

# LLM mode
LLM_API_KEY=... python scripts/run_finance_test_agent.py \
  --mode llm \
  --llm-model gpt-4.1-mini \
  --user-message "I filed my tax return and signed a £12,000 car loan."

# Persist extracted memories back to Memory Guardian
python scripts/run_finance_test_agent.py \
  --user-message "I prefer low-risk index funds and avoid credit card debt." \
  --persist

Finance scenario matrix

python scripts/run_finance_test_agent_scenarios.py \
  --base-url http://127.0.0.1:8000 \
  --output-json /tmp/finance_scenario_report.json

Covers: health and auth contracts, memory persistence, contradiction detection, dedupe, pinned memory, tenant isolation, admin access control, invalid payload rejection.

Memory Inspector (Admin UI)

A Streamlit console for retrieval debugging, score trace inspection, conflict review, and lifecycle monitoring.

pip install -r admin_ui/requirements.txt

# .env
ADMIN_UI_ENABLED=true
ADMIN_UI_PASSWORD=<your_password>
MEMORY_GUARDIAN_API_BASE_URL=http://127.0.0.1:8000

streamlit run admin_ui/app.py
# Open http://127.0.0.1:8501

Status

MVP backend — ready for integration and pilot use.

Memory ingestion with enrichment and deduplication
Task-aware retrieval with multi-signal scoring
Explainable retrieval — why_selected and per-stage decision trace
Conflict detection — heuristic, NLI (DeBERTa via HuggingFace), and LLM providers
Failure-safe write path — memory writes never rolled back on conflict analysis failure
Score decay and archival lifecycle
Pinned memory — preserved through decay and archival
Consolidation with recursive prevention and trace metadata
Strict multi-tenant isolation at API and repository layers
Provider resilience — all providers fall back to heuristic on failure
Admin UI — Memory Inspector (Streamlit)
BDD and integration test coverage
CI — lint, tests, migration check, secret scan

Not yet implemented:

Production JWT/OIDC auth provider
Distributed scheduling for maintenance jobs
End-user product frontend

Contributing

See CONTRIBUTING.md, SECURITY.md, CODE_OF_CONDUCT.md.

Before tagging a release:

bash scripts/release_preflight.sh

License

Apache 2.0

Copyright 2026 Memory Guardian Contributors. Licensed under the Apache License, Version 2.0. You may use, modify, and distribute this software freely. Modified files must carry prominent notices stating that they were changed. A patent license is granted, and it terminates automatically for anyone who brings patent litigation alleging that the software infringes.
