The 7-Layer Memory Architecture Behind Modern AI Agents
Mahmoud Zalt · 2026-05-23 · via DEV Community

How do you make an AI agent actually remember?

It is the question that inevitably surfaces once an AI system moves out of prototyping and into long-running production. Why does it forget a core constraint after a week? Why does it re-introduce itself every morning? Why does it pick the wrong tool even though it was corrected three days ago?
At Sistava, where you can hire autonomous AI employees, we had to solve this problem to survive. We run a workforce of around 1,000 AI employees in production, operating continuously across live environments for over two months. At this scale, standard context strategies fail. These systems don't get a polite session reset; they face a massive real-world hurdle: facts change over time.
If a user utilizes Gmail today and switches to Outlook next month, an agent needs to track both. It has to know which one is current, exactly when the switch happened, and it cannot act like the old truth is still valid. Standard vector database similarity scores do not understand chronological decay or truth overrides. Mix old and new context, and the agent confidently fabricates or forgets the one detail that mattered.
After extensive runtime experience scaling this workforce, the obvious answer - pick a vector store, dump text chunks in, and hope for the best - completely broke. Memory in a long-running agent isn't a single database. It requires at least seven distinct layers running in parallel across multiple database types.


The Architectural Split (The CoALA Framework)
The academic literature has already recognized these limitations. The seminal CoALA paper (Princeton, 2023) took the episodic, semantic, and procedural split from cognitive science and formalized it for language model agents. It outlines modular components: working memory as a short-term scratchpad, plus long-term episodic memory for experiences, semantic memory for facts, and procedural memory for skills.
In a production environment, each of these layers requires its own write rules, its own lifecycle, and its own read path. They cannot run as a loose stack; they must be isolated so they do not contaminate one another.

  1. Working Memory
    This is the active, per-turn scratchpad holding the immediate plan-so-far, the raw tool output that just came back, or transient chain-of-thought reasoning. It lives entirely within the LLM's native context window or as an in-memory variable in the runtime environment.
    The Production Lesson: Do not let working memory leak. Transient scratchwork must never accidentally flush into long-term storage, or the agent will begin writing unverified thoughts into its historical knowledge base. Enforce a hard wall - working memory has no persistent backing store. It lives, it dies, it is gone.

  2. Conversation Memory
    This tracks the immediate message history so the agent doesn't have to re-derive the active thread context on every turn. Most modern agent frameworks ship a checkpointer that auto-loads thread history from a Postgres backend on invocation.
    The Production Lesson: Run a summarizer middleware that triggers when the live conversation crosses a strict token threshold. It compresses older turns into a single structural system message while keeping the recent tail intact, maintaining a dense, cost-efficient context window.

  3. Episodic Memory
    A time-indexed log of past execution loops, historical runs, and specifically, the failures ("Last Tuesday the webhook timed out, so I routed through the fallback queue"). It provides chronological continuity.
    The Production Lesson: A vector store alone fails here because similarity scoring doesn't understand time. Store raw transcripts alongside LLM-generated execution summaries, keyed explicitly by thread_id and timestamp. Use a background cron job to truncate older episodes to summaries only, rather than forcing the agent to handle eviction at runtime.

  4. Semantic Memory
    This stores slow-changing, deterministic facts about the user, the business, or integrated tools ("The core platform is called Atlas", "The manager prefers brief markdown reports"). It is edited in place, never blindly appended.
    The Production Lesson: Split this layer into two distinct substrates: a human-editable markdown file (the "Sovereign Notebook") and an LLM-extracted graph. If they disagree, the notebook explicitly wins. This gives operators a clear way to intervene: if an extracted fact is noisy, a manual entry in the notebook overrides it.

  5. Knowledge Graph
    While semantic memory holds raw facts, the knowledge graph maps the structural edges between entities - who did what, which event caused what, or which entity is a duplicate of another. A vector store treats text chunks like isolated islands; a graph database (such as Neo4j, Memgraph, or KuzuDB) connects them. It allows an agent to walk contextually from a specific customer entity straight to the exact email thread where a pricing tier was modified without re-reading thousands of irrelevant chunks.
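
The edge-walking idea in the knowledge graph layer can be sketched without a graph database at all. The toy below is only an illustration: the node names, edge labels, and the `find_path` helper are hypothetical, and a real deployment would express the same walk as a query against the graph store.

```python
from collections import deque

# Hypothetical toy graph: node -> list of (edge_label, neighbor).
# Every identifier here is invented for illustration.
GRAPH = {
    "customer:acme": [("HAS_ACCOUNT", "account:acme-42")],
    "account:acme-42": [
        ("ON_TIER", "tier:pro"),
        ("DISCUSSED_IN", "thread:email-881"),
    ],
    "thread:email-881": [("MODIFIED", "tier:pro")],
    "tier:pro": [],
}


def find_path(graph, start, goal):
    """Breadth-first walk; returns the (edge, node) hops from start to goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for edge, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + [(edge, neighbor)]))
    return None


# Walk from the customer entity to the email thread about the pricing tier.
path = find_path(GRAPH, "customer:acme", "thread:email-881")
no_path = find_path(GRAPH, "tier:pro", "customer:acme")
```

The agent touches only the nodes along the walk, which is the contrast with re-reading every chunk that mentions the customer.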

Handling Changing Realities: Temporal Edges
The non-obvious requirement of the graph layer is temporal awareness. To handle shifting user preferences or infrastructure changes over months of runtime, you must stop deleting or overwriting data when state changes.
Instead, every extracted fact in the semantic and graph layers needs a valid_at and invalid_at timestamp:
(User) -[USES_TOOL {valid_at: "2024-01-01", invalid_at: "2026-02-15"}]-> (Gmail)
(User) -[USES_TOOL {valid_at: "2026-02-16", invalid_at: null}]-> (Outlook)
When today's session contradicts yesterday's state, the ingestion pipeline invalidates the old edge instead of erasing it. This preserves a clean, immutable audit trail, allowing the LLM to logically reason about when a preference shifted or an infrastructure stack was updated.
The Build vs. Buy Lesson: Do not write this temporal logic yourself. Utilize open-source libraries that sit on top of your graph DB to handle the LLM-driven extraction, deduplication, and contradiction detection. Writing relationship-inference engines from scratch can easily burn six months of development time.
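
Purely to illustrate the data model (the advice above is to lean on an existing library for the real thing), here is a minimal in-memory sketch of invalidate-instead-of-delete. The `Edge` class and the `assert_fact` and `current` helpers are hypothetical names, and the sketch assumes a relation that holds one value at a time, such as a user's current email tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Edge:
    subject: str
    relation: str
    obj: str
    valid_at: date
    invalid_at: Optional[date] = None  # None means the fact is still current


def assert_fact(edges: List[Edge], subject: str, relation: str, obj: str, when: date) -> None:
    """Record a new fact; close (never delete) any open edge it contradicts."""
    for edge in edges:
        if edge.subject == subject and edge.relation == relation and edge.invalid_at is None:
            if edge.obj == obj:
                return  # same fact is already current, nothing to do
            # Old truth ends the day before the new one starts.
            edge.invalid_at = when - timedelta(days=1)
    edges.append(Edge(subject, relation, obj, when))


def current(edges: List[Edge], subject: str, relation: str, on: date) -> List[str]:
    """Objects that were valid for (subject, relation) on the given day."""
    return [
        e.obj
        for e in edges
        if e.subject == subject
        and e.relation == relation
        and e.valid_at <= on
        and (e.invalid_at is None or on <= e.invalid_at)
    ]


edges: List[Edge] = []
assert_fact(edges, "User", "USES_TOOL", "Gmail", date(2024, 1, 1))
assert_fact(edges, "User", "USES_TOOL", "Outlook", date(2026, 2, 16))

tool_in_2025 = current(edges, "User", "USES_TOOL", date(2025, 6, 1))
tool_now = current(edges, "User", "USES_TOOL", date(2026, 3, 1))
```

Both edges survive the switch, so a question like "what did the user rely on last summer?" still has an answer.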


  6. Procedural Memory
    Procedural memory stores execution mechanics and behavioral habits, not world facts. It dictates how an agent performs tasks ("When checking a raw CSV dataset, first validate header consistency").
    This data lives in structured skill files (typically markdown documents) that the agent loads on demand based on task routing. Some are explicitly authored by engineers; others are written by the agent itself during asynchronous self-reflection steps.
    The Production Lesson: Keep semantic and procedural data separated. A fact like "The client uses Slack" is semantic and belongs in the notebook. A rule like "When notifying via a webhook, format payload fields as snake_case" is procedural and belongs in a skill file.

  7. Checkpoints
    Operating underneath all other layers is a highly serializable, low-latency snapshot of the exact execution state of an agent workflow. This is not thread history; it is the active node in the graph, the pending tool payloads, and the unwritten output stream.
    It is the difference between a background container crashing and losing a forty-minute execution loop, or surviving a pod restart and picking up cleanly at minute thirty-three. Utilizing a durable execution engine like Temporal gives you deterministic checkpointing at every activity boundary out of the box.
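
The checkpoint layer's resume-after-crash behavior can be sketched in a few lines. This is not Temporal's API; `CheckpointStore` and `run_workflow` are hypothetical stand-ins, and the dict of JSON strings plays the role that Postgres, a KV store, or a workflow engine would play in production.

```python
import json


class CheckpointStore:
    """Hypothetical stand-in for a durable store; keeps JSON snapshots in a dict."""

    def __init__(self):
        self._rows = {}

    def save(self, run_id, state):
        self._rows[run_id] = json.dumps(state)

    def load(self, run_id):
        raw = self._rows.get(run_id)
        return json.loads(raw) if raw is not None else None


def run_workflow(run_id, steps, store, crash_before=None):
    """Run steps in order, snapshotting after each; resume from the last snapshot."""
    state = store.load(run_id) or {"next_step": 0, "outputs": []}
    for i in range(state["next_step"], len(steps)):
        if crash_before == i:
            raise RuntimeError(f"simulated crash before step {i}")
        state["outputs"].append(steps[i]())
        state["next_step"] = i + 1
        store.save(run_id, state)  # checkpoint at every step boundary
    return state["outputs"]


calls = []  # records which steps actually executed


def make_step(name):
    def step():
        calls.append(name)
        return name.upper()
    return step


steps = [make_step(n) for n in ("fetch", "transform", "send")]
store = CheckpointStore()

try:
    run_workflow("run-1", steps, store, crash_before=2)  # dies before "send"
except RuntimeError:
    pass

result = run_workflow("run-1", steps, store)  # picks up at step 2
```

After the restart only the unfinished step runs, so completed work is neither lost nor repeated.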


Infrastructure Matrix & Preventing Contamination
To maintain performance, these layers require separate storage shapes, read patterns, and write triggers:
| Layer | Storage Shape | Write Trigger | Read Pattern |
| --- | --- | --- | --- |
| Working | In-memory scratchpad | Per-turn execution | Native context window injection |
| Conversation | Append-only log + summarizer | Every incoming message | Auto-loaded on invocation |
| Episodic | Time-indexed transcript + JSON summaries | Post-message background worker | Recency-weighted semantic retrieval |
| Semantic (Notebook) | Single editable Markdown file | Explicit agent tool writes | Full text injected to prompt |
| Semantic (Facts) | Graph DB (Neo4j class) | Auto-extracted post-message | Entity-anchored sub-graph matching |
| Knowledge Graph | Graph DB with temporal properties | Unified extraction loop with facts | Contextual edge-walking between nodes |
| Procedural | Markdown skill files | Human authorship or reflection | Dynamically loaded based on task |
| Checkpoints | KV Store / Postgres / Workflow engine | Every single execution step | Instantly restored on worker restart |
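
The article does not spell out how the episodic layer's recency-weighted semantic retrieval is computed. One common approach, shown here as an assumption rather than the author's method, multiplies cosine similarity by an exponential decay on the episode's age. The `half_life_days` parameter, the record fields, and the tiny two-dimensional embeddings are all illustrative.

```python
import math
from datetime import datetime


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recency_weighted(query_vec, episodes, now, half_life_days=30.0):
    """Rank episodes by similarity * 0.5 ** (age / half_life), best first."""
    scored = []
    for ep in episodes:
        age_days = (now - ep["timestamp"]).total_seconds() / 86400
        decay = 0.5 ** (age_days / half_life_days)
        scored.append((cosine(query_vec, ep["embedding"]) * decay, ep["summary"]))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


now = datetime(2026, 5, 23)
episodes = [
    {   # perfect textual match, but six half-lives old
        "summary": "old webhook timeout",
        "embedding": [1.0, 0.0],
        "timestamp": datetime(2025, 11, 24),
    },
    {   # weaker match from two days ago
        "summary": "recent fallback-queue run",
        "embedding": [0.8, 0.6],
        "timestamp": datetime(2026, 5, 21),
    },
]

ranked = recency_weighted([1.0, 0.0], episodes, now)
```

With plain similarity the stale episode would rank first; with the decay applied the recent one does.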
Preventing Contamination
Naming the layers is straightforward; wiring them without cross-contamination is where production pipelines fail.
Episodic leaking into Semantic: If every line of a historical brainstorming session gets extracted as a hard "fact," the agent will interpret a transient hypothetical idea as absolute truth. Enforce strict LLM confidence thresholds or run your fact extraction pipelines on summarized episodes rather than raw chat logs.
Conversation leaking into the Graph: Active conversation is full of throwaway syntax and short pleasantries. Ingesting every message verbatim fills a graph database with garbage nodes. Enforce length-gated ingestion filters to skip processing short, transactional messages.
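
Both guards described above fit in a few lines. The thresholds (`min_words`, `min_confidence`) and the shape of the extracted-fact records are assumptions for illustration; real values would need tuning against your own traffic.

```python
import re


def should_ingest(message, min_words=6):
    """Length gate: skip short, transactional messages before graph extraction."""
    words = re.findall(r"\w+", message)
    return len(words) >= min_words


def accept_facts(facts, min_confidence=0.8):
    """Confidence gate: keep only extracted facts the extractor scored highly."""
    return [fact for fact in facts if fact["confidence"] >= min_confidence]


messages = [
    "ok thanks!",
    "We moved the Atlas billing service to the new pricing tier yesterday",
]
to_extract = [m for m in messages if should_ingest(m)]

# Hypothetical extractor output: each fact carries a confidence in [0, 1].
extracted = [
    {"text": "Atlas billing is on the new pricing tier", "confidence": 0.95},
    {"text": "Maybe we rename Atlas someday", "confidence": 0.40},
]
kept = accept_facts(extracted)
```

The length gate runs before any LLM call, so it also saves the extraction cost for messages that would only have produced noise.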


Managing Upstream LLM Costs
An advanced knowledge graph ingestion pipeline requires between five and nine discrete LLM calls per message (handling entity extraction, graph deduplication, relationship inference, contradiction testing, and entity summary updates), alongside multiple embedding calls. Multiplied across thousands of active conversations running concurrently, background memory costs can quickly eclipse primary agent execution costs.
To keep this sustainable at scale, bake in kill switches and per-tenant gates from day one. Every layer running unattended in the background must have a configuration-level flag or a feature toggle. When an upstream model update or unexpected schema change causes an extraction loop to degrade or spin out of control, you need a way to stop the financial bleeding instantly without triggering an emergency production redeployment.
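
A minimal sketch of such a gate, assuming flags live in a config object that can be changed at runtime. The `MemoryFlags` class, the layer names, and the per-tenant call budget are hypothetical; the budget in particular is an added assumption standing in for whatever cost ceiling you choose.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryFlags:
    """Hypothetical runtime config checked before every background memory job."""
    global_enabled: dict = field(
        default_factory=lambda: {"graph_extraction": True, "episodic_summary": True}
    )
    tenant_disabled: dict = field(default_factory=dict)  # tenant_id -> set of layers
    daily_call_budget: int = 1000                        # assumed per-tenant ceiling
    calls_today: dict = field(default_factory=dict)      # tenant_id -> calls used

    def allow(self, tenant_id, layer):
        if not self.global_enabled.get(layer, False):
            return False  # global kill switch (unknown layers default to off)
        if layer in self.tenant_disabled.get(tenant_id, set()):
            return False  # per-tenant gate
        used = self.calls_today.get(tenant_id, 0)
        if used >= self.daily_call_budget:
            return False  # budget exhausted for today
        self.calls_today[tenant_id] = used + 1
        return True


flags = MemoryFlags(daily_call_budget=2)
budget_results = [flags.allow("tenant-a", "graph_extraction") for _ in range(3)]

flags.tenant_disabled["tenant-b"] = {"episodic_summary"}
tenant_gate = flags.allow("tenant-b", "episodic_summary")

flags.global_enabled["graph_extraction"] = False  # flip the kill switch, no redeploy
after_kill = flags.allow("tenant-c", "graph_extraction")
```

Because the check runs before the job is enqueued, flipping a flag stops new spend immediately while leaving the primary agent path untouched.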
The Rebuild Blueprint
If you are starting over building an agent memory infrastructure today, this is the recommended development order:
Map the Concerns First: Do not select an orchestration framework based on hype. Map how your system will handle these seven distinct concerns before writing application logic.
Postgres for Foundations: Use Postgres for conversation history and step-level checkpointing. Boring, ACID-compliant storage is exactly what you want here.
Path-Routed KV for Filesystems: Implement a simple key-value store for notebooks and skill files, allowing the agent to interact with its procedural knowledge using clean, standard filesystem tool calls.
Native Graph + Temporal Constraints: Deploy a native graph database (Neo4j, Memgraph, or KuzuDB) paired with an off-the-shelf library that manages temporal constraints natively.
Tight Vector Tooling: Use a highly optimized vector store (pgvector, Qdrant, or Weaviate) specifically to index static external knowledge documents like Notion workspaces, Slack history, or uploaded manuals.

Ultimately, separating transient reasoning from immutable history and structured relational facts is what transforms a fragile chatbot into a reliable system. By treating memory as a multi-layered infrastructure concern, you build an environment where an agent's capability doesn't degrade over time; it compounds.
For more details continue reading at https://sistava.com/en/insights/ai-agent-memory.
Building Agent Memory Yourself?
The seven layers, the wiring, and the cost ceilings are a lot to get right on the first run.
If you want this exact architecture adapted to your tech stack, check out our support options at Sista AI. If you would rather talk engineer-to-engineer, I take a few of these architectural deep dives personally. You can reach me directly at zalt.me.