I Gave My Agent Persistent Memory. It Remembered the Wrong 3 Things for a Week.
Ken Imoto · 2026-06-22 · via DEV Community

I gave my coding agent persistent memory in March. By the end of the week it was telling me, with total confidence, that I preferred Poetry for dependency management, that our staging database lived in us-east-1, and that I had already approved a migration plan I had explicitly rejected two days earlier.

All three were wrong. I use uv. Staging moved to ap-northeast-1 back in February. And I never approved that plan.

The annoying part is that none of these were hallucinations in the usual sense. The agent wasn't making things up on the spot. It was faithfully recalling facts it had written down about me at some point and never corrected. The memory worked exactly as designed. That was the problem.

The pitch everyone believes

The standard story about agent memory goes like this. Every session starts with a blank context window. The agent forgets you the moment the conversation ends. So you give it persistent memory: a place to write down your preferences, your project state, the decisions you've made. Next session, it reads those notes first and picks up where it left off.

Anthropic shipped exactly this for all Claude users in March 2026. Claude now scans your history, synthesizes a summary of who you are and what you're working on, and refreshes it roughly every 24 hours. Letta (the framework formerly known as MemGPT) goes further: the agent edits its own memory blocks through tool calls, deciding what's worth keeping.

The selling point is continuity. No more re-explaining your stack every morning. And for the first few days, it genuinely felt like working with someone who remembered me.

Then it started remembering things that were no longer true.

Stale memory is worse than no memory

Here's the uncomfortable asymmetry. An agent with no memory asks you the same question every day. That's annoying, but it's honest. You answer, it acts, nothing rots.

An agent with persistent memory answers the question itself, using a fact it learned three weeks ago. If that fact has changed, you don't get a question. You get a confident wrong action. And because the answer sounds informed, you're less likely to catch it.

My us-east-1 bug is the clearest example. The agent had recorded our staging region back when it was true. We migrated. Nobody told the memory. So for a week the agent kept generating deploy commands pointed at a region with nothing in it, and every command looked perfectly reasonable because the region string was a real region we had genuinely used.

[Figure: an agent with no memory asks the user a question; an agent with stale persistent memory confidently acts on an outdated fact.]

This is the part the memory pitch skips. "Remembering" and "remembering correctly" are different features, and persistence only gives you the first one for free.

The three ways memory goes bad

After staring at my agent's memory file for an embarrassingly long evening, I sorted the failures into three buckets. They're not exotic. They show up the moment a memory system runs longer than a few days.

Stale facts. Something that was true when it was written and isn't anymore. Regions, versions, deadlines, who owns which service. The world moves; the note doesn't. This was most of my pain.

Poisoned facts. A wrong fact gets written once and then quoted forever. The "I approved the migration" entry came from a single ambiguous message where I said "yeah that approach makes sense" about the general shape of a plan. The agent compressed that into approval, wrote it down, and from then on treated it as settled history. No amount of arguing in later sessions dislodged it, because it kept reloading the poisoned note at the start of each one.

Over-confident summaries. The Poetry thing was this. I'd mentioned Poetry once, months ago, in the context of a different repo. The summarization pass that builds the daily profile flattened "used Poetry on one old project" into "prefers Poetry." Summaries are lossy by design, and the loss tends to drift toward overconfident generalizations.

The first one is a freshness problem. The second is an integrity problem. The third is a compression problem. Lumping them together as "the agent got confused" is exactly why they're hard to fix.

Why self-editing memory doesn't save you

The obvious response is: let the agent manage its own memory. It edits, it prunes, it corrects. Letta is built on this idea, and it's a genuinely good idea.

But teams moving Letta from prototype to production keep hitting the same wall, and I hit it too: self-editing memory is unpredictable. The agent decides what to keep based on the same flawed judgment that wrote the bad fact in the first place. When I corrected the region in a session, the agent sometimes updated the memory block, sometimes wrote a second entry that contradicted the first, and once helpfully "consolidated" both into a summary that kept the wrong one. Letta even shipped a Recovery-Bench benchmark in 2026 specifically to measure how well agents climb out of corrupted states, which tells you the industry knows this is real.

The deeper issue: an agent editing its own memory has no external source of truth to check against. It's grading its own homework. If it believed us-east-1 yesterday, "us-east-1" looks consistent with everything it knows today.

I learned this lesson once before, the hard way, with a junior engineer I onboarded years ago. Brilliant, fast, and absolutely certain about a deployment process he'd learned on his first day. The process had changed in month two. He kept doing it the old way for weeks, confidently, because nobody handed him a reason to doubt his own notes. Persistent memory gave my agent the exact same failure mode, minus the part where a human eventually overhears the mistake at lunch.

What actually started working

I'm not going to pretend I solved this. But three changes cut the wrong-fact rate to something I can live with.

Timestamp everything, and decay it. Every memory entry now carries when it was written. Facts about volatile things (regions, versions, deadlines) get treated as suspect after a set window and re-confirmed rather than trusted. A region string from three weeks ago isn't a fact; it's a hypothesis with an expiry date.

[Figure: the same memory entry is treated as a trusted fact while fresh, and as a hypothesis requiring re-confirmation once past its freshness window.]
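A minimal sketch of the timestamp-and-decay idea, not the actual memory format used here. The `MemoryEntry` class, the `kind` field, the `recall` function, and the window lengths are all assumptions made up for illustration. The point is only that a past-window entry reaches the agent labeled as something to confirm, not as a fact.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical freshness windows per kind of fact; the numbers are illustrative.
FRESHNESS_WINDOWS = {
    "region": timedelta(days=7),
    "version": timedelta(days=14),
    "deadline": timedelta(days=3),
}
DEFAULT_WINDOW = timedelta(days=90)  # assumed window for facts that rarely change


@dataclass
class MemoryEntry:
    key: str              # e.g. "staging_region"
    value: str            # e.g. "us-east-1"
    kind: str             # selects the freshness window
    written_at: datetime  # when the entry was recorded

    def is_fresh(self, now: datetime) -> bool:
        window = FRESHNESS_WINDOWS.get(self.kind, DEFAULT_WINDOW)
        return now - self.written_at <= window


def recall(entry: MemoryEntry, now: datetime) -> str:
    """Render an entry for the agent's context, labeled by how far to trust it."""
    if entry.is_fresh(now):
        return f"FACT {entry.key} = {entry.value}"
    age_days = (now - entry.written_at).days
    return (
        f"UNVERIFIED {entry.key} = {entry.value} "
        f"(written {age_days} days ago; confirm before acting)"
    )


now = datetime(2026, 3, 22, tzinfo=timezone.utc)
stale = MemoryEntry("staging_region", "us-east-1", "region", now - timedelta(days=21))
print(recall(stale, now))
```

With these assumed windows, the three-week-old region string comes back as `UNVERIFIED`, which is the cue for the agent to ask instead of deploy.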

Separate "observed" from "inferred." The Poetry disaster came from the agent storing a generalization as if it were a stated preference. Now there's a hard line: things I literally said go in one bucket, things the agent concluded go in another, and the inferred bucket needs more evidence before it gets to drive an action. Augment's framing stuck with me here: memory should guide decisions, but never be treated as infallible truth.
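One way the observed/inferred split could be expressed, as a sketch under stated assumptions. The `Source` enum, the `Claim` class, and the evidence threshold of 3 are hypothetical. The post only says the inferred bucket "needs more evidence," not how much.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    OBSERVED = "observed"  # the user literally said it
    INFERRED = "inferred"  # the agent concluded it


# Hypothetical threshold: how many independent supporting messages an
# inferred claim needs before it is allowed to drive an action.
MIN_EVIDENCE_FOR_INFERRED = 3


@dataclass
class Claim:
    text: str
    source: Source
    evidence_count: int = 1  # number of messages supporting the claim


def may_drive_action(claim: Claim) -> bool:
    """Observed claims may drive actions; inferred ones need enough evidence."""
    if claim.source is Source.OBSERVED:
        return True
    return claim.evidence_count >= MIN_EVIDENCE_FOR_INFERRED


stated = Claim("uses uv for dependency management", Source.OBSERVED)
guessed = Claim("prefers Poetry", Source.INFERRED, evidence_count=1)
print(may_drive_action(stated), may_drive_action(guessed))
```

Under this rule a single mention of Poetry stays on record as an inference but cannot pick the dependency manager by itself.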

Make corrections destructive. When I correct a fact, the old entry doesn't get a polite contradicting neighbor. It gets overwritten and logged. The audit log matters more than I expected. The first time the agent confidently cited a fact I didn't recognize, being able to see when and from which message it was written turned a mystery into a one-line fix.
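A sketch of overwrite-plus-audit-log, assuming a simple in-process key-value store. `MemoryStore`, `AuditRecord`, and the `source_message` identifiers are illustrative names, not part of any real memory framework.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AuditRecord:
    key: str
    old_value: Optional[str]  # None on the first write
    new_value: str
    source_message: str       # hypothetical id of the message behind the write
    written_at: datetime


@dataclass
class MemoryStore:
    facts: dict = field(default_factory=dict)      # exactly one live value per key
    audit_log: list = field(default_factory=list)  # append-only write history

    def correct(self, key: str, new_value: str, source_message: str,
                now: Optional[datetime] = None) -> None:
        """Overwrite the live value and log what was replaced, when, and why."""
        now = now or datetime.now(timezone.utc)
        old_value = self.facts.get(key)
        self.facts[key] = new_value  # destructive: no contradicting neighbor
        self.audit_log.append(
            AuditRecord(key, old_value, new_value, source_message, now)
        )

    def history(self, key: str) -> list:
        """Return every recorded write for a key, oldest first."""
        return [record for record in self.audit_log if record.key == key]


store = MemoryStore()
store.correct("staging_region", "us-east-1", "msg-001")
store.correct("staging_region", "ap-northeast-1", "msg-042")
print(store.facts["staging_region"])
for record in store.history("staging_region"):
    print(record.old_value, "->", record.new_value, "from", record.source_message)
```

The live store never holds two values for one key, while `history` still answers "when was this written, and from which message."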

None of this is exotic. It's basically the discipline you'd apply to any cache: TTLs, write provenance, explicit invalidation. We just forgot to apply it to memory because the word "memory" makes it sound like something more trustworthy than a cache. It isn't. It's a cache that talks back.

The uncomfortable takeaway

Persistent memory doesn't make an agent reliable. It makes an agent consistent, which is a different thing, and occasionally the opposite one. A consistent agent repeats yesterday's truth whether or not it's still true today.

The fix isn't more memory. It's treating every remembered fact as a claim with a timestamp and a source, not as ground truth. The agents that stay useful over weeks aren't the ones that remember the most. They're the ones that know which of their memories to distrust.

If you want to go deeper on how context, state, and memory actually interact in production agents, I wrote about the full picture in Context Engineering.

My agent still remembers the wrong region sometimes. But now it asks before it deploys there. That's the whole game.