Your AI coding agent doesn't need a smarter model. It needs your backlog.
Kunal Sharda · 2026-05-31 · via DEV Community

Here is the uncomfortable thing I have landed on after a year of watching coding agents succeed and fail on real work: the model is almost never the bottleneck. Claude Code and Codex are both more than capable of the feature you are asking for. What breaks the run is that the agent cannot see the truth it is supposed to build against. The story. The acceptance criteria. The architecture decision it is meant to respect. The test that already exists for the thing it is about to rewrite.

So it guesses. The guess is locally reasonable and globally wrong, and you spend the afternoon unwinding it. The instinct is to reach for a smarter model. The fix is to give the model your backlog.

Why pasting context stops working

Most of us feed an agent context by pasting it. You paste the ticket, a few file paths, maybe a paragraph of background, and you let it run. This works for a self-contained task and falls apart the moment the work touches the rest of the system.

The reason is simple. Pasted context is a snapshot, and snapshots go stale inside the same session. The agent makes a change on step three that invalidates the assumption you pasted on step one, but the pasted text does not update, so by step seven it is reasoning about a version of the project that no longer exists. You are not giving it context. You are giving it a photograph of context and asking it to navigate a moving room.
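
A minimal sketch of that staleness, assuming a toy in-memory project state. The `project` dictionary, its keys, and the `fetch` helper are hypothetical and exist only to illustrate the difference between a copy taken once and a read made at the moment of use.

```python
import copy

# Hypothetical in-memory project state; a real one would live in a tracker or repo.
project = {"rate_limit_location": "handler", "open_tests": ["test_login"]}

# Pasting context amounts to taking one copy at the start of the session.
pasted_snapshot = copy.deepcopy(project)


def fetch(key):
    """Read the current value at the moment it is needed."""
    return project[key]


# Step three: the agent's own change alters the project.
project["rate_limit_location"] = "gateway"
project["open_tests"].append("test_rate_limit_per_tenant")

# Step seven: the snapshot still describes the old project; a fetch does not.
stale = pasted_snapshot["rate_limit_location"]
live = fetch("rate_limit_location")
print(stale, live)
```

The snapshot and the live read disagree as soon as anything changes mid-session, which is the whole problem in miniature.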

The second problem is that the things that actually matter for a real feature are relationships, not paragraphs. Which architecture decision constrains this story. Which test verifies this acceptance criterion. What defect we last saw in this area. None of that lives in a paragraph you can paste. It lives in the links between artifacts, and a paste flattens all of it into prose the agent has to re-infer.

To be clear, this is not an argument that models do not matter. A better model is genuinely better at reasoning once it has the right inputs. The claim is narrower and more useful: for the failures most teams actually hit on bigger tasks, fixing the inputs beats upgrading the model, and it is cheaper.

What the agent actually needs

It needs a source of truth it can query on demand, not a wall of text you pasted once.

When the agent can query, it pulls the current state at the moment it needs it. It asks "what are the acceptance criteria for this story" right before it writes the code, not at the start of a session that has since drifted. It asks "what tests already cover this module" before it rewrites the module, so it stops breaking things it did not know existed. It asks "which decision governs this boundary" before it crosses the boundary. The context is live because it is fetched, not remembered.

For that to work, two things have to be true. The truth has to exist in a structured, linked form, and the agent has to have a way to reach it. The first is a product problem. The second is a protocol problem, and the protocol now exists.

MCP is the part that just got easy

The Model Context Protocol is the reason this is suddenly practical rather than a research project. MCP is the standard way for an agent like Claude Code or Codex to call out to an external system and read or write structured data. Instead of you copying your backlog into a prompt, the agent connects to a server and queries the backlog directly, the same way it would call any other tool.
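
To make "calls out to a server" concrete, here is a rough sketch of the message shape involved. MCP is built on JSON-RPC 2.0, and a tool invocation is a `tools/call` request carrying a tool name and arguments. Everything else below is an assumption for illustration: the `get_acceptance_criteria` tool, the `STORY-1` identifier, the backlog contents, and the in-process `handle` function that stands in for a real server. Transport (stdio or HTTP) and session setup are left out.

```python
import json

# Hypothetical backlog data; a real server would read from your tracker.
BACKLOG = {
    "STORY-1": [
        "Limits are enforced per tenant",
        "Exceeding the limit returns HTTP 429",
    ],
}


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request in the shape MCP uses for tools/call."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def handle(request):
    """Toy in-process stand-in for a server exposing one hypothetical tool."""
    params = request["params"]
    if params["name"] != "get_acceptance_criteria":
        return {
            "jsonrpc": "2.0",
            "id": request["id"],
            "error": {"code": -32602, "message": "Unknown tool: " + params["name"]},
        }
    criteria = BACKLOG.get(params["arguments"]["story_id"], [])
    return {
        "jsonrpc": "2.0",
        "id": request["id"],
        "result": {"content": [{"type": "text", "text": json.dumps(criteria)}]},
    }


request = build_tool_call(1, "get_acceptance_criteria", {"story_id": "STORY-1"})
response = handle(request)
criteria = json.loads(response["result"]["content"][0]["text"])
print(criteria)
```

In practice the agent's MCP client builds and sends these messages for you. The point of the sketch is only that the backlog arrives as a structured tool result fetched on demand, not as text someone pasted.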

It is worth being precise about why this beats the usual "AI that knows your data" pitch, which almost always means vector search. Embedding your docs and retrieving the most similar passage is fine for "summarize this page" and useless for "which decision constrains this story," because similarity is not the same as relationship. A graph answers the relationship question by traversal: this story, to the decisions in its epic, to the ones touching the same boundary. The retrieval is structural, not statistical, and structure is exactly what a coding agent needs when the task spans more than one file.
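
The traversal described above can be sketched in a few lines. This is an illustration under assumptions, not how any particular product stores its graph: the dictionaries, identifiers such as `STORY-1` and `ADR-7`, and the `boundary` field are all hypothetical.

```python
# Hypothetical linked artifacts; identifiers and field names are illustrative.
stories = {
    "STORY-1": {"epic": "EPIC-API", "boundary": "gateway"},
}
epics = {
    "EPIC-API": {"decisions": ["ADR-7", "ADR-9"]},
}
decisions = {
    "ADR-7": {"title": "Rate limits live at the gateway", "boundary": "gateway"},
    "ADR-9": {"title": "Use UTC timestamps in storage", "boundary": "storage"},
}


def decisions_constraining(story_id):
    """Follow story -> epic -> decisions, keeping those on the story's boundary."""
    story = stories[story_id]
    epic = epics[story["epic"]]
    return [
        decisions[adr_id]["title"]
        for adr_id in epic["decisions"]
        if decisions[adr_id]["boundary"] == story["boundary"]
    ]


constraining = decisions_constraining("STORY-1")
print(constraining)
```

No text similarity is involved. The answer comes from following links, so a decision whose wording shares nothing with the story is still found, and an unrelated decision with similar wording is not.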

A concrete before and after

Take a normal request: add rate limiting to an API endpoint.

In the paste workflow, you copy the ticket, mention the endpoint, and let the agent go. It writes a reasonable rate limiter. It does not know you already have a rate-limiting utility in the codebase because that was not in the paste, so now you have two. It does not know the architecture decision that says limits live at the gateway, not the handler, because that architecture decision record (ADR) is in a separate tool nobody linked. It writes a test, but not one that matches the acceptance criterion about per-tenant limits, because the acceptance criterion was three tabs away. The code looks fine in review and is wrong in three quiet ways.

In the queryable workflow, the agent reads the story, sees the per-tenant acceptance criterion, queries the architecture decisions for the area and finds the gateway rule, checks existing tests and finds the utility, and writes against all of it. The pull request that comes back is not just plausible; it is consistent with how your system already works. You review intent, not archaeology.

The model was identical in both runs. The inputs were not.

A quick way to tell if context is your problem

Look at your last five agent failures and sort them. If the agent produced code that was wrong about how your system works, that is a context problem, and plumbing fixes it. If it produced code that was technically fine but solved the wrong thing, that is a clarity problem, and better acceptance criteria fix it. If it produced code that was just low quality on a simple task, that is the one case where a better model actually helps. In my experience the first bucket is the largest by a wide margin, and it is the cheapest to fix.
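
If you want to run that sort mechanically, a small sketch like the following would do. The record fields `wrong_about_system` and `solved_wrong_thing` are hypothetical labels you would fill in by hand, and the five sample records are invented purely to exercise the function. They are not data from any real team.

```python
from collections import Counter


def classify(failure):
    """Map one hand-labeled failure record to one of the three buckets."""
    if failure["wrong_about_system"]:
        return "context"  # plumbing fixes it
    if failure["solved_wrong_thing"]:
        return "clarity"  # better acceptance criteria fix it
    return "model"  # low quality on a simple task


# Invented sample records, standing in for your own last five failures.
failures = [
    {"wrong_about_system": True, "solved_wrong_thing": False},
    {"wrong_about_system": True, "solved_wrong_thing": False},
    {"wrong_about_system": False, "solved_wrong_thing": True},
    {"wrong_about_system": True, "solved_wrong_thing": False},
    {"wrong_about_system": False, "solved_wrong_thing": False},
]

buckets = Counter(classify(f) for f in failures)
print(buckets.most_common())
```

The ordering of the checks is a design choice: a failure that is wrong about the system is counted as a context problem even if the task was also unclear, since that is the bucket plumbing can fix.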

Where this does not help, and where simpler is right

If your tasks are genuinely small and self-contained (scripts, one-file changes, throwaway prototypes), none of this matters. Paste the context and move on. Wiring up a source of truth for work that fits in one screen is overkill.

If your context problem is actually a clarity problem, no amount of plumbing fixes it. Half of "the agent did the wrong thing" is really "nobody ever defined what done meant in checkable terms." If your acceptance criteria are vague prose, the agent will build vague prose.

And if you live entirely inside one tool that your agent already integrates with deeply, you may have enough of this already. The gap shows up when the truth the agent needs is spread across your tracker, your docs, your diagrams, and your test tool, none of which talk to each other.

The shift in how I think about agents now

I used to treat the agent as the thing to improve. Better prompts, better model, better tooling around the prompt. I now treat the agent as fixed and the context as the variable. Given a capable model, the quality of the output is mostly a function of what the agent can see at the moment it acts. Improve what it can see and the same model gets noticeably better, on the same task, on the same day.

That reframing is freeing, because context is something you control. You cannot make the model smarter this afternoon. You can absolutely give it your backlog this afternoon.

This is the thesis I ended up building Stride around: one connected graph of stories, tests, and architecture decisions, exposed to your coding agents over MCP so they read the real thing instead of a paste. But the idea stands on its own no matter what you use. Give your agent your backlog, not a photograph of it.

What are the rest of you doing to keep agents grounded once the task is bigger than a single file? I am collecting approaches and would genuinely like to hear them.