Coding Agents Don't Understand Your Codebase. Here's What They Actually Do.
Stav Kamil · 2026-04-26 · via DEV Community

Coding agents don't understand your codebase.

Yeah, sounds weird I know.

They edit your files, run your tests, fix bugs across multiple modules. It feels like they get it. But they don't.

I spent a while looking into how these tools actually work under the hood. There's no deep understanding happening.

I mean, there IS reasoning - but not as deep as you may think. They can't look at your repo and form a mental model the way a human engineer does. They don't "know" your architecture. They don't reason about your system. They process text, make a guess, and check what happens.

That's not me being negative. I actually think knowing this is useful. Once you understand how these agents work internally, even at a basic level, you start using them differently.

That's what happened to me, actually. I stopped expecting them to "just figure it out" and started giving them what they actually need. I started to understand why they get stuck, why they repeat themselves, why they sometimes nail it on the first try and sometimes spiral. It made me better at using what they're actually good at.

So what's going on inside? Agents are LLMs using tools within a feedback loop.

That loop (the same one in every coding agent I've looked at) is actually pretty straightforward and easy to understand, and it's enough to explain how these systems behave.

Of course, the engineering around that loop is anything but simple. Error recovery, context management, streaming, and safety layers—those are all sophisticated, and I'm not downplaying them. I'm just trying to show that there's no magic at the center.

A chatbot with a while loop

The building block of every coding agent is the LLM (Large Language Model). An LLM is a text-in, text-out function. You send it a system prompt, a conversation history, and a message. It uses all of that to generate a response. That's it. One API call, one response.

The important thing: this LLM works from what's in its context window. It has general knowledge from training, but it knows nothing about your code until you feed it in. It doesn't remember previous conversations unless they're included in the input. It doesn't have access to anything outside of what you send it.

So a coding agent is what happens when you put that API call in a loop and give the model tools.

Instead of one call and done, the model can say "I need to read a file" or "run this test." The agent executes that, adds the result to the context, and calls the model again. Now the model knows something it didn't know a second ago.

That's really the only difference between an LLM and an agent. One call vs. a loop. The model decides what to do next, the runtime decides what's allowed.

The five-step cycle

Every coding agent I've looked at implements the same five steps, regardless of language or framework:

  1. Prepare context — assemble the system prompt, conversation history, and available tools into a single request.
  2. Call the LLM — send that request to the model and stream the response back.
  3. Parse the response — figure out what the model returned. Plain text? Tool calls? Both?
  4. Execute tool calls — run whatever the model asked for: read a file, run a command, edit code.
  5. Check if we're done — if the model called tools, append the results to the history and go back to step 1. If not, return the response to the user.

Wait, what is tool calling?

When you call an LLM API, you can pass a list of tools, basically function signatures that describe what each tool does and what arguments it takes. The model doesn't execute anything itself. It just returns a structured response saying "I want to call this function with these arguments."

The agent code is responsible for actually running that function, collecting the result, and sending it back to the model in the next call. The model sees the result and decides what to do next: call another tool, or respond to the user.

So when we say "the model reads a file," what really happens is: the model outputs a tool call like read_file({ path: "src/index.ts" }), the agent runs that, and sends the file contents back as a message. The model never touches your file system. It just asks, and the loop delivers.
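To make that round trip concrete, here's a minimal sketch. The `ToolDefinition` and `ToolCall` shapes, the `read_file` schema, and the in-memory `fakeFiles` map are all assumptions for illustration; real provider APIs use similar ideas with different field names.

```typescript
// Hypothetical shapes; real provider APIs differ in field names.
interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, { type: string; description: string }>;
}

interface ToolCall {
  id: string;
  name: string;
  arguments: Record<string, unknown>;
}

// What the agent sends to the model: a description, not an implementation.
const readFileTool: ToolDefinition = {
  name: "read_file",
  description: "Read a file and return its contents as text.",
  parameters: {
    path: { type: "string", description: "Path relative to the repo root" },
  },
};

// Stand-in for the file system so the sketch runs anywhere.
const fakeFiles = new Map<string, string>([
  ["src/index.ts", "export const answer = 42;"],
]);

// The agent-side implementations, keyed by tool name.
const handlers: Record<string, (args: Record<string, unknown>) => string> = {
  read_file: (args) => {
    const path = String(args.path);
    const contents = fakeFiles.get(path);
    if (contents === undefined) {
      throw new Error(`file not found: ${path}`);
    }
    return contents;
  },
};

// The agent, not the model, turns the structured request into a real call
// and packages the output as a message for the next LLM request.
function runToolCall(call: ToolCall) {
  const handler = handlers[call.name];
  if (!handler) {
    throw new Error(`unknown tool: ${call.name}`);
  }
  return { role: "tool", toolCallId: call.id, content: handler(call.arguments) };
}
```

Calling `runToolCall({ id: "call_1", name: "read_file", arguments: { path: "src/index.ts" } })` returns a tool message whose `content` is the file text. The model only ever sees `readFileTool` going out and that message coming back.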

In TypeScript, the core of it looks something like this:

// Assumes `llm`, `systemPrompt`, `availableTools`, and `executeTool` are defined elsewhere.
async function agentLoop(userMessage: string) {
  // Loosely typed so user, assistant, and tool messages can share one array
  const messages: Array<Record<string, unknown>> = [
    { role: "user", content: userMessage },
  ];

  while (true) {
    // 1. Call the LLM with the full conversation
    const response = await llm.chat({
      model: "your-model",
      system: systemPrompt,
      messages,
      tools: availableTools,
    });

    // 2. Add the assistant's response to history
    messages.push({ role: "assistant", content: response });

    // 3. Check for tool calls
    const toolCalls = response.toolCalls;
    if (!toolCalls?.length) {
      return response; // No tools called: the model is done
    }

    // 4. Execute each tool and collect results
    for (const call of toolCalls) {
      const result = await executeTool(call.name, call.arguments);
      messages.push({
        role: "tool",
        toolCallId: call.id,
        content: result,
      });
    }

    // 5. Loop back: the model will see the tool results
    //    and decide what to do next
  }
}

That's roughly 25 lines. Everything else a coding agent does (streaming, permissions, error recovery, context management) is built on top of this skeleton.

The messages array is the state. It grows with every iteration. The LLM only knows what's in its context, and this array is the context. Every tool result, every previous response, gets appended here. That's how the model knows what it tried, what worked, and what to do next.

Where the complexity hides

You've seen the whole loop in 25 lines. But the system built around it is serious engineering, and that's what turns a basic while loop into something that can actually ship code.

Tool execution: parallel or sequential?

When the LLM asks to read three files at once, do you run them in parallel or one at a time?

Reading files in parallel is faster. But what about writing? If the model asks to edit two files that depend on each other, running those writes simultaneously could cause race conditions.

Most production agents land on a hybrid: read-only tools run in parallel, mutations serialize. Some add a per-file queue, so two edits to different files run in parallel, but two edits to the same file wait in line.
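Here's a rough sketch of that hybrid. The `PendingCall` shape and its `readOnly` flag are assumptions; in a real agent that flag would come from each tool's metadata. As a simplification, this version runs every read before any write, which a real scheduler might not want when a read follows an edit to the same file.

```typescript
// Hypothetical call shape; `readOnly` would come from the tool's metadata.
interface PendingCall {
  id: string;
  readOnly: boolean;
  run: () => Promise<string>;
}

// Run read-only calls concurrently, then mutating calls one at a time,
// and return results in the order the model asked for them.
async function runBatch(calls: PendingCall[]): Promise<string[]> {
  const results = new Map<string, string>();

  const reads = calls.filter((c) => c.readOnly);
  const writes = calls.filter((c) => !c.readOnly);

  // Reads cannot conflict with each other, so start them all at once.
  await Promise.all(
    reads.map(async (c) => {
      results.set(c.id, await c.run());
    }),
  );

  // Mutations are serialized so dependent edits never interleave.
  for (const c of writes) {
    results.set(c.id, await c.run());
  }

  return calls.map((c) => results.get(c.id) ?? "");
}
```

A per-file queue would refine the second loop: key the waiting by file path instead of serializing every write globally.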

Errors are results, not exceptions

One design decision matters more than most here: when a tool fails, you don't throw. You return the error as a tool result.

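async function executeTool(name: string, args: unknown) {
  try {
    return await tools[name].execute(args);
  } catch (err) {
    // Don't throw: return the error as a result the model can read.
    // This also covers an unknown tool name, since `tools[name]` is then undefined.
    const message = err instanceof Error ? err.message : String(err);
    return `Error: ${message}`;
  }
}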

Why? Because the LLM can read errors. If you return "Error: file not found: src/utils.ts", the model will often correct itself. It'll check the right path, try a different approach, or ask the user for help. If you throw, the turn crashes and the agent loses all context about what it was doing.

That's where the self-correcting behavior comes from. The model gets a chance to recover on every failure.

Streaming: showing work in real time

Nobody wants to stare at a blank screen for 30 seconds. Streaming fixes that.

The LLM response arrives token by token, usually as Server-Sent Events. The agent pipes these to the UI as they come in, so you see the model "thinking" in real time. Some agents stream tool execution too: you see "Reading src/index.ts..." the moment the tool starts, not after it finishes.

The most aggressive implementations start executing tools during the LLM stream: as soon as a tool call block is complete, the tool starts running, before the full response has even finished. That shaves seconds off each loop iteration.
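A minimal sketch of that early-execution idea, assuming the provider's stream has already been parsed into the hypothetical `StreamEvent` objects below (real SSE payloads are provider-specific, and `consumeStream` is an illustrative name):

```typescript
// Hypothetical stream events; real SSE payloads are provider-specific.
type StreamEvent =
  | { type: "text"; delta: string }
  | { type: "tool_call"; id: string; name: string; args: Record<string, unknown> };

type ToolRunner = (name: string, args: Record<string, unknown>) => Promise<string>;

// Forward text to the UI as it arrives and start each tool the moment its
// call block is complete, without waiting for the rest of the response.
async function consumeStream(
  stream: AsyncIterable<StreamEvent>,
  onText: (delta: string) => void,
  runTool: ToolRunner,
): Promise<{ text: string; toolResults: { id: string; content: string }[] }> {
  let text = "";
  const inFlight: Promise<{ id: string; content: string }>[] = [];

  for await (const event of stream) {
    if (event.type === "text") {
      text += event.delta;
      onText(event.delta);
    } else {
      // Not awaited here: the tool runs while later tokens keep streaming.
      const { id, name, args } = event;
      inFlight.push(
        runTool(name, args).then(
          (content) => ({ id, content }),
          (err) => ({
            id,
            content: `Error: ${err instanceof Error ? err.message : String(err)}`,
          }),
        ),
      );
    }
  }

  // The response has finished; now wait for any tools still running.
  return { text, toolResults: await Promise.all(inFlight) };
}
```

Failures are converted to `Error: ...` strings rather than rejected promises, so one broken tool doesn't take down the whole turn. An agent doing this for real would still want its permission checks to run before any mutating tool starts.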

Context compaction: when the conversation gets too long

Here's a problem that's unique to loops: each iteration adds to the message history. After enough turns of reading files, running tests, editing code, the conversation can blow past the model's context window.

But it's not just about hitting a size limit. As the context grows, the model's performance degrades. Older messages get "buried" under newer ones, the model starts losing track of what it already did, and its reasoning gets worse. This is sometimes called context rot — the context is technically there, but the model can't make good use of it anymore.

This is a whole concept in itself (context engineering) and bigger than this post. But the basic mechanism most agents use is compaction: when the history gets too long, summarize the older messages and replace them with the summary. The tricky part is never splitting a tool call from its result. The model needs to see these as pairs, or it gets confused about what happened.

Turn 1:  user message         ─┐
Turn 2:  read_file → result    │  Summarized → "Previously: read config,
Turn 3:  edit_file → result    │               edited database settings,
Turn 4:  run_tests → result   ─┘               tests passed"
Turn 5:  user follow-up       ← kept as-is
Turn 6:  read_file → result   ← kept as-is

Some agents compact proactively (before hitting the limit), others reactively (after getting a "context too long" error from the API). The best do both.
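Here's one way the pairing rule could look in code. The `Message` shape, the `keepRecent` parameter, and the injected `summarize` callback (which would normally be another LLM call) are assumptions for this sketch, as is the choice to store the summary as a user message.

```typescript
// Hypothetical message shape, similar to the one the loop pushes into its history.
interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
  toolCalls?: { id: string }[]; // set on assistant messages that request tools
  toolCallId?: string; // set on tool-result messages
}

// Replace everything except the last `keepRecent` messages with one summary,
// moving the cut point earlier if it would separate a tool call from its result.
async function compact(
  messages: Message[],
  keepRecent: number,
  summarize: (older: Message[]) => Promise<string>,
): Promise<Message[]> {
  let cut = Math.max(0, messages.length - keepRecent);

  // A tool result at the cut means its assistant call sits on the other side.
  while (cut > 0 && messages[cut]?.role === "tool") {
    cut--;
  }
  if (cut === 0) {
    return messages; // nothing old enough to summarize
  }

  const summary = await summarize(messages.slice(0, cut));
  return [
    { role: "user", content: `Summary of earlier work: ${summary}` },
    ...messages.slice(cut),
  ];
}
```

A proactive agent would call something like this when its token estimate crosses a threshold; a reactive one would call it after the API rejects a request as too long, then retry.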

Doom loop detection

Sometimes the model gets stuck. It calls the same tool with the same arguments, gets the same error, and tries again. And again.

Agents detect this by hashing recent tool call signatures and checking for repetition. If the same pattern shows up three or more times, the agent injects a warning: "You appear to be repeating the same action. Try a different approach."

If the model still can't break out, circuit breakers kick in: hard limits on turns per request, errors per tool, or total API calls.
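A small sketch of both mechanisms, under a few assumptions: `createDoomLoopDetector`, the window size of 10, and `MAX_TURNS = 50` are made-up names and values, and the signature is a plain JSON string standing in for a hash.

```typescript
// Hypothetical detector: remembers the last `windowSize` tool-call signatures.
function createDoomLoopDetector(windowSize = 10, threshold = 3) {
  const recent: string[] = [];

  // Returns a warning to inject into the conversation, or null if all is well.
  return function record(name: string, args: unknown, result: string): string | null {
    // A real agent would likely hash this string; comparing the raw
    // string is enough for a sketch.
    const signature = JSON.stringify([name, args, result]);
    recent.push(signature);
    if (recent.length > windowSize) {
      recent.shift();
    }

    const repeats = recent.filter((s) => s === signature).length;
    return repeats >= threshold
      ? "You appear to be repeating the same action. Try a different approach."
      : null;
  };
}

// Circuit breaker: a hard cap that applies even if the warnings are ignored.
const MAX_TURNS = 50; // assumed value

function shouldAbort(turn: number): boolean {
  return turn >= MAX_TURNS;
}
```

Including the result in the signature is a design choice: re-running the same test command after an edit is normal, while getting the identical error back three times is the pattern worth flagging.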

Permissions: the human in the loop

Reading a file is safe. Running rm -rf / is not. So agents classify tools by risk:

  • Allow — execute immediately (read files, search code)
  • Ask — pause and get user approval (shell commands, file writes)
  • Deny — never execute (destructive operations)

When a tool requires approval, the loop pauses. It sends a permission request to the UI, waits for the user's response, then proceeds. The loop doesn't break. It just waits. This is another large concept on its own, but I've tried to simplify it here.
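As a rough illustration, the three tiers can be a lookup table in front of the tool executor. The tool names in `policy`, the `guardedExecute` wrapper, and the `requestApproval` callback (standing in for the UI round trip) are all hypothetical.

```typescript
type Decision = "allow" | "ask" | "deny";

// Hypothetical policy table; the tool names are illustrative only.
const policy: Record<string, Decision> = {
  read_file: "allow",
  search_code: "allow",
  edit_file: "ask",
  run_shell: "ask",
  delete_repo: "deny",
};

// `requestApproval` stands in for the UI round trip; the loop simply awaits it.
async function guardedExecute(
  name: string,
  args: Record<string, unknown>,
  execute: (name: string, args: Record<string, unknown>) => Promise<string>,
  requestApproval: (name: string, args: Record<string, unknown>) => Promise<boolean>,
): Promise<string> {
  // Unknown tools default to "ask" so nothing runs unreviewed by accident.
  const decision: Decision = policy[name] ?? "ask";

  if (decision === "deny") {
    return `Error: tool "${name}" is not permitted.`;
  }
  if (decision === "ask" && !(await requestApproval(name, args))) {
    return `Error: the user declined to run "${name}".`;
  }
  return execute(name, args);
}
```

Denials come back as ordinary tool results, following the same errors-are-results idea from earlier, so the model can read the refusal and pick another route instead of the turn crashing.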

What I'm taking from this

The whole reason I went down this rabbit hole was to become more productive with these tools. Understanding the internals, even at this level, changed how I use them day to day.

I write better prompts now. Knowing the model only sees what's in its context window changed how I talk to agents. I front-load the important stuff: file paths, expected behavior, constraints. I stopped assuming the agent "knows" things about my project that I didn't explicitly tell it.

I structure my repos to be agent-friendly too. Clear file names, good READMEs, co-located tests. If the agent is going to read my codebase one file at a time through tool calls, I want each file to make sense on its own. The easier it is for a human to navigate, the easier it is for an agent.

I understand why agents fail now. When one spirals or repeats itself, I don't just retry and hope. I know it's probably a context problem: either the history got too long and the model lost track, or the information it needs was never there to begin with. That makes debugging way faster.

And I don't trust them blindly. They're not reasoning about my system the way an engineer would. They're generating the most likely next token given what they've seen. That's powerful, but it means they can be confidently wrong. I review everything, especially code I don't fully understand yet.

Wrapping up

Coding agents aren't magic. They're a while loop, an LLM call, and a set of tools. The impressive part is that this simple pattern, with good engineering around it, actually works.

Understanding that changed how I work with them. I give better context, I know when to trust them and when not to, and when they fail I have a mental model for why.