惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

爱范儿
爱范儿
Know Your Adversary
Know Your Adversary
Google DeepMind News
Google DeepMind News
A
Arctic Wolf
P
Privacy & Cybersecurity Law Blog
云风的 BLOG
云风的 BLOG
Stack Overflow Blog
Stack Overflow Blog
V
Visual Studio Blog
Project Zero
Project Zero
L
LangChain Blog
N
News and Events Feed by Topic
博客园 - Franky
Last Week in AI
Last Week in AI
cs.AI updates on arXiv.org
cs.AI updates on arXiv.org
T
The Blog of Author Tim Ferriss
宝玉的分享
宝玉的分享
Scott Helme
Scott Helme
T
The Exploit Database - CXSecurity.com
P
Proofpoint News Feed
Blog — PlanetScale
Blog — PlanetScale
www.infosecurity-magazine.com
www.infosecurity-magazine.com
W
WeLiveSecurity
月光博客
月光博客
博客园_首页
美团技术团队
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
腾讯CDC
Latest news
Latest news
WordPress大学
WordPress大学
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
Spread Privacy
Spread Privacy
Attack and Defense Labs
Attack and Defense Labs
量子位
L
LINUX DO - 热门话题
C
CERT Recently Published Vulnerability Notes
Webroot Blog
Webroot Blog
L
Lohrmann on Cybersecurity
aimingoo的专栏
aimingoo的专栏
T
Troy Hunt's Blog
Security Latest
Security Latest
小众软件
小众软件
Cloudbric
Cloudbric
Hacker News: Ask HN
Hacker News: Ask HN
S
Secure Thoughts
雷峰网
雷峰网
T
Threat Research - Cisco Blogs
H
Hacker News: Front Page
IT之家
IT之家
Simon Willison's Weblog
Simon Willison's Weblog

DEV Community

Authentication Security Deep Dive: From Brute Force to Salted Hashing (With Java Examples) Why AI Systems Don’t Fail — They Drift Spilling beans for how i learn for exam😁"Reinforcement Learning Cheat Sheet" I Replaced Chrome with Safari for AI Browser Automation. Here's What Broke (and What Finally Worked) How Python Borrows Other People's Work The $40 Architecture: Processing 1 Billion API Requests with 99.99% Uptime Vibe Coding: A Workflow Guide (From Zero to SaaS) Most webhook security guides protect the wrong side. The scary part is delivery. Headless CMS for TanStack Start: Build a Blog with Cosmic EU Age Verification App "Hacked in 2 Minutes" — What Actually Happened Comfy Cloud’s delete function does not actually remove files Running AI Models on GPU Cloud Servers: A Beginner Guide Event-driven media intelligence with AWS Step Functions and Bedrock I scored 500 AI prompts across 8 quality dimensions — here's what broke How to Call Google Gemini API from Next.js (Free Tier, No Backend Needed) The Portal Protocol: Reclaiming Human Connection in the Age of AI How to Fix Your Team's Scattered Knowledge Problem With a Self-Hosted Forum Intro to tc Cloud Functors: A Graph-First Mental Model for the Modern Cloud Designing Multi-Tenant Backends With Both Ownership and Team Access I Built a Neumorphic CSS Library with 77+ Components — Here's What I Learned PostgreSQL Performance Optimization: Why Connection Pooling Is Critical at Scale Cómo construí un SaaS multi-rubro para gestionar expensas en Argentina con FastAPI + Vue 3 🚀 I Built an Ethical Hacking Scanner Tool – Open Source Project I Replaced /usage and /context in Claude Code With a Single Statusline A Pythonic Way to Handle Emails (IMAP/SMTP) with Auto-Discovery and AI-Ready Design I Collected 8.9 Million Polymarket Price Points — Here's What I Found About How Markets Really Move EcoTrack AI — Carbon Footprint Tracker & Dashboard Everyone's Using AI. No One Agrees How. 5 self-hosted ebook managers worth trying in 2026 Building Your First AI Agent with LangChain: From Chatbot to Autonomous Assistant Common SOC 2 Failures (Real World) Stop Vibe-Checking Your AI App: A Practical Guide to Evals How to Use SonarQube and SonarScanner Locally to Level Up Your Code Quality Your Next To-Do App Is Dead — I Replaced Mine with an OpenClaw AI Sign a Nostr event in 60 lines of Python using coincurve — no nostr-sdk, no nbxplorer, no rust toolchain ITGC Audit Explained Like You’re in Big 4 Patch Tuesday abril 2026: Microsoft parcha 163 vulnerabilidades y un zero-day en SharePoint Stop scraping everything: a better way to track competitor price changes Listing on MCPize + the Official MCP Registry while routing payments OUTSIDE the marketplace — how I kept 100% of my x402 revenue Building an AI-Powered Risk Intelligence System Using Serverless Architecture Why We Ripped Function Overloading Out of Our AI Toolchain Testing AI-Generated Code: How to Actually Know If It Works SaaS Churn Is Killing Your Business. Here Is What to Do About It (Without a Support Team) The Speed of AI Is No Longer Linear - And Self-Improving Models Are Why How to Implement RBAC for MCP Tools: A Practical Guide for Engineering Teams From Standard Quote to Persuasive Proposal: AI Automation for Arborists I built a CLI that scaffolds complete multi-tenant SaaS apps Axios CVE-2025–62718: The Silent SSRF Bug That Could Be Hiding in Your Node.js App Right Now The dashboard that ended our friendship Data Pipelines Explained Simply (and How to Build Them with Python) The Hidden Cost of AI Systems Nobody Talks About. undefined vs undeclared, and how typeof behaves Switching from file-based jobs to NATS/Kafka in Rust without changing code io_uring Adventures: Rust Servers That Love Syscalls Why Agentic AI is Killing the Traditional Database The POUR principles of web accessibility for developers and designers Quantum Neural Network 3D — A Deep Dive into Interactive WebGL Visualization How To Install Caveman In Codex On macOS And Windows Automation Pipeline Reliability: Why Your Workflow Breaks When Nobody Is Watching I Built an 'Open World' AI Coding Agent — It Works From ANY Folder From Freelancing to Product: A Tech Service Company's SaaS Transformation China's AI Giants: Adding Tencent Hunyuan & ByteDance Doubao to AI University (74 Providers) On the Vibe Coders and Their Lies clerk: Auto-Summarize Your Claude Code Sessions AI Weekly — 2026/04/10–04/17 | The Model Lockdown Is Here, but the Toolchain Is the Real Battleground AI 週報 — 2026/04/10–2026/04/17 模型封鎖潮來了,但工具鏈才是真戰場 Maybe this is how Open-Source apps are born... 🚀 Fine-Tune LLMs with LoRA and QLoRA: 2026 Guide tRPC v11 + Next.js App Router: End-to-End Type Safety Without the Boilerplate ShadCN UI in 2026: Why I Stopped Installing Component Libraries and Started Owning My Components SaaS Billing in React Server Components: Stripe + Supabase Without a Single `useEffect` Join our DEV Weekend Challenge — $1,000 in Prizes Across TEN winners! Submissions Due April 20 at 6:59 AM UTC. Implementing FSRS Spaced Repetition in Flutter + Supabase — Adding Memory Science to an AI Learning App "I Texted My Localhost From the Train — Claude Code Fixed the Bug Before I Got Home" I Built a Sales Prep AI and It Went Deeper Than Expected Design to Code #2: One JSON, Eleven Outputs Solving the 100M-Row Problem: A Summary Table Pattern for High-Volume Push Notification Logs Flutter Web With Wasm: What Actually Changes For Developers I Built 50 Royalty-Free Soundtracks for My Side Project in a Weekend Using AI Music Generation The Vibe Coding Security Checklist: 7 Things to Check Before You Ship Stop Letting Googlebot Guess Fix Your React App's SEO Right Desconstruindo o Streaming do LinkedIn: Como Criar um Engine de Extração de Vídeo de Alta Performance com HLS e FFmpeg (EDA Part-1) EDA (Exploratory Data Analysis) Explained With Real Life — Why Looking at Your Data Is the Most Important Step in Machine Learning Brand Relationship Management at Scale: Our 4-Touch Outreach System for 200+ Brands Why String.fromEnvironment() Might Return an Empty String in Dart JGuardrails 1.0.0 — Hardening Java LLM Apps Against Jailbreaks, Toxicity, and Prompt Injection Plan and Schedule a Full Week of Threads Content From One Claude Conversation Coding Cat Oran Ep3, Five Tables Changed Everything Updated: BFF Pattern I'm done watching freelancers get buried by 200 proposals. So I'm building the alternative. This is my first post BFS Algorithm in Java Step by Step Tutorial with Examples Tracking LLM Pricing Monthly: An Open Dataset for 22 AI Models How We Measure Content ROI on a Comparison Site: Revenue Attribution Without Perfect Data Introducing Nova AI Ops: The AI-Native Operating System for SRE Teams I built a free desktop video downloader for Windows — Grabbit How Talkie OCR Helps Vision-Impaired & Dyslexic Users Read the World Around Them VRCFaceTracking安装和iPhone面捕配置教程,有bug Even CrowdStrike Can't See Your Agents The Automation Gold Rush: What n8n Workflows and Claude Are Opening Up for Developers Right Now
Your AI coding agent is a while loop with delusions of grandeur
Fernando Rod · 2026-04-30 · via DEV Community

The first time I used Claude Code to refactor an entire module, I had an almost mystical experience. I described what I wanted, went to grab a coffee, and when I came back there was a pull request with 14 changed files, updated tests, and a decent commit message. "This is magic," I thought.

It's not magic. It's a while loop.

Michael Bolin from OpenAI recently published an article breaking down the internal workings of Codex CLI. And it turns out the secret behind AI coding agents isn't some revolutionary algorithm or mysterious neural network. It's a loop that calls an LLM, executes tools, and repeats until there's nothing left to do.

Let's tear it open.

The state machine: 5 phases and a loop

Every coding agent — Codex, Claude Code, Cursor, whatever — executes the same fundamental pattern. Michael Bolin describes it as a 5-phase loop:

flowchart TD
    A["1. Prompt Assembly\n(build the prompt)"] --> B["2. Inference\n(send to LLM)"]
    B --> C{Tool call?}
    C -->|Yes| D["3. Tool Invocation\n(execute tool)"]
    D --> E["4. Tool Response\n(return result to LLM)"]
    E --> B
    C -->|No| F["5. Assistant Message\n(final response)"]
    F -->|New input| A

    style A fill:#2d3748,stroke:#4a9eed,color:#fff
    style B fill:#2d3748,stroke:#4a9eed,color:#fff
    style C fill:#4a3728,stroke:#ed9a4a,color:#fff
    style D fill:#2d3748,stroke:#4a9eed,color:#fff
    style E fill:#2d3748,stroke:#4a9eed,color:#fff
    style F fill:#283d28,stroke:#4aed5c,color:#fff

Enter fullscreen mode Exit fullscreen mode

In plain terms:

  1. Prompt Assembly: a massive prompt is built containing everything the agent needs to know — your message, system instructions, available tools, files it has read, and the complete conversation history.
  2. Inference: that prompt is tokenized and sent to the model. The model returns a stream of events: internal reasoning, tool calls, or response text.
  3. Tool Invocation: if the model asks to execute a tool (read a file, run a command, write code), it gets executed. If it fails, the error goes back to the model.
  4. Tool Response Loop: the tool's result returns to the model as additional context. Steps 2-4 repeat until the model stops requesting tools.
  5. Assistant Message: when the model decides it's done, it emits a final message and the cycle closes.

That's it. No knowledge graphs, no symbolic planners, no sophisticated architectures. It's a while loop with an LLM inside.

The difference between a good agent and a bad one isn't in the loop architecture — which is identical — but in the details of each phase.

Phase 1: The art of prompt assembly

The first phase is where everything gets cooked. Before the LLM sees a single line of your code, the agent has to build a prompt that includes:

flowchart LR
    subgraph Prompt["Prompt Assembly"]
        direction TB
        SP["System Prompt\n(personality, rules)"]
        Tools["Available tools\n(Read, Write, Bash, MCP...)"]
        Ctx["Files / images\nread previously"]
        Inst["CLAUDE.md / AGENTS.md\n(repo instructions)"]
        Env["Environment info\n(OS, shell, git status)"]
        Hist["Conversation\nhistory"]
        User["User message"]
    end

    SP --> Final["Complete\nprompt"]
    Tools --> Final
    Ctx --> Final
    Inst --> Final
    Env --> Final
    Hist --> Final
    User --> Final

    style Final fill:#283d28,stroke:#4aed5c,color:#fff

Enter fullscreen mode Exit fullscreen mode

Already you can see a critical design decision: order matters. The prompt is built from most stable to least stable. The system prompt goes first (never changes), then tools (rarely change), then files and history (grow with each interaction), and finally your latest message.

Why this order? Prompt caching. Since caching works by exact prefix matching, putting stable content first maximizes the number of tokens read from cache on each iteration. Changing something early invalidates everything that follows. I covered this in detail in my article about prompt caching, but the key idea is: your prompt order isn't cosmetic, it's economic.

Then there are the CLAUDE.md and AGENTS.md files. Both are like leaving a note for the plumber before you leave the house: "the shutoff valve is under the sink, don't touch the blue pipe." The agent reads them on startup and injects them into every prompt. They're your mechanism for providing context without having to repeat yourself every time.

The quadratic problem: why context grows like a snowball

Here comes the reality check. Each loop iteration sends the entire complete conversation to the model. There's no server-side state. Each request is independent, stateless.

Why? Because this way the provider can guarantee Zero Data Retention — your data doesn't persist on their servers between requests. It's a privacy decision, not an efficiency one.

But it has a brutal cost:

flowchart LR
    subgraph Msg1["Iteration 1"]
        S1["System\n10K tok"] --> U1["User\n500 tok"]
    end

    subgraph Msg5["Iteration 5"]
        S5["System\n10K tok"] --> H5["History\n40K tok"] --> U5["User\n500 tok"]
    end

    subgraph Msg20["Iteration 20"]
        S20["System\n10K tok"] --> H20["History\n180K tok"] --> U20["User\n500 tok"]
    end

    style Msg1 fill:#1a2332,stroke:#4a9eed,color:#fff
    style Msg5 fill:#2a2332,stroke:#9a4eed,color:#fff
    style Msg20 fill:#3a1a1a,stroke:#ed4a4a,color:#fff

Enter fullscreen mode Exit fullscreen mode

On iteration 1 you send 10K tokens. On iteration 5, you send 50K. On iteration 20, you send 190K. Each message resends the entire previous history. And since the transformer's self-attention mechanism has quadratic cost relative to the number of tokens, it's not just the amount of data sent that grows — the computational cost of processing it grows too.

Put differently: iteration 20 doesn't cost 20 times more than the first. It costs much more.

Compaction: compression without losing what matters

Both Codex and Claude Code have a solution for runaway context growth: compaction (or automatic compression).

When the history approaches the context window limit, the agent does something clever: it sends the entire history to a special endpoint that generates a compressed representation. Instead of 180K tokens of conversation, you get maybe 20K that capture the decisions made, files modified, and current task state.

flowchart TD
    Full["Complete history\n180K tokens"] --> Check{Near limit?}
    Check -->|No| Continue["Continue normally"]
    Check -->|Yes| Compact["Compaction endpoint"]
    Compact --> Summary["Compressed summary\n~20K tokens"]
    Summary --> NewCtx["New context\n= System + Summary + Last message"]
    NewCtx --> Continue2["Continue with fresh context"]

    style Full fill:#3a1a1a,stroke:#ed4a4a,color:#fff
    style Summary fill:#283d28,stroke:#4aed5c,color:#fff
    style Compact fill:#2d3748,stroke:#4a9eed,color:#fff

Enter fullscreen mode Exit fullscreen mode

Compression isn't free. You lose detail. The model no longer has access to the exact diff you made in step 7, but rather a summary saying "refactored authentication module." For most tasks this is sufficient. For surgical debugging, it can be a problem.

Codex calls it compaction. Claude Code does something equivalent with automatic context compression. The idea is identical: when context gets out of hand, compress the past and move forward with a lighter version.

Sandbox: the golden cage

Both agents execute tools in a sandbox — a restricted environment where network access and filesystem access are limited by default.

This is fundamental. Without a sandbox, an rm -rf / generated by model hallucination would destroy your machine. With a sandbox, the worst case scenario is breaking something within the permitted boundaries.

Claude Code asks for confirmation for each potentially destructive operation (unless you explicitly approve it). Codex CLI operates by default in a similar explicit permissions mode.

The lesson here isn't technical, it's philosophical: an agent that can do anything is an agent you can't trust. Restrictions aren't limitations — they're guarantees.

Codex CLI vs Claude Code: non-identical twins

Now comes the fun part. Both are the same loop inside, but design decisions diverge at interesting points:

flowchart TB
    subgraph Codex["Codex CLI (OpenAI)"]
        direction TB
        CG["Desktop GUI\n(Command Center)"]
        CS["Generic shell\n(bash/terminal)"]
        CA["Automations\n(native scheduling)"]
        CD["Diffs with\ninline comments"]
    end

    subgraph Claude["Claude Code (Anthropic)"]
        direction TB
        CC["CLI-first\n(native terminal)"]
        CT["Dedicated tools\n(Read, Edit, Grep, Glob)"]
        CK["Skills\n(/blog, /improve...)"]
        CF["Conversational\nfeedback"]
    end

    style Codex fill:#1a2332,stroke:#4a9eed,color:#fff
    style Claude fill:#2a1a32,stroke:#9a4eed,color:#fff

Enter fullscreen mode Exit fullscreen mode

Tools: generic vs specialized

Codex gives the model access to a generic shell. If you want to read a file, the model executes cat file.py. If you want to search text, it runs grep -r "pattern" ..

Claude Code does the opposite: it has dedicated tools for each operation. Read for reading files, Edit for editing them (with exact string replacement, not complete rewriting), Grep for searching, Glob for finding files by pattern.

Which is better? Depends how you look at it. The generic shell is more flexible — anything you can do in a terminal, the model can do. But dedicated tools are safer and more efficient. An Edit that only sends the diff of the change is faster and less error-prone than a cat > file.py << 'EOF' that rewrites the entire file.

My experience: dedicated tools win for 90% of cases. The generic shell wins when you need to do something exotic that no tool covers.

GUI vs CLI

Codex bets on a desktop GUI (Command Center) where you see diffs like in a pull request, can add inline comments to changes, and have a graphical view of what the agent is doing.

Claude Code is pure CLI. Your terminal. Your shell. No windows. If you want to review a change, the agent shows it to you in text. If you want to give feedback, you write it as another message in the conversation.

What do I prefer? The CLI, by far. And not out of hacker purism. It's that a CLI integrates with everything: tmux, scripts, cron, CI pipelines, remote control via SSH. A GUI ties you to a specific screen. For interactive sessions the GUI is more visual, yes. But for real work — long tasks, automation, agents running solo — the CLI has no competition.

Scheduling: native vs DIY

Codex has Automations: you can schedule tasks that run automatically (react to a GitHub event, launch an agent every morning, etc.). It's native scheduling within the platform.

Claude Code has none of that. If you want an agent to run every 30 minutes, you set up a cron job or systemd timer. If you want it to react to a webhook, you build the integration yourself.

Here Codex has an objective advantage for teams wanting automation out of the box. But Claude Code's DIY solution has a non-obvious advantage: you control the infrastructure. If Anthropic changes their API, your cron job keeps working because it's your machine. If OpenAI changes Automations, you're stuck.

What really matters

After dissecting the guts of both agents, the conclusion is almost disappointingly simple:

A coding agent is a loop that builds a prompt, calls an LLM, executes tools, and repeats. Period.

The magic isn't in the loop. It's in three things:

  1. Model quality. A while loop with GPT-3 does nothing useful. With Claude Opus or GPT-4o, it refactors entire modules. The loop is the same — the brain inside the loop is what makes the difference.

  2. Context management. The prompt can't grow infinitely. How you order information, when you compress, what you prioritize when compressing — that's where real engineering matters. An agent that loses critical context during compression makes mistakes a human never would.

  3. Tool design. Giving an LLM unrestricted access to bash is like giving car keys to someone who's never driven. Well-designed tools (with validation, constraints and clear error feedback) are the difference between an agent that helps you and one that goes off the rails and deletes node_modules at three in the morning.

Next time your coding agent does something that seems like magic, remember: it's a while True with an LLM inside. Elegant, yes. Powerful, absolutely. But magic? Not quite.


Sources: The main article is "What Actually Happens Inside an AI Coding Agent (We Unrolled It)" by Michael Bolin (OpenAI). The Claude Code comparison comes from direct experience and Anthropic's official documentation. If you're interested in context and caching, read Por qué el 99% de lo que envías a Claude ya lo tiene en caché and El cache de tu LLM te cobra el doble por ahorrarte dinero.

This article was originally written in Spanish and translated with the help of AI.