惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

H
Help Net Security
T
ThreatConnect
SecWiki News
SecWiki News
F
Future of Privacy Forum
AWS News Blog
AWS News Blog
C
Cisco Blogs
A
Arctic Wolf
Vercel News
Vercel News
The GitHub Blog
The GitHub Blog
Scott Helme
Scott Helme
V
V2EX
博客园 - 叶小钗
阮一峰的网络日志
阮一峰的网络日志
K
Kaspersky official blog
G
Google Developers Blog
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
P
Privacy International News Feed
C
Cyber Attacks, Cyber Crime and Cyber Security
N
News | PayPal Newsroom
Schneier on Security
Schneier on Security
NISL@THU
NISL@THU
Microsoft Azure Blog
Microsoft Azure Blog
量子位
The Hacker News
The Hacker News
Stack Overflow Blog
Stack Overflow Blog
Security Latest
Security Latest
M
Microsoft Research Blog - Microsoft Research
Google Online Security Blog
Google Online Security Blog
博客园_首页
C
CXSECURITY Database RSS Feed - CXSecurity.com
I
InfoQ
Google DeepMind News
Google DeepMind News
Y
Y Combinator Blog
The Cloudflare Blog
Microsoft Security Blog
Microsoft Security Blog
Martin Fowler
Martin Fowler
Cisco Talos Blog
Cisco Talos Blog
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
T
Troy Hunt's Blog
F
Fox-IT International blog
S
Security @ Cisco Blogs
博客园 - 司徒正美
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
C
Comments on: Blog
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
L
LINUX DO - 最新话题
GbyAI
GbyAI
Project Zero
Project Zero
腾讯CDC
T
Tailwind CSS Blog

DEV Community

Months of self-testing: Citations shine, other features remain unproven. Claude Code for Canary Deployments: How I Ship to 1% of Users Before Breaking Everything Your recurring scraper is re-downloading data that didn't change. Here's the 15-line fix (conditional GET) Espressif Reveals CoreBoard and Korvo Dev Kits for ESP32-S31 Predicting Blood Glucose Fluctuations: Building a Transformer-based CGM Forecaster with PyTorch & InfluxDB Pre-task hooks: the one-line wire-up that gives your Hono agent shared memory Concurrent writes to a shared agent memory: what we shipped, what we punted on Building a Production Serverless URL Shortener on AWS — 21 Articles, Every Test Run for Real My CKA Cheat Sheet: Commands, Aliases, and Documentation Tricks I Used During the Exam Frontend Engineering Beyond Pixels: The Architecture of Digital Accessibility VLA or IL? A Controlled Dataset for Testing Whether Finetuning Turns Your VLA into a Fancy Imitation Learner Fabric AI Functions Turn GenAI Into a Data Pipeline Step Proximate vs Ultimate: The Bug Is Never Just the Bug The Treasure Hunt Engine That Broke Before the Traffic Did Reset Windows Update: The Definitive MSP Guide to RWU Your Resume Was Never Built for This AI Writes 46% of Code Now: What Snap's Layoffs Mean for Developers in 2026 From Chatbot to Agent — Tool Calling with NVIDIA NIM Fatigue and Fracture Mechanics: Why Parts Break Below Their Yield Strength I built a token-level debugger for comparing two LLMs VCP-Virtual Private Cloud Embedding sing-box in an iOS messenger to bypass Russian DPI (no VPN) Microsoft Copilot just exfiltrated a company's files. The attack was one email. Here's the mechanism. RAG 시스템 실전 구축 (v42) copilot cloud agent is becoming an automation api Cx Dev Log — 2026-04-23 Why Tesla Is Becoming the AI Enterprise Case Study Every Leader Should Understand ORA-00214 오류 원인과 해결 방법 완벽 가이드 SpecAgnt v2.0: The Agent Lifecycle Framework for AI-Native Engineering Optimizing Signal Latency and Weight Allocations in Algorithmic Pipelines SSH Under the Hood: Protocols, Mechanisms, and the Full Technical Story دليل بوابات الدفع للتاجر العربي في 2026 (وكيف تختار المناسبة لمتجرك) Cómo Mi Configuración de Docker Me Salvó de un Ataque de Supply Chain (Y Por Qué la Tuya Debería Hacerlo También) How My Docker Setup Saved Me From a Supply Chain Attack (And Why Yours Should Too) Astro: The epitome of SEO Technical Update I Gave My AI Agent the Ability to Research Before It Writes — Here’s What Changed Kubernetes sem Cloud Provider (Parte 2): Criando Operators em Go para automação e self-service de plataforma AI Memory Needs an Authority Policy, Not Just More Context You've done tutorial after tutorial. Your GitHub is still empty. (Free 1‑page PDF, no signup) TypeScript 7.0: The Go Compiler That Makes TS 10x Faster Connecting Wallets the Right Way: wagmi v2 and EIP-6963 The 5-Layer Architecture Every Production Multi-Agent System Needs (And Why Most Skip Layers 4 and 5) CSS Scroll-Driven Animations: No JavaScript Required Vite 8 + Rolldown: Rust-Powered Builds That Are 10–30x Faster Core Architectural Components of Azure My Skills How I Use AI as a Senior Engineer Construí um motor ATS determinístico porque estava cansado de adivinhar por que meu currículo era rejeitado SCS-Lab1 — CloudTrail: Trail + S3 + KMS + Log Validation LuisCore MCP server — daily syndication · 2026-05-25 Cursor vs JetBrains Rider for C#/.NET in 2026: which to pay for I built a local-first movie recommender with Corrective-RAG (cited explanations, hybrid retrieval, runs entirely on Ollama) Scaling to 1 Million Users : Load Balancing & Caching Strategies How the Events Table That Looked Right Killed Our Queue Three Failures My AI Memory System Caught — And the Flaw It Revealed in Itself dotnet Framework life cycle tool LangGraph 워크플로우 템플릿 (v41) I built a free image compression API — no signup, just curl Designing TikTok from Scratch — A System Design Deep Dive PREDICTION-20260525-0007: boredom-with-asymmetric-leverage [2026-Q3 through 2027-Q3] [Boost] How to integrate the QuickBooks Invoice API in 2026 How I Cut My Anthropic API Bill by 50% With a Local Python Tool Vibe Coding Problems: 7 Visual Bugs AI Code Generators Always Ship Chinese AI Models 2026: The Agentic Revolution, Hardware Independence, and What It Means for Global Developers The Quiet AI War Inside Your Browser The 12-Line Anti-Bot Trick That Saved Our Airdrop Snapshot From Sybil Farms Building a production-ready SaaS dashboard in Next.js 16 — Recharts, TanStack Table, dark mode, and collapsible sidebar Why 2026 Belongs to Agentic AI (And How to Build Your First Local Agent) It Was 2024 When We Tried to Outsmart the Treasure Hunt Engine RAG 시스템 실전 구축 (v40) I Found a Tool That Generates a Complete .NET 8 or Java Spring Boot API From SQL Schema in 30 Seconds I Added a 4th Agent That Audits My Other Agents. It Caught My Strategist Procrastinating for 3 Weeks. Streaming LLM responses to the browser in Go (Server-Sent Events) How We Publish and Manage Educational Admission Updates at Scale on DailyAxom A prompt is not a conversation. It's a component contract. How to Pass the EAA 2025 Accessibility Audit — A Step-by-Step WCAG Checklist Building an Autonomous MCP Lead Generation System with Hermes Agent LangGraph 워크플로우 템플릿 (v40) How I Built 100 Browser-Based Image Tools With No Server (FFmpeg WASM, PDF-lib, AI Background Removal) Nginx CVE-2026-9256, AI Prompt Injection Defenses, and Claude AI Data Leak Demo Scaling RAG for 10M+ Docs, .md Agent Memory, & Claude Code for Motion Graphics Diagram as Code with draw.io DuckDB Delta, PostgreSQL 17 Migration, & SQLite Optimization Deep Dives Windows 11 Microsoft Account Login Recovery During Internet Restrictions The Linux Commands You Forgot Exist (And Why AI Workflows Make Them Relevant Again) Spec-Driven Development Without an IDE: I Generated NestJS, Go, Spring Boot, Laravel, and Rust Apps From a Single PRD File Components are states Edge SEO y Middleware: Cómo Interceptar a Googlebot y LLMs antes de llegar a tu Servidor Context window exceeded at turn 23. Here's how I track token usage without a tokenizer. My Hermes agent spent $3 before I noticed. Now it can't. My Hermes agent's stop condition was a 40-line if/elif chain. I replaced it with 3 lines. My agent kept hitting context limits. This one function fixed it. Create and configure Azure Firewall Your Hermes agent's audit log is leaking customer emails. Here's a 100-line lib that fixes that. My agent kept forgetting what it was doing. A scratchpad fixed it. I replaced 200 lines of ad-hoc state management in my Hermes agent with one object. Per-Key Rate Limiting for Agent Tool Calls: Stop One User From Breaking Everything Composable Output Guardrails: Filter Agent Responses Before They Reach Users
How to Fix Tool-Use Loops in Autonomous Coding Agents
Alan West · 2026-05-26 · via DEV Community

Last month I was helping a friend debug their autonomous coding agent. It had been "working" on a task for 47 minutes, burned through roughly twelve bucks in API costs, and somehow ended up exactly where it started. The logs showed it had called read_file on the same five files 23 times.

If you've built or experimented with AI coding agents, you've probably seen something like this. It's not a fun bug to debug — the agent isn't crashing, it isn't erroring, it just... never finishes.

The Problem: Why Agents Loop Forever

Tool-use loops are the most expensive failure mode in agent design. From the outside, the agent looks busy. It's reading files, calling tools, generating thoughts, producing output. But it's not making progress toward the goal.

The shape is almost always the same:

  • Agent reads file A
  • Agent realizes it needs context from file B
  • Reads file B, gets confused by something unexpected
  • Goes back to file A "to double-check"
  • Reads file B again because file A didn't have what it needed
  • Repeat until your wallet cries

I've now seen this in three different agent setups across two side projects and one client engagement. The symptoms are identical every time.

Root Cause: Stateless Decision-Making

The fundamental issue is that the agent's working state looks nearly identical at step N and step N+5. Same task description in the system prompt, same files implicitly available, same general feel of the conversation. So the model — given essentially the same inputs — makes essentially the same decision.

There are three concrete causes worth separating:

  1. No explicit action history. The agent has called read_file("config.yaml") four times, but each turn the model mostly "sees" the latest tool result, not the pattern of what it's already tried.
  2. No reflection step. Nothing in the loop ever asks "am I actually making progress?"
  3. Errors get summarized away. A tool failure gets compressed into a vague "the previous call had an issue" and the model retries with the same broken inputs.

Let's walk through fixing each one.

Step 1: Track Tool Calls Explicitly

Don't rely on the conversation history to encode what's been tried. Build a structured log the model can actually reason about.

from collections import Counter
from dataclasses import dataclass, field
from typing import Any

@dataclass
class ToolCallLog:
    # Counts repeated (tool_name, args) pairs so we can detect loops
    calls: Counter = field(default_factory=Counter)
    history: list = field(default_factory=list)

    def record(self, name: str, args: dict[str, Any], result: str):
        key = (name, _hash_args(args))  # stable hash of args
        self.calls[key] += 1
        self.history.append({"name": name, "args": args, "result_preview": result[:200]})

    def summary_for_model(self) -> str:
        # Surface repeated calls so the model SEES the loop forming
        repeated = [(k, n) for k, n in self.calls.items() if n > 1]
        if not repeated:
            return "No repeated tool calls so far."
        lines = [f"- {name}{args} called {n}x" for (name, args), n in repeated]
        return "Repeated calls detected:\n" + "\n".join(lines)

Enter fullscreen mode Exit fullscreen mode

Then inject log.summary_for_model() into the system prompt every turn. Suddenly the model can see that it's about to call read_file("config.yaml") for the fifth time, and most modern models will course-correct on their own.

Step 2: Add a Loop Detector

Don't trust the model to always notice. Add a circuit breaker:

MAX_IDENTICAL_CALLS = 3
MAX_TOTAL_STEPS = 40

def should_force_reflection(log: ToolCallLog) -> str | None:
    # Return a reflection prompt if we detect a loop, else None
    for key, count in log.calls.items():
        if count >= MAX_IDENTICAL_CALLS:
            name, args = key
            return (
                f"You've called {name} with the same args {count} times. "
                "This is a loop. Stop and explain in one sentence what you "
                "actually need, then choose a different strategy."
            )
    if len(log.history) >= MAX_TOTAL_STEPS:
        return (
            "You've taken many steps without finishing. Summarize what you "
            "know, what you still need, and propose a single next action."
        )
    return None

Enter fullscreen mode Exit fullscreen mode

When this triggers, inject the returned string as a user message before the next model call. I've found this single change cuts wasted tokens by something like half on the workflows I've tested. Your mileage will vary, but the direction is consistent.

Step 3: Force Reflection on a Schedule

Even without a detected loop, models drift on long tasks. A periodic forced reflection helps. The cadence I've landed on is every 8–10 tool calls:

REFLECTION_INTERVAL = 8

def maybe_reflect(step: int, task: str) -> str | None:
    if step > 0 and step % REFLECTION_INTERVAL == 0:
        return (
            f"Pause. Original task: {task}\n"
            "In 3 short bullets, answer:\n"
            "1. What have I actually accomplished?\n"
            "2. What is still blocking completion?\n"
            "3. Is my current approach working, or should I change it?"
        )
    return None

Enter fullscreen mode Exit fullscreen mode

This is borrowed from human pair programming — "hey, where are we?" every so often is healthy.

Step 4: Make Errors Loud

The last fix is the most boring but probably the most important. When a tool fails, don't soften the error message:

def format_tool_error(name: str, args: dict, exc: Exception) -> str:
    # Be specific about what failed. Generic errors invite retries.
    return (
        f"TOOL ERROR: {name} failed with {type(exc).__name__}: {exc}.\n"
        f"Inputs were: {args}.\n"
        "Do NOT retry with identical arguments. Either fix the inputs "
        "or choose a different tool."
    )

Enter fullscreen mode Exit fullscreen mode

The "Do NOT retry with identical arguments" line sounds silly but actually moves the needle. I tested with and without it on the same task three times — without it, the agent retried failing calls about 60% of the time. With it, closer to 10%. Tiny sample size, but the effect was obvious.

Prevention: Design Choices That Help

A few patterns I now reach for by default when building agents:

  • Cap context per tool. Truncate read_file results to the relevant section instead of dumping whole files. Less noise, more signal.
  • Use scratchpad files. Give the agent a notes.md it can write to. Externalized memory is cheaper than re-deriving state from chat history.
  • Separate planning from execution. A small "planner" call that emits a 5-step plan, followed by an executor that follows it, loops far less than a single agent doing both.
  • Log everything during development. You cannot debug what you cannot see. Persist full tool histories to disk for the first few weeks of any new agent.

None of this is novel — the broader agent research community has been writing about reflection, planning, and memory for a while. But it's easy to skip these when you're hacking together a prototype and assume "the model will figure it out." It won't. Not reliably.

Wrapping Up

Tool-use loops are not a model problem so much as a harness problem. The model is doing exactly what you'd expect given identical inputs every turn. Your job, as the person building the loop around the model, is to make sure the inputs aren't identical — that the agent can see its own history, get nudged when it's stuck, and feel the weight of its errors.

Fix those four things and most of your runaway agent costs go away. The rest is just tuning.