Why your AI agent loops forever (and how to break the cycle)
Alan West · 2026-05-23 · via DEV Community

The 3 AM tool-call loop from hell

Last month I deployed a ReAct-style agent to handle customer support triage. By 3 AM I had an alert: one user session had burned through 47,000 tokens in a single conversation. The agent had called the same search_knowledge_base tool 73 times in a row, with slightly different queries each time, and never decided to stop.

If you've built any kind of tool-using agent, you've probably seen this pattern. The agent gets stuck in a loop, either repeating the same action or oscillating between two actions. Tokens evaporate. Costs spike. Users wait forever for a response that never comes.

This isn't a model problem. It's an architectural problem. And once you understand what's actually happening inside the loop, the fix is straightforward.

What's actually happening inside the loop

A typical agent loop looks roughly like this:

def naive_agent_loop(user_query):
    messages = [{"role": "user", "content": user_query}]

    while True:
        response = llm.chat(messages, tools=AVAILABLE_TOOLS)

        # model decided to finalize
        if response.finish_reason == "stop":
            return response.content

        # otherwise, execute the tool call and feed the result back
        tool_call = response.tool_calls[0]
        result = execute_tool(tool_call.name, tool_call.args)

        messages.append(response.message)
        messages.append({"role": "tool", "content": str(result)})


The model generates an action, you execute it, you append the result to the context, and you ask the model what to do next. Repeat until the model says "I'm done."

The failure mode lives in that "until done" condition. Three things commonly go wrong:

  • The model has no concept of "I've already tried this." Each iteration looks at the conversation history, but if the history shows ten failed searches, the model often interprets that as "I should search harder" rather than "this approach isn't working."
  • Tool errors are silent or ambiguous. When a search returns an empty list, is that "no results found" or "the tool is broken"? The model can't tell, so it tries again with a different phrasing.
  • The stop condition is implicit. Many implementations only stop when the model produces a final-answer message. There's nothing forcing the model to ever produce one.

The fix: explicit state, hard limits, structured feedback

Here's a stripped-down version of the pattern I use now. It runs in plain Python against any chat-completion API:

import hashlib
import json

MAX_STEPS = 10

def hash_action(name: str, args: dict) -> str:
    # canonical JSON so {a:1, b:2} and {b:2, a:1} hash the same
    payload = json.dumps({"name": name, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def safer_agent_loop(user_query):
    messages = [{"role": "user", "content": user_query}]
    seen_actions = set()

    for step in range(MAX_STEPS):
        response = llm.chat(messages, tools=AVAILABLE_TOOLS)

        if response.finish_reason == "stop":
            return response.content

        tool_call = response.tool_calls[0]
        action_id = hash_action(tool_call.name, tool_call.args)

        if action_id in seen_actions:
            # tell the model it's repeating instead of running the tool again
            observation = {
                "status": "duplicate",
                "message": "You already called this tool with the same args. Try something different or finalize your answer."
            }
        else:
            seen_actions.add(action_id)
            observation = execute_tool(tool_call.name, tool_call.args)

        messages.append(response.message)
        messages.append({"role": "tool", "content": json.dumps(observation)})

    # hard stop: don't raise, return whatever partial answer we can
    return synthesize_partial_answer(messages)


Three things changed:

  • Hard step limit. No matter what the model decides, the loop terminates after MAX_STEPS. Pick a number based on the actual task — for triage I use 8, for research workflows I sometimes go up to 20.
  • Action deduplication. Before executing a tool call, hash the (tool, args) pair and check whether we've already done it. If yes, return a synthetic observation telling the model so.
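  • Structured error envelopes. Tools return a typed result, not raw strings (the loop above serializes whatever execute_tool returns with json.dumps, so a dict with a status field fits directly). The model can see status: "no_results" vs status: "error" vs status: "ok" and make a better decision.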
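
A minimal sketch of what such an envelope could look like, assuming tools are plain Python callables. The names run_tool, fake_search, and broken_tool are hypothetical and not from the original code; only the three status values come from the list above.

```python
from typing import Any, Callable


def run_tool(fn: Callable[..., Any], **args: Any) -> dict:
    """Wrap a tool call so the model always sees an explicit status.

    Hypothetical helper: the envelope shape is an assumption, not a standard.
    """
    try:
        data = fn(**args)
    except Exception as exc:
        # The tool itself failed: report it instead of returning nothing.
        return {"status": "error", "message": f"{type(exc).__name__}: {exc}"}

    is_empty_container = isinstance(data, (list, tuple, dict, set, str)) and len(data) == 0
    if data is None or is_empty_container:
        # The tool worked but found nothing: say so explicitly.
        return {
            "status": "no_results",
            "message": "The call succeeded but returned nothing. "
                       "Try a more general query or finalize your answer.",
        }

    return {"status": "ok", "data": data}


# Stand-in tools for illustration only.
def fake_search(query: str) -> list:
    return ["doc-1"] if "auth" in query else []


def broken_tool() -> list:
    raise TimeoutError("upstream search index did not respond")


ok_result = run_tool(fake_search, query="auth errors")
empty_result = run_tool(fake_search, query="zzz")
error_result = run_tool(broken_tool)
```

With a wrapper like this, an empty list and a crashed tool no longer look identical to the model, which removes one common reason for blind retries.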

Detecting oscillation, not just repetition

Exact-duplicate detection catches the obvious case. But agents are clever enough to find creative ways to loop. The next pattern I had to handle: the agent calling search("authentication errors"), then search("auth errors"), then search("login failures") — semantically the same query, syntactically different.

A simple defense is to track the last N tool calls and check whether the agent is making progress:

from collections import deque

class ProgressTracker:
    def __init__(self, window: int = 4):
        self.window = window
        self.recent_tools = deque(maxlen=window)

    def record(self, tool_name: str) -> None:
        self.recent_tools.append(tool_name)

    def is_stuck(self) -> bool:
        # if the last N calls all hit the same tool, we're probably looping
        if len(self.recent_tools) < self.window:
            return False
        return len(set(self.recent_tools)) == 1


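This isn't perfect. It only looks at tool names, so it can flag a legitimate run of calls to the same tool, and it won't notice an agent alternating between two tools. Semantic similarity via embeddings would be more robust, but this catches roughly 80% of the oscillation cases I've seen in production without the complexity of a separate similarity model.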
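
The tracker above only notices runs of a single tool. As a hedged extension (not part of the original pattern), the same sliding window can also be checked for a strict A, B, A, B alternation. CycleDetector and its window and max_period parameters are hypothetical names chosen for this sketch.

```python
from collections import deque


class CycleDetector:
    """Flags short repeating patterns in the most recent tool names.

    Period 1 means the same tool every time; period 2 means two tools
    alternating. Only tool names are compared, not arguments.
    """

    def __init__(self, window: int = 6, max_period: int = 2):
        self.window = window
        self.max_period = max_period
        self.recent = deque(maxlen=window)

    def record(self, tool_name: str) -> None:
        self.recent.append(tool_name)

    def is_cycling(self) -> bool:
        # Wait until the window is full to avoid flagging short sessions.
        if len(self.recent) < self.window:
            return False
        calls = list(self.recent)
        for period in range(1, self.max_period + 1):
            # The window repeats with this period if every call matches
            # the call `period` steps before it.
            if all(calls[i] == calls[i - period] for i in range(period, len(calls))):
                return True
        return False


detector = CycleDetector(window=6)
for name in ["search", "fetch", "search", "fetch", "search", "fetch"]:
    detector.record(name)
alternating = detector.is_cycling()  # True: strict two-tool alternation
```

Like the original tracker, this is a heuristic: a legitimate search-then-fetch workflow would also trip it, so the window size needs tuning per task.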

Why frameworks don't solve this for you

I've worked with several popular agent frameworks. Most of them give you a max_iterations parameter and call it a day. That's the floor of what you need, not the ceiling.

If you're building anything beyond a demo, you need:

  • Per-tool quotas, not just global step limits
  • Logging that captures the full action/observation trail so you can debug after the fact
  • A mechanism to inject "you've already tried this" context back into the model
  • A graceful exit path when the limit hits — return a partial answer, not an exception
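
As an illustration of the first item, a per-tool quota can be a small counter consulted before each tool call. This is a sketch under assumptions: ToolQuota, its method names, and the quota_exceeded status are hypothetical and do not come from any particular framework.

```python
from collections import Counter


class ToolQuota:
    """Caps how many times each tool may run within one conversation."""

    def __init__(self, limits: dict[str, int], default_limit: int = 5):
        self.limits = limits
        self.default_limit = default_limit
        self.used = Counter()

    def try_consume(self, tool_name: str) -> bool:
        """Return True and count the call if the tool still has budget."""
        limit = self.limits.get(tool_name, self.default_limit)
        if self.used[tool_name] >= limit:
            return False
        self.used[tool_name] += 1
        return True

    def exhausted_observation(self, tool_name: str) -> dict:
        """Synthetic observation to feed back instead of running the tool."""
        return {
            "status": "quota_exceeded",
            "message": f"{tool_name} has reached its call limit for this "
                       "conversation. Use another tool or finalize your answer.",
        }


quota = ToolQuota({"search_knowledge_base": 2}, default_limit=1)
results = [quota.try_consume("search_knowledge_base") for _ in range(3)]
# results is [True, True, False]: the third search is refused
```

Inside the agent loop, a refused call would append exhausted_observation(...) as the tool result, which also covers the "you've already tried this" feedback from the list above.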

There's a community-maintained list of agent learning resources on GitHub called Agent-Learning-Hub that covers a lot of these patterns at a deeper level, including pointers to academic papers on planning and reflection that helped me understand why the naive ReAct loop has these failure modes in the first place.

Prevention tips that have actually saved me

A few habits I've adopted after enough 3 AM alerts:

  • Log every action and observation, with timestamps. When something goes wrong in production, you want the full trace, not just the final state.
  • Set token budgets per conversation, enforced server-side. Don't trust the agent to police itself.
  • Write tools that return semantically useful errors. "No results for query X. Try a more general term." beats [].
  • Test with adversarial prompts. Specifically try inputs designed to confuse the agent and verify it bails out cleanly.
  • Track tool-call entropy. If the entropy of your tool-call distribution drops over the course of a conversation (the agent keeps reaching for the same one or two tools), that's a leading indicator of stuck behavior.
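
For the last tip, one way to put a number on it is the Shannon entropy of the tool names used in a window of calls. The sketch below assumes you already log tool names per conversation; tool_call_entropy is a hypothetical helper, and the alert threshold is left to you.

```python
import math
from collections import Counter


def tool_call_entropy(tool_names: list[str]) -> float:
    """Shannon entropy (in bits) of the tool-name distribution.

    0.0 means every call used the same tool; higher values mean the
    calls are spread across more tools.
    """
    if not tool_names:
        return 0.0
    total = len(tool_names)
    counts = Counter(tool_names)
    # Sum p * log2(1/p) over the observed tools.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Compare an early window of a conversation with a late one.
early = ["search", "fetch", "search", "fetch"]
late = ["search", "search", "search", "search"]

early_entropy = tool_call_entropy(early)  # 1.0 bit: two tools, evenly used
late_entropy = tool_call_entropy(late)    # 0.0 bits: one tool only
entropy_dropped = late_entropy < early_entropy
```

A drop toward zero late in a conversation is a hint worth logging, not proof of a loop; some tasks legitimately need many calls to a single tool.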

Wrapping up

Agent loops failing in production almost always come down to missing state, missing feedback, or missing limits. The model isn't broken — it's doing exactly what the prompt and the architecture told it to do. Fix the architecture and the loops go away.

The hardest part is accepting that "let the model decide when to stop" isn't a strategy. You're the one writing the loop. Own the termination logic.