Agent Series (2): ReAct — The Most Important Agent Reasoning Paradigm
WonderLab · 2026-05-23 · via DEV Community

You Think Your Agent Is "Thinking." It's Actually Just Predicting Tokens.

Here's a scenario that happens more often than you'd think.

You ask an Agent to write a competitive analysis report. It confidently outputs three professional-looking pages — complete with data, conclusions, and strategic recommendations.

There's just one problem: every number comes from its training data, which may be a year old. It didn't search. It didn't verify. It just generated text that sounds authoritative.

That's not thinking. That's fluent hallucination.

Chain-of-Thought (CoT) has the same fundamental problem. CoT prompting tells the model to "reason step by step" before answering, and it genuinely does improve accuracy on many tasks. But the model is still reasoning entirely within language space. It can generate a very coherent chain of thought that leads to a completely wrong answer — because its only information source is training data.

ReAct was built to solve this.


ReAct: Reasoning + Acting, Interleaved

In 2022, researchers from Princeton and Google published ReAct: Synergizing Reasoning and Acting in Language Models.

The core idea is elegantly simple: let the model alternate between reasoning and acting, rather than reasoning first then acting, or acting without reasoning.

The concrete form is a three-part loop:

Thought  →  Action  →  Observation
   ↑                         │
   └─────────────────────────┘


  • Thought: What the model is "thinking" — current analysis, what to do next, why
  • Action: The actual tool call and parameters
  • Observation: The real result returned by the tool

The critical mechanism: Observation is fed back into the model as new context, allowing it to reason based on actual results. This creates the "think → act → observe → think again" loop.

This one loop fixes CoT's core flaw: the model is no longer reasoning in isolation. It can interact with the real world and update its reasoning based on real feedback.
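To make the loop concrete, here's a minimal sketch in plain Python. It's not the LangGraph implementation: `scripted_model` is a hypothetical stand-in for the LLM, and `lookup_area` is a made-up tool with hardcoded values. The point is only to show each Observation being appended to the history before the "model" decides again.

```python
from typing import Callable

def lookup_area(city: str) -> str:
    """Hypothetical tool with hardcoded data, standing in for a real search."""
    return {"Beijing": "16410.54", "Shanghai": "6340.5"}.get(city, "unknown")

TOOLS: dict[str, Callable[[str], str]] = {"lookup_area": lookup_area}

def scripted_model(history: list[str]) -> dict:
    """Stand-in for an LLM: chooses the next step from what it has observed so far."""
    observations = [h for h in history if h.startswith("Observation:")]
    if len(observations) == 0:
        return {"thought": "I need Beijing's area.", "action": ("lookup_area", "Beijing")}
    if len(observations) == 1:
        return {"thought": "Now I need Shanghai's area.", "action": ("lookup_area", "Shanghai")}
    beijing, shanghai = (float(o.split(": ")[1]) for o in observations)
    return {"thought": "I have both numbers.", "final": f"{beijing - shanghai:.2f}"}

def react_loop(question: str, max_steps: int = 5) -> str:
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        step = scripted_model(history)                      # Thought
        history.append(f"Thought: {step['thought']}")
        if "final" in step:                                 # model chose to stop
            return step["final"]
        name, arg = step["action"]                          # Action
        history.append(f"Action: {name}({arg})")
        history.append(f"Observation: {TOOLS[name](arg)}")  # Observation fed back
    raise RuntimeError("step budget exhausted")

print(react_loop("How much larger is Beijing than Shanghai?"))
```

Swap `scripted_model` for a real LLM call and `TOOLS` for real tools, and you have the skeleton that frameworks like LangGraph manage for you.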


A Concrete Example: Watching an Agent "Think"

I built a complete ReAct Agent demo using LangGraph + GLM-4-Flash with two tools: calculator (safe math evaluator) and web_search (Bing search).

Code: agent-01-react-agent/react_agent.py

Here's an actual execution trace — Demo 3: search for the areas of Beijing and Shanghai, then calculate the difference.

════════════════════════════════════════════════════════════
  Demo 3 ▸ Multi-Round Search (Same Tool, Multiple Calls)
════════════════════════════════════════════════════════════

[User Question]
  First search for Beijing's area, then Shanghai's area,
  then calculate how much larger Beijing is in km².
────────────────────────────────────────────────────────────

[Step 1] THOUGHT → ACTION
  Action  : web_search(query='北京面积 平方公里')

  Observation : • Beijing area: Total area 16,410.54 km²...
────────────────────────────────────────────────────────────

[Step 2] THOUGHT → ACTION
  Action  : web_search(query='上海面积 平方公里')

  Observation : • Shanghai area: Land area approximately 6,340.5 km²...
────────────────────────────────────────────────────────────

[Step 3] THOUGHT → ACTION
  Action  : calculator(expression='16410.54 - 6340.5')

  Observation : 10070.04
────────────────────────────────────────────────────────────

[Final Answer]
  Beijing's area is approximately 16,410.54 km², Shanghai's is
  approximately 6,340.5 km². Beijing is about 10,070.04 km² larger.
════════════════════════════════════════════════════════════


Notice what happened here:

  1. The Agent decided on its own to search Beijing first, then Shanghai, then calculate — no hardcoded execution order
  2. Each search result (Observation) was read by the model and used to determine the next step
  3. The final calculation used real numbers extracted from real searches

This is ReAct's value: the execution path is planned dynamically at runtime, not hardcoded by the developer in advance.


ReAct vs. Chain-of-Thought: A Direct Comparison

| Aspect | Chain-of-Thought | ReAct |
| --- | --- | --- |
| Information source | Training data only | Training data + tool results |
| Execution path | Reasoning in language space | Think → real action → observe results |
| Can access real-time data | ✗ | ✓ (via tools) |
| Can execute computation/code | ✗ | ✓ (via tools) |
| Reasoning verifiable | Hard to verify | Each Observation is a real result |
| Risk of side effects | Low (no actions) | High (requires safety boundaries) |
One sentence summary: CoT makes the model think clearly. ReAct makes it think while doing.


Building a ReAct Agent with LangGraph

Here's the core implementation. The code uses LangGraph's create_react_agent — one of the cleanest ReAct implementations available.

1. Safe Calculator Tool

import ast
import operator
from typing import Any
from langchain_core.tools import tool

_SAFE_OPS: dict[type, Any] = {
    ast.Add:  operator.add,
    ast.Sub:  operator.sub,
    ast.Mult: operator.mul,
    ast.Div:  operator.truediv,
    ast.Pow:  operator.pow,
    ast.Mod:  operator.mod,
    ast.USub: operator.neg,
}

def _eval_ast(node: ast.AST) -> float:
    if isinstance(node, ast.Constant):
        return float(node.value)
    if isinstance(node, ast.BinOp):
        op_fn = _SAFE_OPS.get(type(node.op))
        if op_fn is None:
            raise ValueError(f"Unsupported operator: {type(node.op).__name__}")
        return op_fn(_eval_ast(node.left), _eval_ast(node.right))
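    if isinstance(node, ast.UnaryOp):
        op_fn = _SAFE_OPS.get(type(node.op))
        if op_fn is None:
            # e.g. `not x` or `~x`: refuse instead of calling None
            raise ValueError(f"Unsupported operator: {type(node.op).__name__}")
        return op_fn(_eval_ast(node.operand))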
    raise ValueError(f"Unsupported AST node: {type(node).__name__}")

@tool
def calculator(expression: str) -> str:
    """Evaluate a math expression. Supports + - * / ** % and parentheses."""
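    try:
        tree = ast.parse(expression.strip(), mode="eval")
        result = _eval_ast(tree.body)
        if result == int(result):
            return str(int(result))
        # 10 significant digits keeps results such as 10070.04 intact
        return f"{result:.10g}"
    except (ValueError, SyntaxError, ZeroDivisionError, OverflowError, TypeError) as e:
        # OverflowError: huge powers or infinities; TypeError: non-numeric constants
        return f"Calculation error: {e}"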



Why not just use eval()?

eval("__import__('os').system('rm -rf /')") — that line will execute a deletion on your machine. Tools are the Agent's "hands." Once an attacker manipulates the LLM through prompt injection, eval() becomes a direct path to your system.

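The AST-based evaluator walks the parsed tree and accepts only numeric constants and whitelisted math operators; every other node type (function calls, attribute access, names) is rejected. Whitelisting what a tool may do, rather than blacklisting what it may not, is the foundational principle of safe tool design.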
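Here's a small sketch of why the whitelist holds, separate from the calculator above. `first_rejected_node` is a hypothetical helper, not part of the demo code; it parses an expression without executing it and reports the first node type that falls outside the allowed set.

```python
import ast
from typing import Optional

# Node types the whitelist accepts; calls, names, and attributes are not among them.
ALLOWED_NODES = (
    ast.Expression, ast.Constant, ast.BinOp, ast.UnaryOp,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.Mod, ast.USub,
)

def first_rejected_node(expression: str) -> Optional[str]:
    """Return the name of the first non-whitelisted node, or None if all pass."""
    tree = ast.parse(expression, mode="eval")  # parsing never runs the code
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return type(node).__name__
    return None

print(first_rejected_node("(1024 * 768) + (1920 * 1080)"))
print(first_rejected_node("__import__('os').system('rm -rf /')"))
```

The math expression passes with `None`; the injection attempt is stopped at its outermost `Call` node, before anything is evaluated.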

2. Web Search Tool

import requests
from bs4 import BeautifulSoup
from urllib.parse import quote

_BING_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}

@tool
def web_search(query: str) -> str:
    """Search the web and return the 3 most relevant snippets."""
    try:
        url = f"https://www.bing.com/search?q={quote(query)}&setlang=zh-CN"
        resp = requests.get(url, headers=_BING_HEADERS, timeout=10)
        resp.raise_for_status()

        soup = BeautifulSoup(resp.text, "html.parser")
        snippets = []
        for li in soup.find_all("li", class_="b_algo")[:4]:
            h2 = li.find("h2")
            title = h2.get_text(strip=True) if h2 else ""
            p = li.find("p")
            body = p.get_text(strip=True) if p else ""
            if title or body:
                snippets.append(f"{title}: {body}"[:200])

        return "\n".join(snippets[:3]) if snippets else "No results found."
    except requests.RequestException as e:
        return f"Search failed: {e}"


3. Building the Agent

import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
# Imported from the chat_agent_executor submodule; see the import-path note after this block
from langgraph.prebuilt.chat_agent_executor import create_react_agent

load_dotenv()

llm = ChatOpenAI(
    base_url="https://open.bigmodel.cn/api/paas/v4",
    api_key=os.getenv("LLM_API_KEY"),
    model="glm-4-flash",
    temperature=0,
)

agent = create_react_agent(
    model=llm,
    tools=[calculator, web_search],
)

result = agent.invoke(
    {"messages": [("user", "How much larger is Beijing than Shanghai in km²? Search and calculate.")]},
    config={"recursion_limit": 20},
)
print(result["messages"][-1].content)


Three core steps: define tools → bind them to the LLM → run. LangGraph handles message routing, tool-call dispatch, result injection, and loop control under the hood.


The correct import path for create_react_agent

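LangGraph 1.0 marks create_react_agent as deprecated (LangGraphDeprecatedSinceV10). In the demo environment, importing it from langgraph.prebuilt surfaced that warning, so the code imports from langgraph.prebuilt.chat_agent_executor instead. Whether either path warns can depend on your installed version; if you still see the warning, its message names the replacement API to migrate to.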

# ✅ Recommended
from langgraph.prebuilt.chat_agent_executor import create_react_agent

# ⚠️ Triggers deprecation warning
from langgraph.prebuilt import create_react_agent



How the Message Flow Actually Works

To truly understand ReAct, you need to see the underlying message sequence. Here's what the LLM receives at the start of each cycle:

Context passed to LLM at round N:
┌─────────────────────────────────────────────────────┐
│ [System]  You are an assistant with these tools:    │
│           calculator, web_search                    │
│                                                     │
│ [Human]   Question: How much larger is Beijing?     │
│                                                     │
│ [AI]      (tool call) web_search("Beijing area")   │  ← Round 1 Action
│ [Tool]    Beijing area: 16,410 km²                 │  ← Round 1 Observation
│                                                     │
│ [AI]      (tool call) web_search("Shanghai area")  │  ← Round 2 Action
│ [Tool]    Shanghai area: 6,340 km²                 │  ← Round 2 Observation
│                                                     │
│ ← LLM decides what to do next here →               │
└─────────────────────────────────────────────────────┘


Each cycle, the entire history is passed to the LLM. The model "sees" all previous thoughts and observations, then decides:

  • Continue calling tools (more information needed)
  • Stop and deliver a final answer (enough information gathered)

This is why it's called a loop — the model itself is the loop's termination condition. It decides when to stop.
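The stop decision can be sketched as a tiny routing function. This is an illustration using plain dicts as stand-ins for LangChain message objects (the `role` and `tool_calls` keys are assumptions made for this sketch), not LangGraph's actual routing code.

```python
def next_node(messages: list[dict]) -> str:
    """Route to 'tools' if the latest AI message requests a tool call, else 'end'."""
    last = messages[-1]
    if last["role"] == "ai" and last.get("tool_calls"):
        return "tools"
    return "end"

history = [
    {"role": "human", "content": "How much larger is Beijing than Shanghai?"},
    {"role": "ai", "content": "",
     "tool_calls": [{"name": "web_search", "args": {"query": "Beijing area"}}]},
]
print(next_node(history))  # the model asked for a tool, so the loop continues

history += [
    {"role": "tool", "content": "Beijing area: 16,410 km²"},
    {"role": "ai", "content": "Beijing covers about 16,410 km²."},
]
print(next_node(history))  # no tool call in the last AI message, so the loop ends
```

Nothing outside the model forces the second branch. If the model keeps emitting tool calls, this function keeps answering "tools".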


When Things Go Wrong: Failure Modes and Guards

The same "decide when to stop" design that makes ReAct powerful also introduces a risk: if the model misjudges, the loop never terminates.

Common runaway scenarios:

Scenario 1: Tool keeps failing, model keeps retrying

Action: web_search("vague ambiguous query")
Observation: No results found
Thought: Let me try different keywords
Action: web_search("different keywords")
Observation: No results found
Thought: Maybe one more variation...
(infinite loop)


Scenario 2: Model misunderstands the task and pursues the wrong direction

Thought: I need the exact value of X
Action: calculator("...")
Observation: Approximate result
Thought: Not precise enough, I need more decimal places
Action: calculator("...")
(infinite pursuit of "precision")


Scenario 3: Tools form a circular dependency

Thought: I need to know A before I can look up B
Action: search(A)
Observation: Requires knowing B first
Thought: I need to know B before I can look up A
(circular dependency)


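LangGraph's recursion_limit parameter is the hard safety net. It caps the number of graph steps, and in a ReAct graph each model call and each tool execution counts as a step, so one tool round typically costs two: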

result = agent.invoke(
    {"messages": [("user", question)]},
    config={"recursion_limit": 5},  # Force-stop after 5 steps
)


When the step count exceeds the limit, LangGraph raises GraphRecursionError:

[recursion_limit triggered]
  Exception type: GraphRecursionError
  Message: Recursion limit of 5 reached without hitting a stop condition...

→ Conclusion: Always set a reasonable recursion_limit in production (15~25 recommended)
→ Too low: legitimate tasks get cut off; Too high: runaway Agent burns massive tokens
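In production you usually don't want that exception to reach the user. Here's a minimal sketch of a fallback wrapper. `run_with_fallback` and `StepLimitReached` are hypothetical names: the stand-in exception exists only so the sketch runs without LangGraph installed. With a real agent you would pass `GraphRecursionError` (importable from `langgraph.errors`) as `limit_error`.

```python
from typing import Callable

class StepLimitReached(Exception):
    """Stand-in for langgraph.errors.GraphRecursionError in this sketch."""

def run_with_fallback(
    invoke: Callable[[str], str],
    question: str,
    limit_error: type[Exception],
    fallback: str = "I could not finish this task within the step budget.",
) -> str:
    """Run the agent; return a degraded answer if the step limit is hit."""
    try:
        return invoke(question)
    except limit_error:
        # Only the limit error is swallowed; other exceptions still propagate.
        return fallback

def runaway_agent(question: str) -> str:
    """Fake agent that always exhausts its budget."""
    raise StepLimitReached("Recursion limit of 5 reached")

print(run_with_fallback(runaway_agent, "Search three release years", StepLimitReached))
```

Catching only the limit error is deliberate: a network failure or a bug in a tool should still surface as its own exception rather than being masked by the fallback text.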



How to set recursion_limit

  • Simple tasks (single tool call): 5–8 steps is enough
  • Medium tasks (multi-tool, multi-step): 10–15 steps
  • Complex research tasks: 20–25 steps
  • Tasks requiring 30+ steps should reconsider architecture — you may need multi-Agent collaboration (covered in a later article)

The rule of thumb: set it to roughly 2× the number of steps a successful execution needs. Room to breathe, but a real ceiling.
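The 2× rule is easy to turn into arithmetic. The sketch below uses a hypothetical helper, and it assumes each tool round costs one model call plus one tool execution, with one more model call for the final answer. Check that assumption against your own traces before relying on the numbers.

```python
import math

def suggest_recursion_limit(tool_rounds: int, headroom: float = 2.0) -> int:
    """Suggest a limit from the tool rounds a successful run needs."""
    # Assumption: 2 steps per tool round (model call + tool run) + 1 final model call.
    steps_needed = 2 * tool_rounds + 1
    return math.ceil(steps_needed * headroom)

for rounds in (1, 3, 5):
    print(rounds, "tool rounds ->", suggest_recursion_limit(rounds))
```

Under these assumptions a single tool call suggests a limit of 6 and three rounds suggest 14, which lines up with the ranges listed above.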


Five Demo Scenarios: From Simple to Complex

The complete code includes 5 progressive demos covering the main ReAct usage patterns:

Demo 1: Pure Calculation (single tool, single step)

Question: Calculate (1024 * 768) + (1920 * 1080)
Steps: calculator('(1024 * 768) + (1920 * 1080)') → 2860032


Validates the basic tool-calling pipeline.

Demo 2: Search + Calculate (multi-tool, multi-step)

Question: What year were Python and JavaScript first released? Calculate the difference.
Steps: web_search("Python release year") → web_search("JavaScript release year") → calculator


Shows the Agent autonomously orchestrating different tools in the right order.

Demo 3: Multi-round Search (same tool, multiple calls)

Question: How much larger is Beijing than Shanghai in km²?
Steps: web_search("Beijing area") → web_search("Shanghai area") → calculator → 10070.04


Shows the Agent deciding what to search second based on what it found first.

Demo 4: No Tools Needed (direct answer)

Question: Explain the ReAct paradigm in one sentence.
Steps: No tool calls — direct answer


Shows the Agent knowing when not to call tools. This matters as much as knowing when to call them.

Demo 5: Trigger recursion_limit (safety net demo)

Question: Search Python/Java/C release years, calculate the sum (~10 steps needed)
Limit: recursion_limit=5
Result: GraphRecursionError (correctly triggered)


Verifies that the production safety net actually fires.


An Interesting Observation: Agents Can "Luck Into" Correct Answers

Demo 2 produced a result worth documenting carefully.

The Agent searched for JavaScript's release year. The Bing snippet it received came from an article published in 2023 that mentioned Python's 1991 origin. The model appears to have confused "2023" (article publication date) with JavaScript's release year. The calculation step ran 2023 - 1991 = 32, returning 32.

But the final answer was correct: "Python was released in 1991, JavaScript in 1995 — a 4-year difference."

The model overrode its (incorrect) calculation result with its internal training knowledge and delivered the right answer.

This reveals a subtle property of ReAct: an Agent's reasoning chain and its final answer can be decoupled. The model may make errors during tool calls, then "self-correct" in the final answer generation using built-in knowledge.

As an outcome, this is fine — you got the right answer. From an engineering perspective, it's a problem. If you need traceable, verifiable conclusions, "it happened to be correct" isn't sufficient. This is one of the challenges that Harness Engineering addresses (covered in a later article in this series).
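One cheap way to catch this kind of decoupling is to check whether the numbers in the final answer ever appeared in a tool result. The sketch below is a hypothetical helper, not part of the demo code, and the observation strings are paraphrased from the Demo 2 run described above.

```python
import re

_NUMBER = re.compile(r"\d+(?:\.\d+)?")

def ungrounded_numbers(final_answer: str, observations: list[str]) -> list[str]:
    """Numbers in the final answer that never appear in any tool observation."""
    seen: set[str] = set()
    for obs in observations:
        # Strip thousands separators so "16,410" and "16410" compare equal.
        seen.update(_NUMBER.findall(obs.replace(",", "")))
    answer_numbers = _NUMBER.findall(final_answer.replace(",", ""))
    return [n for n in answer_numbers if n not in seen]

observations = [
    "An article published in 2023 notes that Python originated in 1991.",  # web_search
    "32",                                                                  # calculator
]
final = "Python was released in 1991, JavaScript in 1995, a 4-year difference."
print(ungrounded_numbers(final, observations))
```

Here both 1995 and 4 are flagged: the answer is right, but neither value came from a tool. A check like this doesn't prove an answer wrong; it only tells you which parts you can't trace.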


Trace Visualization: Making Agent Reasoning Observable

A common production pain point: when something goes wrong, you don't know which step failed, because only the final answer is visible by default.

Good practice: print the full Thought/Action/Observation sequence as a readable Trace:

from langchain_core.messages import AIMessage, HumanMessage, ToolMessage

def print_trace(result: dict) -> None:
    for msg in result["messages"]:
        if isinstance(msg, HumanMessage):
            print(f"[USER] {msg.content}")

        elif isinstance(msg, AIMessage):
            content = msg.content if isinstance(msg.content, str) else ""
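            if msg.tool_calls:
                if content.strip():
                    # Reasoning text emitted alongside the tool call, if any
                    print(f"[THOUGHT] {content.strip()}")
                for tc in msg.tool_calls:
                    args = ", ".join(f"{k}={repr(v)}" for k, v in tc["args"].items())
                    print(f"[ACTION] {tc['name']}({args})")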
            else:
                print(f"[FINAL ANSWER] {content.strip()}")

        elif isinstance(msg, ToolMessage):
            obs = msg.content if isinstance(msg.content, str) else str(msg.content)
            print(f"[OBSERVATION] {obs.strip()[:300]}")



GLM-4-Flash content field pollution

When using GLM-4-Flash, you may occasionally see raw JSON in AIMessage.content — something like {"index": 0, "delta": ...}. This is the model leaking internal streaming delta data into the content field.

Fix: detect when content starts with { or [ and can be parsed by json.loads(), then discard it.

import json

def _clean_thought(text: str) -> str:
    stripped = text.strip()
    if stripped and stripped[0] in ("{", "["):
        try:
            json.loads(stripped)
            return ""  # leaked JSON, discard
        except json.JSONDecodeError:
            pass
    return text


The complete demo code already includes this handling.


The Limitations of ReAct

ReAct is powerful, but it's not a silver bullet. Knowing its limits helps you use it correctly.

1. Context window fills up fast

Each cycle packs the entire history into the context, so every additional step makes the next prompt longer, and total token consumption grows faster than the step count. Complex tasks (20+ steps) may fail on models with limited context windows.
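A back-of-the-envelope sketch shows how quickly this adds up. The numbers are illustrative assumptions, not measurements: a 500-token base prompt and 300 tokens added to the history per step.

```python
def cumulative_prompt_tokens(base: int, per_step: int, steps: int) -> int:
    """Total prompt tokens across `steps` model calls when each call resends the history."""
    # Call i (0-based) carries the base prompt plus i earlier steps of history.
    return sum(base + per_step * i for i in range(steps))

short_run = cumulative_prompt_tokens(base=500, per_step=300, steps=5)
long_run = cumulative_prompt_tokens(base=500, per_step=300, steps=20)
print(short_run, long_run, round(long_run / short_run, 1))
```

Under these assumptions, 4× the steps costs about 12× the prompt tokens (5,500 vs. 67,000), because every earlier step is paid for again on each later call.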

2. Tool descriptions drive everything — write them well

ReAct relies entirely on the LLM understanding tool documentation to decide which tool to call and with what parameters. Vague docstrings lead to wrong tool selection. Tool descriptions are the invisible API of a ReAct system — treat them like API documentation.
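Here's what that difference looks like in practice. Both functions are illustrative stubs, and `missing_sections` is a hypothetical lint helper; the only thing carried over from the demo is that LangChain's `@tool` decorator uses the docstring as the description the model sees.

```python
def search(q: str) -> str:
    """Search."""  # vague: what is searched, what comes back, when to use it?
    return ""

def web_search(query: str) -> str:
    """Search the public web for current facts (dates, figures, recent events).

    Use this when the answer may have changed since training or needs a source.

    Args:
        query: Keywords in the language of the expected sources,
            e.g. "Shanghai land area km2".

    Returns:
        Up to 3 "title: snippet" lines, or "No results found."
    """
    return ""

def missing_sections(fn) -> list[str]:
    """List the docstring sections a tool description is missing."""
    doc = fn.__doc__ or ""
    return [section for section in ("Args:", "Returns:") if section not in doc]

print(missing_sections(search))      # both sections missing
print(missing_sections(web_search))  # nothing missing
```

A check like this is easy to run in CI over every registered tool, so a vague description gets caught before the model ever has to guess from it.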

3. No global planning capability

Standard ReAct is greedy: each step only looks at the current state to decide the next move, with no "plan the whole thing first, then execute" capability. For tasks requiring long-horizon planning (like writing an entire codebase), this can get stuck in local optima. This is what the Plan-and-Solve paradigm addresses (Article 3 in this series).

4. Poor fault tolerance for tool failures

When a tool returns an error, the model has to infer the next step from the error message alone. There's no predefined retry strategy or fallback logic. This needs to be handled at the tool design level and the Harness layer.
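At the tool level, one option is a bounded retry wrapper that turns the final failure into an explicit message the model can act on. This is a sketch under assumptions: `with_retries` is a hypothetical helper, and the wording of the error string is just one possible convention.

```python
import time
from typing import Callable

def with_retries(
    fn: Callable[[str], str], attempts: int = 3, delay_s: float = 0.0
) -> Callable[[str], str]:
    """Wrap a tool so transient failures are retried before the model sees them."""
    def wrapped(arg: str) -> str:
        last_error: Exception | None = None
        for attempt in range(1, attempts + 1):
            try:
                return fn(arg)
            except Exception as exc:  # broad on purpose: the model only needs text back
                last_error = exc
                if attempt < attempts:
                    time.sleep(delay_s)
        # Tell the model plainly that repeating the same call will not help.
        return (f"TOOL_ERROR after {attempts} attempts: {last_error}. "
                "Do not retry this tool with the same input.")
    return wrapped

calls = {"count": 0}

def flaky_search(query: str) -> str:
    """Fake tool that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("timeout")
    return f"results for {query}"

print(with_retries(flaky_search)("Beijing area"))
```

Retrying inside the tool keeps transient failures out of the context entirely, and the explicit final message gives the model a reason to change approach instead of looping on the same call.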


Interview Prep: Articulate How Your Agent "Thinks"

Common question: How does your Agent decide its next action?

Many candidates answer "it calls tools." But what the interviewer actually wants to hear is: who decides which tool to call, and when does it stop?

A clear answer framework:

"We use the ReAct paradigm. The core is a Thought → Action → Observation loop. At each step, the LLM looks at the full context — user question plus all previous Observations — and decides the next Action. The tool runs, its result is injected as a ToolMessage, and the model reasons again.

The loop terminates when the LLM judges it has enough information and stops calling tools, generating the final answer directly.

To prevent runaway loops, we set recursion_limit (typically 15–25). When it's exceeded, we catch the exception and fall back to a degraded response. We also log the full Trace — every Action and Observation — so we can replay the entire reasoning chain when debugging."

Key differentiators: mentioning Trace observability and recursion_limit shows you've thought beyond demos and considered production stability.


Summary

Three takeaways from this article:

  1. ReAct = Reasoning + Acting, interleaved: The Thought → Action → Observation loop lets Agents update their reasoning based on real-world feedback. The fundamental difference from CoT: actions produce real results that feed back into the reasoning process.

  2. Tool design is ReAct's invisible interface: Docstring quality directly determines how accurately the LLM selects tools. Safe implementation (AST instead of eval) determines whether the system boundary holds.

  3. recursion_limit is a required production setting: The model decides when to stop — that's inherently risky. recursion_limit is the last line of defense. Recommended value: roughly 2× the steps needed for successful completion.


Next up: Agent Series Article 3 — Plan-and-Solve: When ReAct Isn't Enough, How Agents Plan Before Acting. We'll see where ReAct's greedy strategy hits its ceiling on complex tasks, and how introducing an explicit planning layer breaks through it.

