You Think Your Agent Is "Thinking." It's Actually Just Predicting Tokens.
Here's a scenario that happens more often than you'd think.
You ask an Agent to write a competitive analysis report. It confidently outputs three professional-looking pages — complete with data, conclusions, and strategic recommendations.
There's just one problem: every number comes from its training data, which may be a year old. It didn't search. It didn't verify. It just generated text that sounds authoritative.
That's not thinking. That's fluent hallucination.
Chain-of-Thought (CoT) has the same fundamental problem. CoT prompting tells the model to "reason step by step" before answering, and it genuinely does improve accuracy on many tasks. But the model is still reasoning entirely within language space. It can generate a very coherent chain of thought that leads to a completely wrong answer — because its only information source is training data.
ReAct was built to solve this.
ReAct: Reasoning + Acting, Interleaved
In 2022, researchers from Princeton and Google published ReAct: Synergizing Reasoning and Acting in Language Models.
The core idea is elegantly simple: let the model alternate between reasoning and acting, rather than reasoning first then acting, or acting without reasoning.
The concrete form is a three-part loop:
Thought → Action → Observation
↑ │
└─────────────────────────┘
- Thought: What the model is "thinking" — current analysis, what to do next, why
- Action: The actual tool call and parameters
- Observation: The real result returned by the tool
The critical mechanism: Observation is fed back into the model as new context, allowing it to reason based on actual results. This creates the "think → act → observe → think again" loop.
This one loop fixes CoT's core flaw: the model is no longer reasoning in isolation. It can interact with the real world and update its reasoning based on real feedback.
A Concrete Example: Watching an Agent "Think"
I built a complete ReAct Agent demo using LangGraph + GLM-4-Flash with two tools: calculator (safe math evaluator) and web_search (Bing search).
Code: agent-01-react-agent/react_agent.py
Here's an actual execution trace — Demo 3: search for the areas of Beijing and Shanghai, then calculate the difference.
════════════════════════════════════════════════════════════
Demo 3 ▸ Multi-Round Search (Same Tool, Multiple Calls)
════════════════════════════════════════════════════════════
[User Question]
First search for Beijing's area, then Shanghai's area,
then calculate how much larger Beijing is in km².
────────────────────────────────────────────────────────────
[Step 1] THOUGHT → ACTION
Action : web_search(query='北京面积 平方公里')
Observation : • Beijing area: Total area 16,410.54 km²...
────────────────────────────────────────────────────────────
[Step 2] THOUGHT → ACTION
Action : web_search(query='上海面积 平方公里')
Observation : • Shanghai area: Land area approximately 6,340.5 km²...
────────────────────────────────────────────────────────────
[Step 3] THOUGHT → ACTION
Action : calculator(expression='16410.54 - 6340.5')
Observation : 10070.04
────────────────────────────────────────────────────────────
[Final Answer]
Beijing's area is approximately 16,410.54 km², Shanghai's is
approximately 6,340.5 km². Beijing is about 10,070.04 km² larger.
════════════════════════════════════════════════════════════
Notice what happened here:
- The Agent decided on its own to search Beijing first, then Shanghai, then calculate — no hardcoded execution order
- Each search result (Observation) was read by the model and used to determine the next step
- The final calculation used real numbers extracted from real searches
This is ReAct's value: the execution path is planned dynamically at runtime, not hardcoded by the developer in advance.
ReAct vs. Chain-of-Thought: A Direct Comparison
| Aspect | Chain-of-Thought | ReAct |
|---|---|---|
| Information source | Training data only | Training data + tool results |
| Execution path | Reasoning in language space | Think → real action → observe results |
| Can access real-time data | ✗ | ✓ (via tools) |
| Can execute computation/code | ✗ | ✓ (via tools) |
| Reasoning verifiable | Hard to verify | Each Observation is a real result |
| Risk of side effects | Low (no actions) | High (requires safety boundaries) |
One sentence summary: CoT makes the model think clearly. ReAct makes it think while doing.
Building a ReAct Agent with LangGraph
Here's the core implementation. The code uses LangGraph's create_react_agent — one of the cleanest ReAct implementations available.
1. Safe Calculator Tool
import ast
import operator
from typing import Any
from langchain_core.tools import tool
_SAFE_OPS: dict[type, Any] = {
ast.Add: operator.add,
ast.Sub: operator.sub,
ast.Mult: operator.mul,
ast.Div: operator.truediv,
ast.Pow: operator.pow,
ast.Mod: operator.mod,
ast.USub: operator.neg,
}
def _eval_ast(node: ast.AST) -> float:
if isinstance(node, ast.Constant):
return float(node.value)
if isinstance(node, ast.BinOp):
op_fn = _SAFE_OPS.get(type(node.op))
if op_fn is None:
raise ValueError(f"Unsupported operator: {type(node.op).__name__}")
return op_fn(_eval_ast(node.left), _eval_ast(node.right))
if isinstance(node, ast.UnaryOp):
op_fn = _SAFE_OPS.get(type(node.op))
return op_fn(_eval_ast(node.operand))
raise ValueError(f"Unsupported AST node: {type(node).__name__}")
@tool
def calculator(expression: str) -> str:
"""Evaluate a math expression. Supports + - * / ** % and parentheses."""
try:
tree = ast.parse(expression.strip(), mode="eval")
result = _eval_ast(tree.body)
if result == int(result):
return str(int(result))
return f"{result:.6g}"
except (ValueError, SyntaxError, ZeroDivisionError) as e:
return f"Calculation error: {e}"
Why not just use eval()?
eval("__import__('os').system('rm -rf /')") — that line will execute a deletion on your machine. Tools are the Agent's "hands." Once an attacker manipulates the LLM through prompt injection, eval() becomes a direct path to your system.
AST parsing only allows math operation nodes — everything else is rejected. This is the foundational principle of safe tool design.
2. Web Search Tool
import requests
from bs4 import BeautifulSoup
from urllib.parse import quote
_BING_HEADERS = {
"User-Agent": (
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
"(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
),
"Accept-Language": "en-US,en;q=0.9",
}
@tool
def web_search(query: str) -> str:
"""Search the web and return the 3 most relevant snippets."""
try:
url = f"https://www.bing.com/search?q={quote(query)}&setlang=zh-CN"
resp = requests.get(url, headers=_BING_HEADERS, timeout=10)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")
snippets = []
for li in soup.find_all("li", class_="b_algo")[:4]:
h2 = li.find("h2")
title = h2.get_text(strip=True) if h2 else ""
p = li.find("p")
body = p.get_text(strip=True) if p else ""
if title or body:
snippets.append(f"• {title}: {body}"[:200])
return "\n".join(snippets[:3]) if snippets else "No results found."
except requests.RequestException as e:
return f"Search failed: {e}"
3. Building the Agent
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
# LangGraph V1.0 moved create_react_agent to chat_agent_executor submodule
from langgraph.prebuilt.chat_agent_executor import create_react_agent
load_dotenv()
llm = ChatOpenAI(
base_url="https://open.bigmodel.cn/api/paas/v4",
api_key=os.getenv("LLM_API_KEY"),
model="glm-4-flash",
temperature=0,
)
agent = create_react_agent(
model=llm,
tools=[calculator, web_search],
)
result = agent.invoke(
{"messages": [("user", "How much larger is Beijing than Shanghai in km²? Search and calculate.")]},
config={"recursion_limit": 20},
)
print(result["messages"][-1].content)
Three core lines: define tools → bind LLM → run. LangGraph handles all the message routing, tool call dispatch, result injection, and loop control under the hood.
The correct import path for create_react_agent
LangGraph V1.0 moved this function to langgraph.prebuilt.chat_agent_executor. Importing from langgraph.prebuilt triggers a LangGraphDeprecatedSinceV10 warning. Use the new path:
# ✅ Recommended
from langgraph.prebuilt.chat_agent_executor import create_react_agent
# ⚠️ Triggers deprecation warning
from langgraph.prebuilt import create_react_agent
How the Message Flow Actually Works
To truly understand ReAct, you need to see the underlying message sequence. Here's what the LLM receives at the start of each cycle:
Context passed to LLM at round N:
┌─────────────────────────────────────────────────────┐
│ [System] You are an assistant with these tools: │
│ calculator, web_search │
│ │
│ [Human] Question: How much larger is Beijing? │
│ │
│ [AI] (tool call) web_search("Beijing area") │ ← Round 1 Action
│ [Tool] Beijing area: 16,410 km² │ ← Round 1 Observation
│ │
│ [AI] (tool call) web_search("Shanghai area") │ ← Round 2 Action
│ [Tool] Shanghai area: 6,340 km² │ ← Round 2 Observation
│ │
│ ← LLM decides what to do next here → │
└─────────────────────────────────────────────────────┘
Each cycle, the entire history is passed to the LLM. The model "sees" all previous thoughts and observations, then decides:
- Continue calling tools (more information needed)
- Stop and deliver a final answer (enough information gathered)
This is why it's called a loop — the model itself is the loop's termination condition. It decides when to stop.
When Things Go Wrong: Failure Modes and Guards
The same "decide when to stop" design that makes ReAct powerful also introduces a risk: if the model misjudges, the loop never terminates.
Common runaway scenarios:
Scenario 1: Tool keeps failing, model keeps retrying
Action: web_search("vague ambiguous query")
Observation: No results found
Thought: Let me try different keywords
Action: web_search("different keywords")
Observation: No results found
Thought: Maybe one more variation...
(infinite loop)
Scenario 2: Model misunderstands the task and pursues the wrong direction
Thought: I need the exact value of X
Action: calculator("...")
Observation: Approximate result
Thought: Not precise enough, I need more decimal places
Action: calculator("...")
(infinite pursuit of "precision")
Scenario 3: Tools form a circular dependency
Thought: I need to know A before I can look up B
Action: search(A)
Observation: Requires knowing B first
Thought: I need to know B before I can look up A
(circular dependency)
LangGraph's recursion_limit parameter is the hard safety net:
result = agent.invoke(
{"messages": [("user", question)]},
config={"recursion_limit": 5}, # Force-stop after 5 steps
)
When the step count exceeds the limit, LangGraph raises GraphRecursionError:
[recursion_limit triggered]
Exception type: GraphRecursionError
Message: Recursion limit of 5 reached without hitting a stop condition...
→ Conclusion: Always set a reasonable recursion_limit in production (15~25 recommended)
→ Too low: legitimate tasks get cut off; Too high: runaway Agent burns massive tokens
How to set recursion_limit
- Simple tasks (single tool call): 5–8 steps is enough
- Medium tasks (multi-tool, multi-step): 10–15 steps
- Complex research tasks: 20–25 steps
- Tasks requiring 30+ steps should reconsider architecture — you may need multi-Agent collaboration (covered in a later article)
The rule of thumb: set it to roughly 2× the number of steps a successful execution needs. Room to breathe, but a real ceiling.
Five Demo Scenarios: From Simple to Complex
The complete code includes 5 progressive demos covering the main ReAct usage patterns:
Demo 1: Pure Calculation (single tool, single step)
Question: Calculate (1024 * 768) + (1920 * 1080)
Steps: calculator('(1024 * 768) + (1920 * 1080)') → 2860032
Validates the basic tool-calling pipeline.
Demo 2: Search + Calculate (multi-tool, multi-step)
Question: What year were Python and JavaScript first released? Calculate the difference.
Steps: web_search("Python release year") → web_search("JavaScript release year") → calculator
Shows the Agent autonomously orchestrating different tools in the right order.
Demo 3: Multi-round Search (same tool, multiple calls)
Question: How much larger is Beijing than Shanghai in km²?
Steps: web_search("Beijing area") → web_search("Shanghai area") → calculator → 10070.04
Shows the Agent deciding what to search second based on what it found first.
Demo 4: No Tools Needed (direct answer)
Question: Explain the ReAct paradigm in one sentence.
Steps: No tool calls — direct answer
Shows the Agent knowing when not to call tools. This matters as much as knowing when to call them.
Demo 5: Trigger recursion_limit (safety net demo)
Question: Search Python/Java/C release years, calculate the sum (~10 steps needed)
Limit: recursion_limit=5
Result: GraphRecursionError (correctly triggered)
Production safety mechanism verification.
An Interesting Observation: Agents Can "Luck Into" Correct Answers
Demo 2 produced a result worth documenting carefully.
The Agent searched for JavaScript's release year. The Bing snippet it received came from an article published in 2023 that mentioned Python's 1991 origin. The model appears to have confused "2023" (article publication date) with JavaScript's release year. The calculation step ran 2023 - 1991 = 32, returning 32.
But the final answer was correct: "Python was released in 1991, JavaScript in 1995 — a 4-year difference."
The model overrode its (incorrect) calculation result with its internal training knowledge and delivered the right answer.
This reveals a subtle property of ReAct: an Agent's reasoning chain and its final answer can be decoupled. The model may make errors during tool calls, then "self-correct" in the final answer generation using built-in knowledge.
As an outcome, this is fine — you got the right answer. From an engineering perspective, it's a problem. If you need traceable, verifiable conclusions, "it happened to be correct" isn't sufficient. This is one of the challenges that Harness Engineering addresses (covered in a later article in this series).
Trace Visualization: Making Agent Reasoning Observable
A common production pain point: when something goes wrong, you don't know which step failed, because only the final answer is visible by default.
Good practice: print the full Thought/Action/Observation sequence as a readable Trace:
from langchain_core.messages import AIMessage, HumanMessage, ToolMessage
def print_trace(result: dict) -> None:
for msg in result["messages"]:
if isinstance(msg, HumanMessage):
print(f"[USER] {msg.content}")
elif isinstance(msg, AIMessage):
content = msg.content if isinstance(msg.content, str) else ""
if msg.tool_calls:
for tc in msg.tool_calls:
args = ", ".join(f"{k}={repr(v)}" for k, v in tc["args"].items())
print(f"[ACTION] {tc['name']}({args})")
else:
print(f"[FINAL ANSWER] {content.strip()}")
elif isinstance(msg, ToolMessage):
obs = msg.content if isinstance(msg.content, str) else str(msg.content)
print(f"[OBSERVATION] {obs.strip()[:300]}")
GLM-4-Flash content field pollution
When using GLM-4-Flash, you may occasionally see raw JSON in AIMessage.content — something like {"index": 0, "delta": ...}. This is the model leaking internal streaming delta data into the content field.
Fix: detect when content starts with { or [ and can be parsed by json.loads(), then discard it.
def _clean_thought(text: str) -> str:
stripped = text.strip()
if stripped and stripped[0] in ("{", "["):
try:
json.loads(stripped)
return "" # leaked JSON, discard
except json.JSONDecodeError:
pass
return text
The complete demo code already includes this handling.
The Limitations of ReAct
ReAct is powerful, but it's not a silver bullet. Knowing its limits helps you use it correctly.
1. Context window fills up fast
Each cycle packs the entire history into context. Step count grows, token consumption spikes. Complex tasks (20+ steps) may fail on models with limited context windows.
2. Tool descriptions drive everything — write them well
ReAct relies entirely on the LLM understanding tool documentation to decide which tool to call and with what parameters. Vague docstrings lead to wrong tool selection. Tool descriptions are the invisible API of a ReAct system — treat them like API documentation.
3. No global planning capability
Standard ReAct is greedy: each step only looks at the current state to decide the next move, with no "plan the whole thing first, then execute" capability. For tasks requiring long-horizon planning (like writing an entire codebase), this can get stuck in local optima. This is what the Plan-and-Solve paradigm addresses (Article 3 in this series).
4. Poor fault tolerance for tool failures
When a tool returns an error, the model has to infer the next step from the error message alone. There's no predefined retry strategy or fallback logic. This needs to be handled at the tool design level and the Harness layer.
Interview Prep: Articulate How Your Agent "Thinks"
Common question: How does your Agent decide its next action?
Many candidates answer "it calls tools." But what the interviewer actually wants to hear is: who decides which tool to call, and when does it stop?
A clear answer framework:
"We use the ReAct paradigm. The core is a Thought → Action → Observation loop. At each step, the LLM looks at the full context — user question plus all previous Observations — and decides the next Action. The tool runs, its result is injected as a ToolMessage, and the model reasons again.
The loop terminates when the LLM judges it has enough information and stops calling tools, generating the final answer directly.
To prevent runaway loops, we set
recursion_limit(typically 15–25). When it's exceeded, we catch the exception and fall back to a degraded response. We also log the full Trace — every Action and Observation — so we can replay the entire reasoning chain when debugging."
Key differentiators: mentioning Trace observability and recursion_limit shows you've thought beyond demos and considered production stability.
Summary
Three things from this article:
ReAct = Reasoning + Acting, interleaved: The Thought → Action → Observation loop lets Agents update their reasoning based on real-world feedback. The fundamental difference from CoT: actions produce real results that feed back into the reasoning process.
Tool design is ReAct's invisible interface: Docstring quality directly determines how accurately the LLM selects tools. Safe implementation (AST instead of eval) determines whether the system boundary holds.
recursion_limitis a required production setting: The model decides when to stop — that's inherently risky.recursion_limitis the last line of defense. Recommended value: roughly 2× the steps needed for successful completion.
Next up: Agent Series Article 3 — Plan-and-Solve: When ReAct Isn't Enough, How Agents Plan Before Acting. We'll see where ReAct's greedy strategy hits its ceiling on complex tasks, and how introducing an explicit planning layer breaks through it.
References
- Yao et al., ReAct: Synergizing Reasoning and Acting in Language Models, ICLR 2023
- LangGraph Documentation
- hello-agents Open Tutorial (Chapter 4)
- Demo code for this article: agent-01-react-agent
Welcome to visit my personal homepage for more useful knowledge and interesting products
























