Agent Series (3): Plan-and-Solve, Think Before You Act
WonderLab · 2026-05-24 · via DEV Community

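When Does ReAct Hit a Wall?

The previous article covered ReAct, a greedy strategy that looks only at the current state and decides the next action from it. It works well in many cases, but one class of tasks gives it trouble.

Suppose you hand an agent a task like this:

Look up the release years of Python, Java, and Go, and sort them chronologically. Calculate how many years separate Python and Go.

A typical ReAct run might look like this: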

Action: web_search("Python release year")
Action: web_search("Java release year")
Action: web_search("Go release year")
Action: calculator("...")
(occasionally repeats a search or takes extra steps)


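This is not a serious flaw, but there is a hidden risk: ReAct has no overall plan before it acts. It does not know how many steps the task needs, in what order the steps depend on each other, or where it currently stands in the whole task. Each step is locally optimal, not globally optimal.

For multi-step tasks with clear dependencies, this is like navigating without a map: you arrive eventually, but by a roundabout route.

Plan-and-Solve's answer: have the LLM produce a complete plan of action first, then execute it step by step.


The Two-Stage Architecture

This paradigm comes from the 2023 paper on Plan-and-Solve Prompting. The core idea has two phases:

Phase 1, Plan: ask the LLM to take a bird's-eye view, analyze the whole task, and list the steps in order. No tools are called in this phase; it is pure reasoning.

Phase 2, Solve: execute the listed steps one at a time. Each step may call tools. The result of each step is injected into the context of the next.

With the fault-tolerance mechanisms that production requires added in, the full architecture looks like this: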

Task
 │
 ▼
[Plan Node]     ← LLM generates 3-7 step plan (no execution, just planning)
 │
 ▼
[Execute Node]  ← Execute current step (embedded ReAct, can call tools)
 │
 ├─ Step failed? ─→ [Replan Node] ← Re-plan remaining steps based on progress so far
 │                      │
 │                      └──────────────┐
 │                                     ▼
 ├─ More steps? ─→ back to Execute    Execute (continue)
 │
 └─ All done? ─→ [Finalize Node] ← Output final answer
                       │
                       ▼
                      END


The difference from ReAct: ReAct is an open-ended loop; Plan-and-Solve is a sequence with a defined end.
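The control flow in the diagram can be sketched in plain Python with no framework at all. This is only an illustration of the "plan once, then run a bounded sequence" shape: the replan branch is left out, and `run_plan_and_solve`, `fake_plan`, and `fake_step` are hypothetical names standing in for LLM calls, not part of the article's code.

```python
from typing import Callable


def run_plan_and_solve(
    task: str,
    make_plan: Callable[[str], list[str]],
    run_step: Callable[[str, list[str]], str],
) -> str:
    """Plan once, execute each step in order, then return the last result."""
    plan = make_plan(task)            # Phase 1: no tools, just a list of steps
    completed: list[str] = []
    for step in plan:                 # Phase 2: bounded by len(plan), unlike a ReAct loop
        result = run_step(step, completed)
        completed.append(f"{step} -> {result}")
    return completed[-1] if completed else ""  # stand-in for a finalize step


# Hypothetical stubs in place of real LLM and tool calls.
def fake_plan(task: str) -> list[str]:
    return ["look up Python year", "look up Go year", "subtract"]


def fake_step(step: str, completed: list[str]) -> str:
    answers = {
        "look up Python year": "1991",
        "look up Go year": "2009",
        "subtract": "18",
    }
    return answers[step]


print(run_plan_and_solve("gap between Python and Go", fake_plan, fake_step))
```

The loop ends after exactly `len(plan)` iterations, which is the "defined end" that a ReAct loop does not have.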


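LangGraph Implementation: State + Graph

LangGraph is a natural fit for this architecture: it models the flow as a state machine (StateGraph), with state passing between nodes.

State design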

from typing import TypedDict

class PlanSolveState(TypedDict):
    task: str                    # original user task
    plan: list[str]              # current plan (list of steps)
    completed_steps: list[str]   # completed steps with result summaries
    current_step_index: int      # which step we're on (0-based)
    step_result: str             # result of the current step
    replan_count: int            # how many times we've replanned
    final_answer: str            # the final answer


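State is the lifeblood of the whole graph: every node reads from it and writes to it. Get the state design right and half the battle is won.

Plan node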

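from langchain_core.messages import HumanMessage, SystemMessage

# `llm` (a chat model) and `parse_plan` are assumed to be defined elsewhere in the full code.
def plan_node(state: PlanSolveState) -> dict: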
    messages = [
        SystemMessage(content=PLANNER_SYSTEM),  # planner expert prompt
        HumanMessage(content=f"Task: {state['task']}"),
    ]
    response = llm.invoke(messages)
    plan = parse_plan(response.content)  # parse "1. xxx\n2. xxx" format

    return {
        "plan": plan,
        "current_step_index": 0,
        "completed_steps": [],
    }

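The plan node relies on `parse_plan`, which the article calls but does not show. A minimal sketch of what such a parser could look like, assuming the planner really does answer in the "1. xxx" numbered-list format; this is a hypothetical implementation, not the one in the author's repository.

```python
import re


def parse_plan(text: str) -> list[str]:
    """Extract step descriptions from lines shaped like '1. do something'."""
    steps: list[str] = []
    for line in text.splitlines():
        # Accept "1." or "1)" followed by whitespace and a non-empty description.
        match = re.match(r"^\s*\d+[.)]\s+(.*\S)\s*$", line)
        if match:
            steps.append(match.group(1))
    return steps


raw = "1. Search the populations\n2. Add the numbers\nSome stray commentary\n 3) Deliver the answer "
print(parse_plan(raw))
```

Lines that do not start with a number are dropped, so a step that wraps onto a second line would lose its continuation. A production parser would need to decide how to treat such lines.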

The planner's system prompt is critical:

PLANNER_SYSTEM = """You are a task planning expert.
Rules:
1. Break the task into 3-7 independent steps
2. Each step must be concrete and actionable
3. Steps must have clear dependencies (later steps can use earlier results)
4. The final step should be "synthesize all information and deliver the answer"

Output format (only the step list, nothing else):
1. [step description]
2. [step description]
...
"""


Execute node (embedded ReAct sub-agent)

from langgraph.prebuilt import create_react_agent

def execute_node(state: PlanSolveState) -> dict:
    idx = state["current_step_index"]
    current_step = state["plan"][idx]

    # Build execution context (includes results from completed steps)
    system_prompt = EXECUTOR_SYSTEM.format(
        completed_steps=format_completed_steps(state["completed_steps"]),
        current_step=current_step,
    )

    # Use a ReAct sub-agent to execute a single step (may need tools)
    sub_agent = create_react_agent(model=llm, tools=[calculator, web_search])
    result = sub_agent.invoke(
        {"messages": [
            SystemMessage(content=system_prompt),
            HumanMessage(content=f"Execute this step: {current_step}"),
        ]},
        config={"recursion_limit": 8},
    )

    step_result = result["messages"][-1].content
    new_completed = state["completed_steps"] + [
        f"{current_step} → {step_result[:100]}"  # separator keeps the step and its result apart
    ]

    return {
        "step_result": step_result,
        "completed_steps": new_completed,
        "current_step_index": idx + 1,
    }


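There is a key point here: the execute node embeds a ReAct sub-agent. Plan-and-Solve and ReAct are not mutually exclusive. Plan-and-Solve supplies the global structure, while ReAct handles the tool calls inside each step.

Routing function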

from typing import Literal

MAX_REPLAN = 2

def should_continue(state) -> Literal["execute", "replan", "finalize"]:
    idx = state["current_step_index"]
    total = len(state["plan"])

    if idx >= total:
        return "finalize"  # all steps complete

    # detect step failure
    result = state.get("step_result", "")
    failed = any(kw in result for kw in ["Calculation error", "Search failed", "Error"])

    if failed and state["replan_count"] < MAX_REPLAN:
        return "replan"  # failed, still have retry budget

    return "execute"  # keep going


Building the graph

from langgraph.graph import END, START, StateGraph

graph = StateGraph(PlanSolveState)

graph.add_node("plan", plan_node)
graph.add_node("execute", execute_node)
graph.add_node("replan", replan_node)
graph.add_node("finalize", finalize_node)

graph.add_edge(START, "plan")
graph.add_edge("plan", "execute")
graph.add_conditional_edges(
    "execute",
    should_continue,
    {"execute": "execute", "replan": "replan", "finalize": "finalize"},
)
graph.add_conditional_edges(
    "replan", after_replan,
    {"execute": "execute", "finalize": "finalize"},
)
graph.add_edge("finalize", END)

agent = graph.compile()


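Full code: agent-02-plan-and-solve/plan_and_solve_agent.py


In Practice: Watching the Plan Unfold

Demo 1: Multi-country population data

Task: look up the populations of China, the United States, and India. Calculate their total and China's share of it.

Planner output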

1. Search "China population", "US population", "India population"
   to get the latest figures.
2. Record China, US, and India's population numbers.
3. Add China, US, and India's populations to get the three-country total.
4. Calculate China's population as a percentage of the three-country total.
5. Synthesize all information and deliver the final answer.


Execution trace

[Step 1] web_search("China population") → 1.40489 billion
         web_search("US population")    → 341 million
         web_search("India population") → 1.451 billion

[Step 2] Record results (no tool call, model consolidates)
         → China 1.40489B, US 341M, India: no data available ← ⚠️

[Step 3] calculator("14048900000.0 + 3400000000.0")
         → 17448900000 ← ⚠️ India missing!

[Step 4] calculator("14.0489 / 17.4489 * 100")
         → 80.5145%

[Final answer] Three-country total: 1.74489B, China's share: 80.5145%


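Wait. What happened?

Step 1 did retrieve India's population (1.451 billion), yet Step 2 reported "India: no data available". Step 3's calculation then added only China and the US.

This is a common Plan-and-Solve pitfall: information gets lost as it passes between steps.

Step 1's result was stored in completed_steps, but the summary was truncated to 100 characters, so key numbers may not have survived. Step 2 made no tool call and relied entirely on the model recalling Step 1's result from context. The model wrongly reported that no data was available.

This is not a bug but an inherent cost of the design: when the information chain is long, passing data as summaries inevitably loses some of it. Fixes are covered in a later section.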
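The truncation effect is easy to reproduce in isolation. The sketch below uses a made-up search-result string (the wording is hypothetical; only the three figures come from the trace above) and applies the same `[:100]` slice that the execute node uses.

```python
# Hypothetical step result; only the figures are taken from the trace.
step_result = (
    "Search results: China's population is about 1.40489 billion; "
    "the US population is about 341 million; "
    "India's population is about 1.451 billion."
)

summary = step_result[:100]  # same truncation as in execute_node

print(summary)
print("India" in summary)
```

With this particular string the cut lands right after the US figure, so the India figure never reaches the next step. Whether a real run loses data depends entirely on where the key number happens to sit in the text.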

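Demo 2: Dependency-chain task (iPhone price in CNY)

Task: look up the latest iPhone price in USD, look up the exchange rate, and convert the price to CNY.

The planner produced a 7-step plan, although 3 steps would have been enough (look up the price, look up the rate, calculate). This shows the planner's tendency to over-plan simple tasks, splitting micro-actions into standalone steps.

Step 6 ran into an interesting tool failure: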

[Step 6] Need to round 8836.45
  → calculator("round(8836.45)")
  → Error: unsupported AST node: Call
  → calculator("round(8836.45, 0)")
  → Error: unsupported AST node: Call
  → Result: Sorry, need more steps to process this request.


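My calculator supports only arithmetic, with no function calls (by design, to prevent injection). The model tried round() twice, failed both times, then gave up with a confused-sounding response.

In Step 7 (the final synthesis), however, the model sidestepped the problem gracefully: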

1299 USD × 6.8025 = 8836.45 CNY
Rounded to approximately 8836 CNY


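It did the rounding in natural language, without relying on any tool. A tool failure is not the end of the road: the model's own capabilities can serve as a fallback.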
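For readers wondering how an arithmetic-only calculator ends up producing "unsupported AST node: Call", here is a minimal sketch built on Python's standard `ast` module. It is an assumption about how such a tool might work, not the author's implementation; the operator whitelist and the `_eval` helper are illustrative choices.

```python
import ast
import operator

# Whitelisted operators; anything else (calls, names, attributes) is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def _eval(node: ast.AST):
    """Recursively evaluate a node, allowing only numbers and whitelisted operators."""
    if isinstance(node, ast.Constant):
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
    elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
    elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval(node.operand))
    raise ValueError(f"unsupported AST node: {type(node).__name__}")


def calculator(expression: str) -> str:
    """Evaluate plain arithmetic and return the result or an error string."""
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_eval(tree.body))
    except (ValueError, SyntaxError, ZeroDivisionError, OverflowError) as exc:
        return f"Error: {exc}"


print(calculator("2**10 + 3**5"))
print(calculator("round(8836.45)"))
```

Because `round(...)` parses to a `Call` node, it never reaches evaluation, which is the injection protection the article refers to. A real tool would probably also bound exponent sizes, since `**` on large integers can be very slow.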

Demo 3: Planning a simple task

Task: calculate 2^10 + 3^5.

The planner produced a 4-step plan:

1. Calculate 2 to the power of 10
2. Calculate 3 to the power of 5
3. Add the results of steps 1 and 2
4. Synthesize all information and give the final answer


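Compare that with ReAct: a single calculator("2**10 + 3**5") call, and done.

Plan-and-Solve is overkill here: it turns a one-step calculation into four steps. This is one of the key findings discussed next.


Five Key Findings

After running these demos, five things stand out for real-world engineering:

Finding 1: Planners tend to over-plan

On simple tasks, the LLM turns every micro-action into its own step. That inflates the step count, burns tokens, and slows the task down. A good planner prompt should set explicit limits: no more than 3 steps for simple tasks, splitting further only when genuinely needed.

Finding 2: Information passing between steps needs careful design

Each step's result is recorded as a natural-language summary in completed_steps. If the summary is too short, key numbers get cut off (India's population in Demo 1). A better approach is to store step results in a structured format (such as JSON or key-value pairs) rather than as truncated prose.

Finding 3: A tool failure is not a step failure

When a tool fails, the model can fall back on its own knowledge (the rounding in Demo 2). Do not trigger Replan the moment a tool fails; let the execute node try to handle it first. Trigger Replan only when the model genuinely cannot produce a reasonable result.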
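One way to act on this is to judge failure by the step's final output rather than by scanning for the word "Error" anywhere in it. The sketch below is a hypothetical helper, not part of the article's code; the marker list is an assumption based on the give-up message seen in Demo 2 and would need tuning for a real executor.

```python
# Hypothetical markers for "the sub-agent gave up"; adjust to your executor's messages.
GIVE_UP_MARKERS = ("Sorry, need more steps",)


def step_failed(step_result: str) -> bool:
    """Treat a step as failed only if its final output is empty or a give-up message."""
    text = step_result.strip()
    if not text:
        return True
    return any(marker in text for marker in GIVE_UP_MARKERS)


# A tool error the model recovered from is not counted as a step failure.
recovered = "The calculator returned an Error for round(), but 8836.45 rounds to 8836."
print(step_failed(recovered))
print(step_failed("Sorry, need more steps to process this request."))
```

Under this rule a step that mentions a tool error but still ends with a usable answer keeps the plan moving, while a step that ends in a give-up message is eligible for Replan.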

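Finding 4: Replan is a double-edged sword

Replanning adds fault tolerance, but it also introduces uncertainty: the new plan may contradict the old one or omit necessary steps. Production strategy: cap replans at 2. If that is still not enough, degrade gracefully and tell the user the task could not be completed.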
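The cap-then-degrade policy can be sketched as a small routing helper plus a fallback message. Both functions are hypothetical: the article's `should_continue` has no `"give_up"` route, so this shows one possible extension under that assumption.

```python
MAX_REPLAN = 2


def route_after_failure(replan_count: int) -> str:
    """Replan while budget remains; otherwise hand off to a give-up path."""
    return "replan" if replan_count < MAX_REPLAN else "give_up"


def give_up_message(task: str, completed_steps: list[str]) -> str:
    """Tell the user plainly that the task was not finished, and what was done."""
    done = "\n".join(f"- {step}" for step in completed_steps) or "- (nothing completed)"
    return f"Could not finish the task: {task}\nCompleted so far:\n{done}"


print(route_after_failure(0))
print(route_after_failure(2))
print(give_up_message("convert iPhone price to CNY", ["Look up USD price → 1299"]))
```

Reporting the completed steps alongside the failure gives the user something to work with instead of a bare error.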

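Finding 5: Plan-and-Solve and ReAct are not opposites

In our implementation, every executed step uses a ReAct sub-agent internally. Plan-and-Solve provides the strategic planning; ReAct provides the tactical execution. This layered design is very common in real-world agent engineering, and it is exactly the kind of composition LangGraph is built for.


When to Choose ReAct and When to Choose Plan-and-Solve

This is the key engineering decision.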

Task analysis
│
├─ Fewer than 3 steps?
│   └─ Use ReAct (lightweight, fast)
│
├─ Strong dependencies between steps?
│   (later steps need precise results from earlier steps)
│   └─ Plan-and-Solve (explicit plan enforces dependency order)
│
├─ Clear task boundary, enumerable steps?
│   └─ Plan-and-Solve or even Workflow-Driven
│
├─ Open-ended task, fuzzy boundaries?
│   └─ ReAct (adapts to unknowns)
│
└─ Long-horizon planning (10+ steps)?
    └─ Consider multi-Agent architecture (later article)

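The decision tree above can be expressed as a small function. This is a rough sketch: the tree does not say which branch wins when several apply, so the ordering below (long horizon first, then short tasks, then dependency checks) is an assumption, as are the function and parameter names.

```python
def choose_strategy(
    estimated_steps: int,
    strong_dependencies: bool = False,
    enumerable_steps: bool = False,
) -> str:
    """Pick an agent pattern from rough task characteristics."""
    if estimated_steps >= 10:
        return "multi-agent"       # long-horizon planning
    if estimated_steps < 3:
        return "ReAct"             # lightweight and fast
    if strong_dependencies or enumerable_steps:
        return "Plan-and-Solve"    # an explicit plan enforces ordering
    return "ReAct"                 # open-ended or fuzzy tasks adapt step by step


print(choose_strategy(2))
print(choose_strategy(5, strong_dependencies=True))
print(choose_strategy(12, strong_dependencies=True))
```

In practice the step estimate itself is a guess, so a function like this is better used as a default that a human or a classifier prompt can override.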

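Real-world scenarios

| Scenario | Recommended | Reason |
|---|---|---|
| Look up a fact and answer | ReAct | Single-step operation, no planning needed |
| Multi-source comparative analysis | Plan-and-Solve | Data collection has ordering dependencies |
| Auto-generating code and tests | Plan-and-Solve | Clear steps: write, run, fix |
| Open-ended competitive research | ReAct | Search direction shifts dynamically |
| Data processing pipeline | Workflow-Driven | Steps are fixed |
| Complex problem diagnosis | ReAct + Plan | Hybrid: explore first to find a plan, then execute |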

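The Information Loss Problem

For the lost India figure in Demo 1, there are several engineering fixes:

Option A: Store step results in a structured format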

# Instead of natural language summaries:
completed_steps.append(f"Search China population → {step_result[:100]}")

# Use structured data:
step_data = {
    "step_index": idx,
    "description": current_step,
    "result": step_result,          # full result, no truncation
    "extracted_values": {},         # have the model extract key numbers
}


Option B: A dedicated state field for collected data

from typing import Any, TypedDict

class PlanSolveState(TypedDict):
    # ... other fields ...
    collected_data: dict[str, Any]  # dedicated storage for gathered data


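After each step, write not only to completed_steps but also extract the key data into collected_data. Later steps read this dictionary directly instead of relying on the model to "remember" prose.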
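A minimal sketch of the idea, using the three population figures from the Demo 1 trace. The `record_value` helper and the key names are hypothetical; how values get extracted from a step's text (regex, a structured-output call, and so on) is left open here.

```python
from typing import Any


def record_value(collected: dict[str, Any], key: str, value: Any) -> dict[str, Any]:
    """Return a new dict with one more extracted value, leaving the input untouched."""
    return {**collected, key: value}


collected: dict[str, Any] = {}
collected = record_value(collected, "china_population", 1_404_890_000)
collected = record_value(collected, "us_population", 341_000_000)
collected = record_value(collected, "india_population", 1_451_000_000)

# A later step reads the numbers directly; nothing depends on a truncated summary.
total = sum(collected.values())
china_share = collected["china_population"] / total * 100
print(f"total={total:,} china_share={china_share:.2f}%")
```

With all three values present, the total includes India, which is exactly what the summary-based run in Demo 1 failed to do.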

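Option C: Have the planner declare the data flow explicitly

Instruct the planner to annotate each step:

  • "Input: which earlier step's data this step needs"
  • "Output: what data this step produces and where it is stored"

This defines a data-flow graph at the planning layer, before execution begins.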
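Once each step declares its inputs and outputs, the plan can be checked before anything runs. The sketch below assumes the annotations have already been parsed into a simple structure; `PlannedStep` and `missing_inputs` are hypothetical names, not something the article defines.

```python
from dataclasses import dataclass, field


@dataclass
class PlannedStep:
    description: str
    inputs: list[str] = field(default_factory=list)   # keys this step reads
    outputs: list[str] = field(default_factory=list)  # keys this step writes


def missing_inputs(plan: list[PlannedStep]) -> list[tuple[int, str]]:
    """Return (step_number, key) pairs where a step reads a key no earlier step wrote."""
    available: set[str] = set()
    problems: list[tuple[int, str]] = []
    for number, step in enumerate(plan, start=1):
        for key in step.inputs:
            if key not in available:
                problems.append((number, key))
        available.update(step.outputs)
    return problems


plan = [
    PlannedStep("Search populations", outputs=["china", "us"]),
    PlannedStep("Sum populations", inputs=["china", "us", "india"], outputs=["total"]),
]
print(missing_inputs(plan))
```

A check like this would flag the missing India figure at planning time rather than letting it surface as a wrong total at the end.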

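The three options increase in both complexity and robustness. In production, match the complexity to the task.


Interview Prep: Planning vs. Execution

Interview question: does your agent plan before it acts? How does that work?

Most candidates describe ReAct: implicit reasoning at run time, with no explicit plan. If you have implemented Plan-and-Solve, that is a clear differentiator:

"We pick the approach by task type. For tasks with few steps and fuzzy boundaries, ReAct's implicit reasoning is enough. For multi-step tasks with clear dependencies, such as multi-source comparative analysis, we use Plan-and-Solve."

Specifically: the planning phase uses the LLM to decompose the whole task into a list of steps, with no tool calls, only reasoning. The solve phase then executes each step in order, with an embedded ReAct sub-agent handling the tool calls inside each step.

This gives us two benefits. First, the execution path is fixed and dependencies are explicit, so debugging is easier. Second, there is a replan mechanism to handle failures.

We hit one real problem in production: passing information between steps needs discipline. Natural-language summaries lose key details, so we switched to structured JSON records of step results, so that later steps do not depend on the model "remembering" earlier ones.

This answer shows the interviewer that you have gone beyond running demos: you have hit real production problems and thought them through.


Summary

Three takeaways from this article:

  1. Plan-and-Solve means plan first, execute second: compared with ReAct's greedy strategy, Plan-and-Solve lists all the steps up front and then executes them, which makes dependencies explicit and the execution path predictable. It is best suited to well-structured multi-step tasks.

  2. Information passing is the biggest pitfall: passing data between steps as natural-language summaries loses information. Production systems should store key intermediate results in a structured format and not rely on the model to "remember" earlier results.

  3. Plan-and-Solve and ReAct compose naturally: Plan-and-Solve supplies the global structure; ReAct handles the tool calls inside each step. This layered design is common in complex agent systems.


Next: part four of the Agent series, a deep dive into tool calling. Tools are an agent's "hands", and how those hands are designed determines what the agent can do. I will dig into tool design principles, parameter validation, error handling, and how to keep tools from becoming a security risk.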




Visit my personal homepage for more useful knowledge and interesting products.