When Your Tool Returns Garbage, the Agent Loops Forever. Here Is a 30-Line Guard.

Book: AI Agents Pocket Guide: Patterns for Building Autonomous Systems with LLMs
Also by me: Thinking in Go (2-book series): Complete Guide to Go Programming + Hexagonal Architecture in Go
My project: Hermes IDE | GitHub, an IDE for developers who work with Claude Code and other AI coding tools
Me: xgabriel.com | GitHub

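The tool returns malformed JSON. The model looks at it, decides "I'll try again," and calls the same tool with the same args. The same garbage comes back. The model tries again, and again. Your bill climbs, the user watches a spinner, and whoever is on call gets paged for a cost anomaly.

This isn't the model's fault. The model is doing what it should: retrying after what looks like a transient failure. The fault is that nothing sits between your tool and the model to turn "this thing is broken" into a signal the model can act on.

You don't need a smarter model to fix this. You need about 30 lines of Python wrapped around every tool call.

The retry loop that drains the account

Here is a production trace that a team I work with shared recently, lightly anonymized: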

turn 12: assistant → tool_call(search_orders, {"customer_id": "C-9921"})
turn 12: tool_result → "{\"orders\": [{\"id\": \"O-1\", \"total\""  // truncated, invalid JSON
turn 13: assistant → "Let me try that again."
turn 13: tool_call(search_orders, {"customer_id": "C-9921"})
turn 13: tool_result → "{\"orders\": [{\"id\": \"O-1\", \"total\""  // same truncation
turn 14: assistant → "Apologies, let me retry."
...
turn 28: tool_call(search_orders, {"customer_id": "C-9921"})  // 17 identical retries

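Seventeen times. Each one was a full prompt round trip, with the context growing on every turn. The tool kept returning a truncated payload because an upstream gateway had an undocumented 4 KB response limit. The model had no way to know that.

What the model needed wasn't more compute. It needed someone to tell it: "The response is malformed, stop trying." Or better: "The response is malformed, the field that got cut off was orders[0].total, ask a smaller question that fits the limit." That message never arrived.

Delivering that message is the guard's job.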
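
To make "that message" concrete, here is a minimal sketch that turns the truncated payload from the trace into a short structured note. It uses only the standard library; the helper name describe_malformed and the wording of the hint are my own illustration, not part of the guard shown later. Note that the parser can report where parsing stopped, but not which field was cut off.

```python
import json
from typing import Optional

# Truncated payload copied from the trace above.
raw = '{"orders": [{"id": "O-1", "total"'

def describe_malformed(raw: str) -> Optional[dict]:
    """Return a short structured note if raw isn't valid JSON, else None."""
    try:
        json.loads(raw)
    except json.JSONDecodeError as e:
        # e.msg and e.pos come from the standard library's JSONDecodeError.
        return {
            "code": "invalid_json",
            "detail": f"{e.msg} at pos {e.pos} of {len(raw)} characters.",
            "hint": "Stop retrying with the same args; ask a smaller question.",
        }
    return None

message = describe_malformed(raw)
print(message)
```

The position landing at or near the end of the string is a reasonable sign of truncation rather than a formatting bug in the middle of the payload.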

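Three classes of tool failure

Before writing the guard, you need a taxonomy. Tool failures aren't all alike, and the right move for the model depends on which class it hit.

Class 1: Schema mismatch. The tool returned data, but the data doesn't match the contract. A field has the wrong type, a required key is missing, or an enum value is one the agent doesn't know. This is the truncated-JSON case above, and also the "we added a new status code last quarter and forgot to update the agent" case.

Right move: stop retrying. Ask the user for clarification, fall back to another tool, or give up gracefully. Repeating the same call won't help.

Class 2: Partial data. The tool returned something valid but incomplete. Pagination cut off midway. A timeout returned only what it had so far. An external API rate-limited the call and handed back only the first 3 of 47 records.

Right move: keep what you have, or retry with a different strategy (a smaller page, a narrower filter). The same call won't help; a different call might.

Class 3: Semantic garbage. The tool returned valid, well-structured data that is still wrong. A search comes back empty because the model put a customer name in the ID field. A weather API reports Celsius when the agent asked for Fahrenheit. The shape is fine; the meaning is not.

Right move: rethink the call. Try different arguments or a different tool, or escalate to the user. Blind retries burn money for nothing.

The taxonomy matters not as theory but because it determines the shape of the error message you send back to the model. Each class calls for different guidance, and a generic "tool failed" throws that information away.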
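
One way to keep that guidance consistent is to pin the three classes and their default advice in one place. This is a small sketch, not a prescribed API: the FailureClass enum, the DEFAULT_HINTS table, and the default_hint helper are names I made up for illustration, and the hint strings just paraphrase the "right move" notes above.

```python
from enum import Enum

class FailureClass(str, Enum):
    SCHEMA_MISMATCH = "schema_mismatch"
    PARTIAL_DATA = "partial_data"
    SEMANTIC_GARBAGE = "semantic_garbage"

# Default guidance per class (hypothetical wording).
DEFAULT_HINTS = {
    FailureClass.SCHEMA_MISMATCH:
        "Don't retry with the same args. Ask the user, use another tool, or give up.",
    FailureClass.PARTIAL_DATA:
        "Keep what you have, or retry with a smaller page or a narrower filter.",
    FailureClass.SEMANTIC_GARBAGE:
        "Rethink the call: different args, a different tool, or escalate to the user.",
}

def default_hint(error_class: str) -> str:
    """Look up the default guidance; unknown classes get a cautious fallback."""
    try:
        return DEFAULT_HINTS[FailureClass(error_class)]
    except ValueError:
        return "Unclassified failure. Don't retry blindly; report it to the user."

print(default_hint("partial_data"))
```

A tool-specific hint should still win when you have one; the table only covers the case where the guard knows the class but nothing more.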

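The 30-line guard

Here is the core. Wrap any tool call, validate the output against the expected Pydantic schema, classify the failure, and return a structured error the model can actually reason about.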

from typing import Callable, Any
from pydantic import BaseModel, ValidationError
import json

class ToolError(BaseModel):
    error_class: str  # schema_mismatch | partial_data | semantic_garbage
    code: str         # invalid_status, truncated_response, empty_result, ...
    detail: str       # one human-readable line
    hint: str | None = None  # what the model should try next (optional)

def guarded_call(
    tool: Callable[..., str],
    schema: type[BaseModel],
    validate_semantics: Callable[[BaseModel], ToolError | None] = lambda _: None,
    **kwargs: Any,
) -> dict:
    try:
        raw = tool(**kwargs)
        parsed = json.loads(raw)
    except json.JSONDecodeError as e:
        return ToolError(
            error_class="schema_mismatch", code="invalid_json",
            detail=f"Tool output isn't valid JSON: {e.msg} at pos {e.pos}.",
            hint="Don't retry with the same args. The tool itself is broken.",
        ).model_dump()
    try:
        validated = schema.model_validate(parsed)
    except ValidationError as e:
        first = e.errors()[0]
        return ToolError(
            error_class="schema_mismatch", code="schema_violation",
            detail=f"Field `{'.'.join(map(str, first['loc']))}`: {first['msg']}.",
            hint="Don't retry with the same args. The contract is broken.",
        ).model_dump()
    semantic_err = validate_semantics(validated)
    return semantic_err.model_dump() if semantic_err else validated.model_dump()

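Thirty lines, not counting imports. The shape:

tool is whatever runs the side effect (an HTTP request, a database query, a call to an external program).
schema is the Pydantic model that describes what success looks like.
validate_semantics is the optional Class 3 check. It verifies what the schema can't express (for example, a query that came back empty when it should have found something).
The return value is either the validated payload (as a dict) or a ToolError dict.

The wrapper catches Class 1 in the JSONDecodeError and ValidationError branches. It catches Class 3 through validate_semantics. Class 2, partial data, also falls to validate_semantics, because it is a question about content, not about shape.

Using it on a real tool

Say your agent has a search_orders tool. The schema: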

class Order(BaseModel):
    id: str
    total_cents: int
    status: str  # placed, shipped, delivered, cancelled

class SearchOrdersResult(BaseModel):
    orders: list[Order]
    page: int
    has_more: bool

The semantic check for partial data and for the empty-result anomaly:

def check_orders(result: SearchOrdersResult) -> ToolError | None:
    if result.has_more and result.page == 1 and len(result.orders) == 0:
        return ToolError(
            error_class="semantic_garbage", code="empty_first_page",
            detail="has_more=true but page 1 returned 0 orders.",
            hint="The query probably matched a filter index but no rows. "
                 "Try a broader date range or check the customer_id format.",
        )
    if result.has_more:
        return ToolError(
            error_class="partial_data", code="more_pages_available",
            detail=f"Page {result.page} returned {len(result.orders)} orders, more exist.",
            hint=f"Call again with page={result.page + 1} to continue.",
        )
    return None

And the agent calls it like this:

result = guarded_call(
    tool=raw_search_orders_api,
    schema=SearchOrdersResult,
    validate_semantics=check_orders,
    customer_id="C-9921",
    page=1,
)

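Now the tool hands the model exactly one of:

The validated payload (the happy path).
{"error_class": "schema_mismatch", "code": "invalid_json", ...}. The model knows to stop.
{"error_class": "partial_data", "code": "more_pages_available", "hint": "Call again with page=2"}. The model knows what to do next.
{"error_class": "semantic_garbage", "code": "empty_first_page", "hint": "Try broader date range"}. The model has a concrete next step.

No stack traces. No raw exceptions. No "Internal Server Error" with no cause attached.

Why the model handles "error: invalid_status" better than a stack trace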

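A stack trace is a debugging artifact, written for a human reading a terminal. The model sees it as opaque text, scans for words like "error" or "exception," and, finding nothing it can act on, falls back to its default: "The tool failed, let me try again."

A structured error with code, detail, and hint reads to the model like API documentation. The model has likely seen thousands of OpenAPI error responses in training, so it knows what invalid_status, expected one of [placed, shipped, delivered, cancelled], got "shipping" means: don't send "shipping"; send one of the four listed values.

Compare the two payloads after a bad filter value: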

# Stack trace form
"Traceback (most recent call last):\n  File ...\n  TypeError: ..."

# Structured form
{
  "error_class": "schema_mismatch",
  "code": "invalid_status",
  "detail": "Field `filters.status`: value 'shipping' is not one of ['placed', 'shipped', 'delivered', 'cancelled'].",
  "hint": "Use 'shipped' (past tense) if you want orders that left the warehouse."
}

The first can cost three retries with the same args before the model gives up. The second gets a corrected call in one.
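
For illustration, here is one way a structured error like the one above could be produced for an enum-style field. This is a sketch under assumptions: the helper invalid_status_error is hypothetical, and the hint comes from the standard library's difflib.get_close_matches, which is my substitution for the hand-written hint in the payload above, not something the guard does on its own.

```python
import difflib

ALLOWED_STATUSES = ["placed", "shipped", "delivered", "cancelled"]

def invalid_status_error(field: str, value: str) -> dict:
    """Build a structured error for a value outside the allowed statuses."""
    # Suggest the closest allowed value, if any is reasonably similar.
    close = difflib.get_close_matches(value, ALLOWED_STATUSES, n=1)
    if close:
        hint = f"Did you mean '{close[0]}'?"
    else:
        hint = f"Use one of {ALLOWED_STATUSES}."
    return {
        "error_class": "schema_mismatch",
        "code": "invalid_status",
        "detail": f"Field `{field}`: value {value!r} is not one of {ALLOWED_STATUSES}.",
        "hint": hint,
    }

error = invalid_status_error("filters.status", "shipping")
print(error["hint"])
```

A string-similarity hint is a cheap default. Where the near miss has a real meaning (past tense versus in progress, say), a hand-written hint like the one in the payload above will serve the model better.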

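Pair the guard with a retry budget

The guard stops the model from looping on malformed output. It does not stop the model from looping on transient errors, which are legitimately retryable. For those you need a budget.

In the dispatcher, track the retry count per tool. If the same (tool_name, args_hash) pair is called more than N times in one turn, refuse the call and surface ToolError(error_class="schema_mismatch", code="retry_budget_exceeded"). This caps the worst case at N round trips, even when the validator misses a class of failure.

A budget of 3 is generous for most tools. Read-only tools can go a little higher; tools with side effects should get 1, with an explicit retry only when the error is classified as transient.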
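
A minimal sketch of such a budget, assuming tool args are JSON-serializable so they can be hashed in a stable way. The RetryBudget class, the per-tool limits, and the tool name refund_order are hypothetical; only the (tool_name, args_hash) key and the retry_budget_exceeded code come from the description above.

```python
import hashlib
import json
from collections import Counter
from typing import Optional

class RetryBudget:
    """Counts calls per (tool_name, args_hash) within a single turn."""

    def __init__(self, limits: Optional[dict] = None, default_limit: int = 3):
        self.limits = limits or {}          # per-tool overrides (assumed config)
        self.default_limit = default_limit  # N for tools without an override
        self.counts = Counter()

    @staticmethod
    def _args_hash(args: dict) -> str:
        # sort_keys makes the hash independent of argument order.
        canonical = json.dumps(args, sort_keys=True, default=str)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def check(self, tool_name: str, args: dict) -> Optional[dict]:
        """Record one call; return an error dict once the budget is exceeded."""
        key = (tool_name, self._args_hash(args))
        self.counts[key] += 1
        limit = self.limits.get(tool_name, self.default_limit)
        if self.counts[key] > limit:
            return {
                "error_class": "schema_mismatch",
                "code": "retry_budget_exceeded",
                "detail": f"{tool_name} was called {self.counts[key]} times "
                          f"with identical args (limit {limit}).",
                "hint": "Stop calling this tool with these args. "
                        "Change the args or tell the user.",
            }
        return None

    def reset(self) -> None:
        """Call at the start of each turn."""
        self.counts.clear()

# Read-only tool gets a higher limit; a tool with side effects gets 1.
budget = RetryBudget(limits={"search_orders": 5, "refund_order": 1})
first = budget.check("refund_order", {"order_id": "O-1"})
second = budget.check("refund_order", {"order_id": "O-1"})
```

The dispatcher would call check before invoking the tool and, if it gets an error dict back, return that to the model instead of making the call.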

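This pairs with any agent-level loop detector: the budget is per tool per turn, while the loop detector works across turns. They catch different things.

The catch: strict validators reject legitimate edge cases

A Pydantic schema is only as good as the contract you wrote into it. Strict validation feels safe, but it also rejects legitimate oddities: a customer with no last name, an order with zero items because of a return, a payment in a currency you have never seen.

Before deploying the guard to production, run a week of real traffic through it offline. Log every ValidationError and review them by hand. The pattern tends to be the same: about 90% are real bugs the guard should catch, and about 10% are legitimate edge cases where the guard is too strict.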
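
The offline pass can be as simple as replaying recorded tool outputs and tallying the failure reasons, so the hand review starts from the most common ones. This is a sketch with made-up data: replay, require_last_name, and the recorded payloads are all hypothetical, and the validator here is a plain function that raises ValueError, standing in for whatever schema check you actually use.

```python
import json
from collections import Counter
from typing import Callable, Iterable

def replay(payloads: Iterable[str], validate: Callable[[dict], None]) -> Counter:
    """Run recorded raw tool outputs through validate and tally failure reasons."""
    failures = Counter()
    for raw in payloads:
        try:
            validate(json.loads(raw))
        except json.JSONDecodeError:
            # Must come first: JSONDecodeError is a subclass of ValueError.
            failures["invalid_json"] += 1
        except ValueError as e:
            failures[str(e)] += 1
    return failures

def require_last_name(customer: dict) -> None:
    """Stand-in for a strict schema: every customer must have a last name."""
    if not customer.get("last_name"):
        raise ValueError("last_name: field required")

# Hypothetical recorded outputs: one valid, one edge case, one truncated.
recorded = [
    '{"first_name": "Ada", "last_name": "Lovelace"}',
    '{"first_name": "Cher"}',
    '{"first_name": "Tr',
]

failures = replay(recorded, require_last_name)
for reason, count in failures.most_common():
    print(f"{count:>4}  {reason}")
```

Here the truncated record is a real bug, while the customer with a single name is the kind of edge case that argues for loosening the schema.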

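For the 10%, loosen the schema (change a field from str to str | None, or widen an enum to a Literal | str union). For the 90%, don't loosen anything: those bugs are exactly what you built the guard to catch.

The alternative, shipping strict validation without that review, gives the agent a new failure mode: every attempt reports an error even though the tool is actually fine. The model loses trust in its tools and starts asking the user for confirmation on every call. That UX damage is worse than the original problem.

What is the worst loop you have seen an agent fall into, and what stopped it?

If this was useful

This pattern appears in the reliability and recovery chapter of AI Agents Pocket Guide: Patterns for Building Autonomous Systems with LLMs, alongside loop-detection patterns that catch variants of the same bug. If you are building anything that runs for more than a few turns, the failure taxonomy and the guarded-call wrapper can save you from a surprise bill.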
