When Tools Return Garbage, Agents Loop Forever. Here's a 30-Line Guard.
Gabriel Anha · 2026-05-24 · via DEV Community

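A tool returns malformed JSON. The model looks at it, decides "let me try that again," and calls the same tool with the same arguments. The same garbage comes back. The model tries again, and again. Your bill creeps up, the user stares at a spinner, and whoever is on call gets paged for a cost anomaly.

This is not a model bug. The model is doing what it is supposed to do: when it hits a transient failure, it retries. The bug is that nothing sits between your tool and the model to turn "this thing is broken" into a signal the model can act on.

You don't need a smarter model to fix this. You need about thirty lines of Python wrapped around every tool call.

The retry loop that drains the account

Here is a production trace, shared recently by a team I work with and lightly anonymized: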

turn 12: assistant → tool_call(search_orders, {"customer_id": "C-9921"})
turn 12: tool_result → "{\"orders\": [{\"id\": \"O-1\", \"total\""  // truncated, invalid JSON
turn 13: assistant → "Let me try that again."
turn 13: tool_call(search_orders, {"customer_id": "C-9921"})
turn 13: tool_result → "{\"orders\": [{\"id\": \"O-1\", \"total\""  // same truncation
turn 14: assistant → "Apologies, let me retry."
...
turn 28: tool_call(search_orders, {"customer_id": "C-9921"})  // 17 identical retries

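Seventeen times. Each one is a full prompt round trip, with the context growing on every turn. The tool kept returning a truncated payload because an upstream gateway had an undocumented 4 KB response limit. The model had no way out.

What the model needed was not more compute. It needed someone to tell it: "The response is malformed, stop trying." Or better: "The response is malformed, the truncated field is orders[0].total, and a smaller query would fit within the limit." That message never arrived.

Delivering that message is the guard's job.

Three classes of tool failure

Before writing the guard, you need a taxonomy. Tool failures are not all alike, and the right move for the model depends on which class it has hit.

Class 1: Schema mismatch. The tool returned data, but the data doesn't match the contract it promised. A field has the wrong type, a required key is missing, or an enum value is one the agent doesn't recognize. This is the truncated-JSON case above, and also the "we added a new status code last quarter and forgot to update the agent" case.

The right move: stop retrying. Ask the user for clarification, fall back to another tool, or give up gracefully. Repeating the same call will never help.

Class 2: Partial data. What the tool returned is valid but incomplete. Pagination was cut off midway. A timeout returned only what had been collected so far. An external API rate-limited you and handed back only the first 3 of 47 items.

The right move: keep what you have, or retry with a different strategy (a smaller page size, a narrower filter). The same call again won't help; a different call might.

Class 3: Semantic garbage. The tool returned data that is valid and structurally complete, yet wrong. A search comes back empty because the model put a customer name in the ID field. A weather API reports Celsius when the agent asked for Fahrenheit. The shape is fine; the meaning is not.

The right move: rethink the call. Try different arguments or a different tool, or escalate to the user. Blind retries burn money for nothing.

The point of this taxonomy is not theoretical. It determines the shape of the error message you return to the model. Each class calls for different guidance, and a generic "tool failed" throws that information away.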
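One way to keep that per-class guidance from getting lost is to store it next to the class names. The sketch below is illustrative only: FailureClass, DEFAULT_HINTS, and default_hint are hypothetical names, and the hint wording simply paraphrases the three "right moves" above. The three class strings are the ones the guard later in this post uses for error_class.

```python
from enum import Enum


class FailureClass(str, Enum):
    """The three failure classes from the taxonomy above."""
    SCHEMA_MISMATCH = "schema_mismatch"
    PARTIAL_DATA = "partial_data"
    SEMANTIC_GARBAGE = "semantic_garbage"


# Hypothetical default guidance per class (paraphrased, not the author's text).
DEFAULT_HINTS = {
    FailureClass.SCHEMA_MISMATCH:
        "Stop. Do not retry the same call; ask the user, use another tool, or give up.",
    FailureClass.PARTIAL_DATA:
        "Keep what you have, or retry with a smaller page or a narrower filter.",
    FailureClass.SEMANTIC_GARBAGE:
        "Rethink the call: different arguments, a different tool, or escalate to the user.",
}


def default_hint(error_class: str) -> str:
    """Return the fallback hint for a class string, with a safe default."""
    try:
        return DEFAULT_HINTS[FailureClass(error_class)]
    except ValueError:
        # Unrecognized class strings should not invite a blind retry either.
        return "Unknown failure class. Do not retry blindly; escalate to the user."
```

A tool-specific hint will usually be better than these defaults; the table is only a floor so that no error reaches the model without guidance.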

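The 30-line guard

This is the core. Wrap any tool call, validate its output against the Pydantic schema you expect, classify the failure, and return a structured error that the model can actually reason about.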

from typing import Callable, Any
from pydantic import BaseModel, ValidationError
import json

class ToolError(BaseModel):
    error_class: str  # schema_mismatch | partial_data | semantic_garbage
    code: str         # invalid_status, truncated_response, empty_result, ...
    detail: str       # one human-readable line
    hint: str | None  # what the model should try next

def guarded_call(
    tool: Callable[..., str],
    schema: type[BaseModel],
    validate_semantics: Callable[[BaseModel], ToolError | None] = lambda _: None,
    **kwargs: Any,
) -> dict:
    try:
        raw = tool(**kwargs)
        parsed = json.loads(raw)
    except json.JSONDecodeError as e:
        return ToolError(
            error_class="schema_mismatch", code="invalid_json",
            detail=f"Tool output isn't valid JSON: {e.msg} at pos {e.pos}.",
            hint="Don't retry with the same args. The tool itself is broken.",
        ).model_dump()
    try:
        validated = schema.model_validate(parsed)
    except ValidationError as e:
        first = e.errors()[0]
        return ToolError(
            error_class="schema_mismatch", code="schema_violation",
            detail=f"Field `{'.'.join(map(str, first['loc']))}`: {first['msg']}.",
            hint="Don't retry with the same args. The contract is broken.",
        ).model_dump()
    semantic_err = validate_semantics(validated)
    return semantic_err.model_dump() if semantic_err else validated.model_dump()

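Not counting imports, that's thirty lines. The shape:

  • tool is whatever performs the side effect (an HTTP request, a database query, a call to an external program).
  • schema is the Pydantic model that describes what success looks like.
  • validate_semantics is the optional class-3 check. It verifies what the schema can't express (for example, a query that should have matched something but came back empty).
  • The return value is either the validated payload (as a dict) or a ToolError dict.

The wrapper catches class 1 in the JSONDecodeError and ValidationError branches. It catches class 3 in validate_semantics. Class 2, partial data, also lands in validate_semantics, because it is a question about content, not about shape.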
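To make the first of those branches concrete, here is a small sketch of what json.loads reports for the truncated payload from the trace. describe_json_failure is a hypothetical helper, not part of the guard; it only reuses the guard's detail format.

```python
import json
from typing import Optional

# Truncated payload copied from the trace earlier in the post.
TRUNCATED = '{"orders": [{"id": "O-1", "total"'


def describe_json_failure(raw: str) -> Optional[str]:
    """Return the guard-style detail line if raw is not valid JSON, else None."""
    try:
        json.loads(raw)
    except json.JSONDecodeError as e:
        # e.msg is the parser's short message; e.pos is the character offset.
        return f"Tool output isn't valid JSON: {e.msg} at pos {e.pos}."
    return None


if __name__ == "__main__":
    print(describe_json_failure(TRUNCATED))
```

The offset points at the end of the truncated string, which is usually enough to tell a cut-off response from a genuinely malformed one.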

Using it on a real tool

Suppose your agent has a search_orders tool. Its schema:

class Order(BaseModel):
    id: str
    total_cents: int
    status: str  # placed, shipped, delivered, cancelled

class SearchOrdersResult(BaseModel):
    orders: list[Order]
    page: int
    has_more: bool

The semantic check, covering partial data and the empty-result anomaly:

def check_orders(result: SearchOrdersResult) -> ToolError | None:
    if result.has_more and result.page == 1 and len(result.orders) == 0:
        return ToolError(
            error_class="semantic_garbage", code="empty_first_page",
            detail="has_more=true but page 1 returned 0 orders.",
            hint="The query probably matched a filter index but no rows. "
                 "Try a broader date range or check the customer_id format.",
        )
    if result.has_more:
        return ToolError(
            error_class="partial_data", code="more_pages_available",
            detail=f"Page {result.page} returned {len(result.orders)} orders, more exist.",
            hint=f"Call again with page={result.page + 1} to continue.",
        )
    return None

And the agent calls it like this:

result = guarded_call(
    tool=raw_search_orders_api,
    schema=SearchOrdersResult,
    validate_semantics=check_orders,
    customer_id="C-9921",
    page=1,
)

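Now the model sees exactly one of the following from the tool:

  1. The validated payload (the happy path).
  2. {"error_class": "schema_mismatch", "code": "invalid_json", ...}. The model knows to stop.
  3. {"error_class": "partial_data", "code": "more_pages_available", "hint": "Call again with page=2"}. The model knows exactly what to do next.
  4. {"error_class": "semantic_garbage", "code": "empty_first_page", "hint": "Try broader date range"}. The model has a concrete next step.

No stack traces. No raw exceptions. No "Internal Server Error" with no cause attached.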
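On the orchestrator side, the two kinds of dict can be told apart by their keys. The following is a minimal sketch under that assumption; is_tool_error and outcome_label are hypothetical helpers for logging or metrics, and they assume no success payload happens to carry all four ToolError field names.

```python
# Field names taken from the ToolError model in the guard.
ERROR_KEYS = frozenset({"error_class", "code", "detail", "hint"})


def is_tool_error(result: dict) -> bool:
    """True when the dict carries all four ToolError fields."""
    return ERROR_KEYS.issubset(result)


def outcome_label(result: dict) -> str:
    """Collapse a guarded result into a short label such as 'ok' or 'class/code'."""
    if not is_tool_error(result):
        return "ok"
    return f"{result['error_class']}/{result['code']}"
```

Counting these labels per tool gives a cheap view of which failure class dominates before anyone reads a transcript.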

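Why the model handles "error: invalid_status" better than a stack trace

A stack trace is a debugging artifact, written for a human reading a terminal. The model sees it as opaque text and scans for words like "error" or "exception." When it finds nothing it can act on, it falls back to its old line: "The tool failed, let me try again."

A structured error with code, detail, and hint looks to the model like API documentation. The model has seen thousands of OpenAPI error responses in training and knows exactly what invalid_status, expected one of [placed, shipped, delivered, cancelled], got "shipping" means: don't pass "shipping"; pass one of the four listed values.

Compare the two payloads the model could receive after a bad filter value: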

# Stack trace form
"Traceback (most recent call last):\n  File ...\n  TypeError: ..."

# Structured form
{
  "error_class": "schema_mismatch",
  "code": "invalid_status",
  "detail": "Field `filters.status`: value 'shipping' is not one of "
            "['placed', 'shipped', 'delivered', 'cancelled'].",
  "hint": "Use 'shipped' (past tense) if you want orders that left the warehouse."
}

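The first may cost three retries with the same arguments before the model gives up. The second gets a corrected call on the first try.

Pair the guard with a retry budget

The guard stops the model from looping on shape errors. It does not stop the model from looping on transient errors, which are legitimately retryable. For those you need a budget.

In the orchestrator, count retries per tool. If the same (tool_name, args_hash) pair is called more than N times within one turn, refuse the call and surface ToolError(error_class="schema_mismatch", code="retry_budget_exceeded"). This caps the worst case at N round trips, even when the validator misses a whole class of failure.

A budget of 3 is generous for most tools. Read-only tools can go a little higher. Tools with side effects should get 1, and should be retried explicitly only when the error has been classified as transient.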
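A minimal sketch of such a budget, assuming the orchestrator calls start_turn() at the beginning of each turn and check() before dispatching a tool. RetryBudget, its method names, and the SHA-256-over-sorted-JSON hashing are assumptions made for illustration; the text above only fixes the (tool_name, args_hash) key, the limit N, and the error payload.

```python
import hashlib
import json
from collections import Counter
from typing import Any, Dict, Optional


class RetryBudget:
    """Counts identical (tool_name, args_hash) calls within one turn."""

    def __init__(self, default_limit: int = 3,
                 limits: Optional[Dict[str, int]] = None) -> None:
        self.default_limit = default_limit
        self.limits = limits or {}   # per-tool overrides, e.g. 1 for side effects
        self._counts = Counter()

    @staticmethod
    def args_hash(args: Dict[str, Any]) -> str:
        # Sorting keys makes the hash independent of argument order.
        canonical = json.dumps(args, sort_keys=True, default=str)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def start_turn(self) -> None:
        """Reset all counters; the budget is per turn."""
        self._counts.clear()

    def check(self, tool_name: str, args: Dict[str, Any]) -> Optional[dict]:
        """Record one call. Return None if allowed, else a ToolError-shaped dict."""
        key = (tool_name, self.args_hash(args))
        self._counts[key] += 1
        limit = self.limits.get(tool_name, self.default_limit)
        if self._counts[key] <= limit:
            return None
        return {
            "error_class": "schema_mismatch",
            "code": "retry_budget_exceeded",
            "detail": f"{tool_name} was called {self._counts[key]} times "
                      f"with identical arguments this turn (limit {limit}).",
            "hint": "Don't call this tool again with the same arguments.",
        }
```

When check() returns a dict, the orchestrator would hand that dict to the model in place of running the tool, so the refused call costs no upstream request.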

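This pairs well with any agent-level loop detector: the budget is per tool, per turn, while the loop detector works across turns. They catch different things.

The catch: strict validators reject legitimate edge cases

A Pydantic schema is only as good as the schema you wrote. Strict validation feels safe, but it also rejects legitimate oddities: a customer with no last name, an order with zero items because of a return, a payment in a currency you have never seen.

Before deploying the guard to production, run a week of real traffic through it offline. Log every ValidationError and review them by hand. The pattern is always the same: 90% are real bugs that the guard was right to catch, and 10% are genuine edge cases where the guard is too strict.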

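For the 10%, loosen the schema (change a field from str to str | None, or widen an enum into a Literal | str union). For the 90%, do not loosen it: those bugs are exactly what you installed the guard to catch.

The alternative, running a strict validator without that review, gives the agent a new failure mode: every attempt reports an error even though the real tool is working fine. The model loses trust in its tools and starts asking the user for confirmation on every call. That UX damage is worse than the original problem.

What is the worst loop you have seen an agent fall into, and what stopped it?


If this was useful

This pattern appears in the Reliability and Recovery chapter of AI Agents Pocket Guide: Patterns for Building Autonomous Systems with LLMs, alongside loop-detection patterns that catch variants of the same bug. If you are building something that runs for more than a few turns, the failure taxonomy and the guard wrapper can save you from a bill nobody warned you about.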
