Four Ways to Diagnose Production AI Failures
SapotaCorp · 2026-05-24 · via DEV Community

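Late on a Friday night, a founder messaged us: "Our agent is broken. Customers are complaining. Our on-call engineer doesn't know what to do. Can you help?"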

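The agent was a customer-support tool launched the previous Monday. By Friday evening, the company's support inbox was full of user reports that the AI gave wrong answers, took forever to respond, or timed out outright. The engineers saw it as one big problem. In fact it was four problems stacked on top of each other.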

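This is the failure pattern every team running agents in production eventually hits. Symptoms pile up, the team gets confused, and people try fixes with no method behind them. Below is the diagnostic sequence Sapota runs, and the four most common failure modes, which account for roughly eight or nine out of ten post-launch incidents.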

Diagnostic sequence: start with the traces

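Before debugging anything else, look at the traces. If your agent runs in production without tracing, that is your first problem, and you should fix it first, even in the middle of an incident. Take one failed request, open its trace, and see where the time went and why it failed.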

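What we look for in a trace:

  • Where exactly did the request fail? A specific tool call? A specific LLM step? A timeout?
  • What did the failure look like? An HTTP error? An LLM hallucination? Output that doesn't match the schema? Output that passes validation but is wrong?
  • What changed recently? Compare a failing request with a successful one from a week ago: what is different?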
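A minimal sketch of those three questions, assuming each trace has already been exported as a flat, time-ordered list of steps. The `Span` shape and the step names are hypothetical; real tracing tools (OpenTelemetry, vendor SDKs) use richer, nested structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One step of a request trace (hypothetical shape; real tracers differ)."""
    name: str                     # e.g. "llm:plan", "tool:search"
    duration_ms: float
    error: Optional[str] = None   # error type/message if this step failed


def first_failure(spans: list[Span]) -> Optional[Span]:
    """Where did the request fail? The first span that recorded an error."""
    return next((s for s in spans if s.error is not None), None)


def slowest_step(spans: list[Span]) -> Span:
    """Where did the time go? The single span with the longest duration."""
    return max(spans, key=lambda s: s.duration_ms)


def diff_against_baseline(bad: list[Span], good: list[Span]) -> dict[str, float]:
    """What changed? Extra milliseconds per step name versus a known-good request.

    Durations are summed per step name, so repeated retries or extra loop
    iterations in the failing trace show up as one large delta.
    """
    def totals(spans: list[Span]) -> dict[str, float]:
        out: dict[str, float] = {}
        for s in spans:
            out[s.name] = out.get(s.name, 0.0) + s.duration_ms
        return out

    baseline, current = totals(good), totals(bad)
    return {name: ms - baseline.get(name, 0.0)
            for name, ms in current.items()
            if ms > baseline.get(name, 0.0)}
```

Summing per step name rather than comparing span by span is a deliberate choice: the failing trace usually has a different number of spans than the good one.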

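In the founder's case, the traces showed three different failure modes in the same week. Everyone had treated them as one problem because the symptom customers saw was the same: "the AI is broken."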

Failure mode 1: Degraded external dependencies

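The most common cause of production agent failures is an external dependency that has become slower or less reliable. The agent itself is fine; the world around it has changed.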

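Common patterns:

  • LLM provider rate limits. OpenAI or Anthropic starts throttling you because your traffic exceeds your tier's limits. Every request now needs three retries to succeed, tripling latency.
  • Slow retrieval. Your vector database is overloaded at launch, and p95 query latency climbs from 50 ms to 800 ms.
  • External API drift. A tool you call (CRM API, billing system, search service) is quietly updated, changing the shape or timing of its responses.
  • Knowledge base growth. Your KB has tripled since launch, and retrieval recall has degraded because retrieval was never tuned for the larger corpus.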
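The last pattern only shows up if recall is measured rather than assumed. A minimal sketch of recall@k, assuming you keep a small labeled set of queries, each paired with the IDs of the documents that should come back for it (a hypothetical data shape). Re-running it as the KB grows shows whether retrieval is keeping up.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        raise ValueError("each labeled query needs at least one relevant document")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average recall@k over (retrieved_ids, relevant_ids) pairs."""
    if not runs:
        raise ValueError("no labeled queries")
    return sum(recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs) / len(runs)


# The same two labeled queries, retrieved at launch and again after the KB grew (toy data).
at_launch = [(["d1", "d2", "d3"], {"d1", "d2"}), (["d4", "d5", "d6"], {"d4"})]
today = [(["d9", "d1", "d7"], {"d1", "d2"}), (["d8", "d9", "d4"], {"d4"})]
print(mean_recall_at_k(at_launch, k=3), mean_recall_at_k(today, k=3))  # 1.0 0.75
```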

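Diagnosis: look at latency and error rates for each of your tools over the past week. If any tool's p95 latency has doubled since launch, or its error rate has risen by more than one percentage point, that is your signal.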
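A minimal sketch of that check for a single tool, assuming you can export per-call latencies plus call and error counts for a launch-week window and for the past week (the `ToolWindow` shape is hypothetical). It uses the nearest-rank definition of p95 and the doubling and one-percentage-point thresholds from the text.

```python
from dataclasses import dataclass


def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank p95: the smallest sample such that at least 95% of samples are <= it."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) using integer math
    return ordered[rank - 1]


@dataclass
class ToolWindow:
    latencies_ms: list[float]
    calls: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.calls if self.calls else 0.0


def drift_signals(launch: ToolWindow, last_week: ToolWindow) -> list[str]:
    """Return the warning signs; an empty list means the tool looks healthy."""
    signals = []
    if p95(last_week.latencies_ms) >= 2 * p95(launch.latencies_ms):
        signals.append("p95 latency has at least doubled since launch")
    if last_week.error_rate - launch.error_rate > 0.01:
        signals.append("error rate up more than 1 percentage point")
    return signals
```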

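The fix depends on the dependency. Rate limits: upgrade your tier or add exponential backoff. Slow retrieval: tune the index or scale the database. API drift: update the integration. KB growth: retune chunking and retrieval parameters.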
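For the rate-limit case, a minimal sketch of exponential backoff with full jitter. `RateLimited` is a stand-in for whatever exception your provider's SDK raises on HTTP 429 (check your SDK, which may already retry on its own). The `sleep` parameter is injectable only so the behavior can be tested without actually waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Stand-in for the rate-limit error your LLM client raises (e.g. on HTTP 429)."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimited, doubling the delay ceiling after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the rate-limit error to the caller
            # Full jitter: sleep a random time in [0, min(max_delay, base * 2^attempt)].
            ceiling = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

Jitter spreads retries out, so many clients throttled at the same moment do not all retry in lockstep. Backoff smooths over brief throttling; if nearly every request needs retries, as in the pattern above, the tier itself is too small.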

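In the founder's case, the LLM provider had silently updated its model during that first week. The new model read the prompt's examples slightly differently, so the agent revisited its reasoning more often before settling on an answer. Average iterations per request rose from 2.3 to 4.1, and cost and latency both spiked. The fix was to tighten the prompt and add three few-shot examples.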

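Failure mode 2: Validation gates that never fire

The inverse failure: a validation gate that should be filtering out bad outputs stays silent, because the gating logic has a gap or the threshold is set wrong.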

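Common patterns:

  • Faithfulness threshold too low. Set at 0.5, the gate lets through many responses that are mostly hallucinated. Production usually needs 0.85 or higher.
  • Schema validation allows empty values. The output schema requires an "answer" field but accepts an empty string. Users receive the empty response, which reads like "I don't know", and the agent never registers a failure.
  • Toxicity filter not loaded. The filter library was supposed to be imported, but a refactor moved the import and the filter now silently does nothing.
  • PII redaction applied to the wrong field. User input is redacted, but PII leaks through in the response.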
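A minimal sketch of a gate that closes the first two gaps, assuming a faithfulness score in [0, 1] is computed upstream (for example by an LLM judge or an NLI model; the scorer itself is out of scope). A blank answer counts as a schema failure, and every failure falls back to an honest refusal. `assert_gates_loaded` is a hypothetical startup check aimed at the third gap: a filter that failed to load should stop the service from starting, not be skipped silently.

```python
from dataclasses import dataclass
from typing import Callable, Optional

FAITHFULNESS_THRESHOLD = 0.85  # the production floor recommended above
FALLBACK = "I don't have that information in my knowledge base."


@dataclass
class GateResult:
    passed: bool
    reason: str
    text: str  # what the user actually receives


def gate_response(answer: Optional[str], faithfulness: float,
                  threshold: float = FAITHFULNESS_THRESHOLD) -> GateResult:
    # Schema check: "answer" must exist AND contain something besides whitespace.
    if answer is None or not answer.strip():
        return GateResult(False, "empty_answer", FALLBACK)
    # Fail closed on malformed scores (including NaN) instead of letting them through.
    if not 0.0 <= faithfulness <= 1.0:
        return GateResult(False, "invalid_score", FALLBACK)
    if faithfulness < threshold:
        return GateResult(False, "low_faithfulness", FALLBACK)
    return GateResult(True, "ok", answer)


def assert_gates_loaded(gates: dict[str, Optional[Callable]]) -> None:
    """Fail at startup if a required filter is missing, instead of skipping it at runtime."""
    missing = sorted(name for name, fn in gates.items() if not callable(fn))
    if missing:
        raise RuntimeError(f"validation gates not loaded: {missing}")
```

The same principle covers the PII case: redaction has to run on the outgoing response as well as on user input.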

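Diagnosis: take a sample of the bad responses customers reported and trace which gate should have caught each one. Where a gate exists for that failure mode, verify that it actually fired.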

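The founder had set the faithfulness threshold at 0.7, which was still too loose. We tightened it to 0.85. The refusal rate rose from 2% to 9%, and customer complaints about wrong answers dropped immediately. The refused responses were replaced with an honest "I don't have that information in my knowledge base," which users preferred to wrong answers.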
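The audit and the threshold change can be sketched together, assuming each logged response carries its faithfulness score and a flag recording whether the gate fired (a hypothetical log shape). `refusal_rate` estimates what share of traffic a candidate threshold would refuse; `audit_reports` lists the reported bad responses the gate let through, and which of those a given threshold still would not catch, meaning some other gate is responsible. The scores below are toy data, not the founder's.

```python
from typing import NamedTuple


class Logged(NamedTuple):
    response_id: str
    faithfulness: float
    gate_fired: bool


def refusal_rate(scores: list[float], threshold: float) -> float:
    """Share of responses a faithfulness gate at `threshold` would refuse."""
    return sum(score < threshold for score in scores) / len(scores)


def audit_reports(reported: list[Logged], threshold: float) -> dict[str, list[str]]:
    """Split customer-reported bad responses by what would have stopped them."""
    silent = [r for r in reported if not r.gate_fired]
    return {
        # The gate existed but did not fire on these.
        "gate_silent": [r.response_id for r in silent],
        # A higher threshold still would not catch these: look at the other gates.
        "not_fixed_by_threshold": [r.response_id for r in silent if r.faithfulness >= threshold],
    }


scores = [0.95] * 80 + [0.8] * 12 + [0.6] * 8  # toy production scores
for t in (0.7, 0.85):
    print(t, refusal_rate(scores, t))  # prints "0.7 0.08", then "0.85 0.2"
```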

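Failure mode 3: Runaway edge-case costs

Production query distributions differ from test distributions. Certain query patterns can be far more expensive than average, and a handful of them can dominate your bill.

The pattern looks like this: a small share of users (often 1-5%) generates a large share of cost (often 30-60%). The cause may be legitimately complex queries, abuse, or inputs that push the agent onto a degenerate path.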

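Diagnosis: pull cost per user for the past week and sort it in descending order. Look at the top 10 users. Are their queries typical? Is one user looping their integration on bad input? Or is a particular query type (long documents, malformed input, deep multi-turn conversations on rare topics) burning the budget?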
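A minimal sketch of that query, assuming per-request cost records of the form `(user_id, cost_usd)` can be exported from your billing or tracing data (a hypothetical shape). `top_share` answers the "1-5% of users, 30-60% of cost" question directly.

```python
from collections import defaultdict


def cost_by_user(records: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Total cost per user, most expensive first."""
    totals = defaultdict(float)
    for user_id, cost_usd in records:
        totals[user_id] += cost_usd
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def top_share(ranked: list[tuple[str, float]], top_percent: int) -> float:
    """Share of total cost generated by the top `top_percent`% of users (at least one user)."""
    n = max(1, (len(ranked) * top_percent + 99) // 100)  # ceil without float rounding
    total = sum(cost for _, cost in ranked)
    if total == 0:
        return 0.0
    return sum(cost for _, cost in ranked[:n]) / total


records = [("u1", 20.0), ("u2", 10.0), ("u1", 20.0), ("u3", 10.0),
           ("u1", 20.0), ("u4", 10.0), ("u5", 10.0)]
ranked = cost_by_user(records)
print(ranked[:10])            # the "top 10 users" to inspect by hand
print(top_share(ranked, 20))  # 0.6: the top 20% of users (1 of 5) drive 60% of cost
```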

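The fixes vary:

  • Per-user daily rate limits and cost caps. Hard limits that stop abuse without getting in the way of normal use.
  • Input length caps. Most LLM costs scale with input tokens. Cap user input at a reasonable limit (e.g., 100,000 characters) and politely ask users to summarize very long requests.
  • Query-type routing. If a particular query type is very expensive, route it to a simpler, cheaper handler where possible. "Generate a detailed report" is a typical example: send it to an async batch job rather than handling it in the synchronous chat.
  • Per-request iteration limits. Stop the agent from looping indefinitely on a single request. We default to a cap of 5 to 10 iterations.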
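A minimal sketch of three of these guards. The defaults ($5 per user per day, 10,000 characters) mirror the limits applied in the founder's case, and the iteration cap sits inside the 5-10 range above; all are illustrative. `UsageGuard` and `run_agent_loop` are hypothetical names, and the spend store is in-memory only (a real deployment would need shared storage such as a database or cache).

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Optional, Tuple


@dataclass
class UsageGuard:
    daily_cost_cap_usd: float = 5.0
    max_input_chars: int = 10_000
    # Spend so far, keyed by (user_id, day). In-memory only: an illustration.
    spend_usd: dict = field(default_factory=dict)

    def check(self, user_id: str, text: str, today: date) -> Optional[str]:
        """Return a polite rejection message, or None if the request may proceed."""
        if len(text) > self.max_input_chars:
            return "That message is very long. Could you summarize the key question?"
        if self.spend_usd.get((user_id, today), 0.0) >= self.daily_cost_cap_usd:
            return "You've reached today's usage limit. Please try again tomorrow."
        return None

    def record(self, user_id: str, cost_usd: float, today: date) -> None:
        """Add the measured cost of a finished request to the user's daily total."""
        key = (user_id, today)
        self.spend_usd[key] = self.spend_usd.get(key, 0.0) + cost_usd


def run_agent_loop(step: Callable[[int], Tuple[bool, str]], max_iterations: int = 8) -> str:
    """Call `step` until it reports done; give up after `max_iterations` instead of looping forever."""
    for i in range(max_iterations):
        done, result = step(i)
        if done:
            return result
    raise RuntimeError(f"agent did not finish within {max_iterations} iterations")
```

The cost check runs before each request, so the request that crosses the cap still completes; a stricter variant would estimate a request's cost before running it.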

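In the founder's system, two users were frequently sending long comparison requests that accounted for about 40% of daily cost. We added a per-user daily cost cap and an input length limit. Within 48 hours, costs fell by roughly a third. Neither user complained: both were testing what the product could do, and the caps were generous enough to cover normal use.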

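Failure mode 4: Silent quality drift

The hardest failure to detect: nothing is broken, nothing errors, latency is acceptable, and cost is normal. Yet responses get worse day by day. Customers complain, the team cannot reproduce the problem, and every dashboard is green.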

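Causes:

  • Prompt drift. Engineers tweak the template in ways that look minor but change behavior. Remove an example, reword an instruction, clarify a definition, and the LLM interprets the prompt differently.
  • Model provider updates. As described above, the underlying model can change without your knowledge. Quality for your specific use case may go up or down.
  • Corpus drift. Your knowledge base accumulates content that pollutes retrieval. Outdated documents still rank highly, and new documents contradict old ones.
  • Stale eval set. Your eval set was written six months ago against an older version of the product and no longer reflects what users ask today.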
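Prompt changes and model updates are much easier to pin down if every eval run and production trace records exactly which configuration produced it. A minimal sketch, assuming the prompt template, few-shot examples, and a model identifier string are available at request time (whether your provider exposes a precise model version varies; check its documentation). `config_snapshot` and `what_changed` are hypothetical helpers.

```python
import hashlib
import json


def config_snapshot(prompt_template: str, few_shot: list[str], model_id: str) -> dict:
    """Everything that shapes the model's behavior, plus a short hash for logging."""
    snapshot = {"prompt_template": prompt_template, "few_shot": few_shot, "model_id": model_id}
    canonical = json.dumps(snapshot, sort_keys=True)  # stable serialization before hashing
    snapshot["fingerprint"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return snapshot


def what_changed(old: dict, new: dict) -> list[str]:
    """Names of the fields that differ between two snapshots (fingerprint excluded)."""
    keys = (old.keys() | new.keys()) - {"fingerprint"}
    return sorted(k for k in keys if old.get(k) != new.get(k))
```

Logging the fingerprint next to each eval score turns "quality dropped" into "quality dropped between fingerprint A and fingerprint B", and `what_changed` narrows that to the prompt, the examples, or the model. Corpus drift needs a separate check, such as re-running retrieval recall.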

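Diagnosis: run your eval pipeline against the current production agent and compare the result with the launch score. If the score has dropped, quality has drifted. If the score is unchanged but customers are complaining, the eval set is stale.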
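The decision rule above, as a minimal sketch. The `tolerance` (how much score movement counts as noise) is an assumption you would set from the run-to-run variance of your own eval; 0.02 is only a placeholder.

```python
def eval_score(item_scores: list[float]) -> float:
    """Mean per-item score for one eval run."""
    if not item_scores:
        raise ValueError("empty eval run")
    return sum(item_scores) / len(item_scores)


def diagnose_quality(launch_score: float, current_score: float,
                     customers_complaining: bool, tolerance: float = 0.02) -> str:
    """Tell quality drift apart from a stale eval set."""
    if current_score < launch_score - tolerance:
        return "quality drift: check prompt, model, and corpus changes since launch"
    if customers_complaining:
        return "stale eval set: it no longer reflects what users ask"
    return "no drift detected"


# With the launch and current scores from the founder's case later in this post:
print(diagnose_quality(0.84, 0.71, customers_complaining=True))
```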

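Fix: refresh the eval set. Take 50-100 real production queries, write a reference answer for each, run the eval, and tune. Most teams refresh their eval set quarterly; in fast-moving domains, monthly.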
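The first step of that refresh, sampling real queries, can be sketched as below, assuming recent user queries can be exported as plain strings. It drops duplicates that differ only in case or whitespace and draws a seeded random sample so the set is reproducible. Writing the reference answers remains a human job.

```python
import random


def sample_eval_queries(queries: list[str], n: int = 75, seed: int = 0) -> list[str]:
    """Pick n distinct production queries (50-100 recommended) for a refreshed eval set."""
    if not 50 <= n <= 100:
        raise ValueError("aim for 50-100 queries")
    seen, unique = set(), []
    for q in queries:
        key = " ".join(q.lower().split())  # normalize case and whitespace
        if key and key not in seen:
            seen.add(key)
            unique.append(q)
    if len(unique) < n:
        raise ValueError(f"only {len(unique)} distinct queries; need {n}")
    return random.Random(seed).sample(unique, n)
```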

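What changed for the founder

Four hours of diagnosis on Friday night:

  1. External dependencies: we identified the LLM provider's model update and tightened the routing prompt, and the agent stopped looping. Latency and cost returned to normal.
  2. Validation threshold: raising the faithfulness threshold from 0.7 to 0.85 meant refused responses now returned an honest "I don't know" instead of a hallucinated answer.
  3. Runaway costs: we added a $5 per-user daily cost cap and limited input length to 10,000 characters. Costs fell 35%.
  4. Quality drift: the eval had scored 0.84 at launch and had since fallen to 0.71. We refreshed the eval set with 50 recent production queries, found three categories of question the agent was failing, and added documentation covering them. A week later, the score was back up to 0.86.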

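Customer complaints stopped within 72 hours. The team's mindset shifted from "we built something broken" to "we built something that needs more rigorous operations than we expected." That second view is the key to running a good agent.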

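Recommendation: make the diagnosis repeatable.

The founding team had no playbook for "the agent is broken in production." They hunted for errors in a panic, and every step was slow. Afterward, we wrote a one-page runbook listing the four failure modes, how to diagnose each one, and the most common fix for each.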

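Six weeks later, a similar incident hit (a tool's API was deprecated). The on-call engineer followed the runbook, identified the cause within 20 minutes, applied the documented fix, and was done within an hour. No panic, no escalation, no pre-dawn call to a consultant.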

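This is what mature production agent operations look like. Not "nothing ever goes wrong," but "when something goes wrong, the team has a known procedure for finding the root cause."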

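If your agent launch went badly

If your team has shipped an AI agent and the first few weeks have been more painful than expected, the right intervention is a diagnostic audit, not more feature development. Most launch problems are not new bugs in the agent code; they are operational gaps that only show up at production volume.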

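Sapota offers a one-week post-launch audit covering traces, validation, dependencies, and quality drift. We identify which of the four failure modes is causing which symptom, then deliver the fixes along with a runbook for future incidents. We have done this for six or seven B2B SaaS clients within three months of their AI launch.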

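Get in touch through our AI engineering page. Tell us what your agent does and what failures you are seeing. The first conversation is mostly diagnostic.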