惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

GbyAI
GbyAI
L
LINUX DO - 热门话题
月光博客
月光博客
B
Blog
博客园 - 叶小钗
美团技术团队
D
Docker
A
About on SuperTechFans
Stack Overflow Blog
Stack Overflow Blog
酷 壳 – CoolShell
酷 壳 – CoolShell
WordPress大学
WordPress大学
P
Proofpoint News Feed
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
Y
Y Combinator Blog
V
V2EX
Apple Machine Learning Research
Apple Machine Learning Research
博客园 - 三生石上(FineUI控件)
The Register - Security
The Register - Security
博客园_首页
The Cloudflare Blog
I
InfoQ
T
Tailwind CSS Blog
MongoDB | Blog
MongoDB | Blog
Engineering at Meta
Engineering at Meta
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
Microsoft Azure Blog
Microsoft Azure Blog
有赞技术团队
有赞技术团队
C
CERT Recently Published Vulnerability Notes
AWS News Blog
AWS News Blog
Spread Privacy
Spread Privacy
V
Visual Studio Blog
博客园 - Franky
Cloudbric
Cloudbric
Help Net Security
Help Net Security
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
N
News and Events Feed by Topic
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
K
KPMG report finds enterprise disconnect between AI and its ROI | CIO
Webroot Blog
Webroot Blog
博客园 - 【当耐特】
TaoSecurity Blog
TaoSecurity Blog
B
Blog RSS Feed
N
News | PayPal Newsroom
人人都是产品经理
人人都是产品经理
H
Heimdal Security Blog
L
LangChain Blog
PCI Perspectives
PCI Perspectives
Jina AI
Jina AI
Google DeepMind News
Google DeepMind News
Schneier on Security
Schneier on Security

DEV Community

Authentication Security Deep Dive: From Brute Force to Salted Hashing (With Java Examples) Why AI Systems Don’t Fail — They Drift Spilling beans for how i learn for exam😁"Reinforcement Learning Cheat Sheet" I Replaced Chrome with Safari for AI Browser Automation. Here's What Broke (and What Finally Worked) How Python Borrows Other People's Work The $40 Architecture: Processing 1 Billion API Requests with 99.99% Uptime Vibe Coding: A Workflow Guide (From Zero to SaaS) Most webhook security guides protect the wrong side. The scary part is delivery. Headless CMS for TanStack Start: Build a Blog with Cosmic EU Age Verification App "Hacked in 2 Minutes" — What Actually Happened Comfy Cloud’s delete function does not actually remove files Running AI Models on GPU Cloud Servers: A Beginner Guide Event-driven media intelligence with AWS Step Functions and Bedrock I scored 500 AI prompts across 8 quality dimensions — here's what broke How to Call Google Gemini API from Next.js (Free Tier, No Backend Needed) The Portal Protocol: Reclaiming Human Connection in the Age of AI How to Fix Your Team's Scattered Knowledge Problem With a Self-Hosted Forum Intro to tc Cloud Functors: A Graph-First Mental Model for the Modern Cloud Designing Multi-Tenant Backends With Both Ownership and Team Access I Built a Neumorphic CSS Library with 77+ Components — Here's What I Learned PostgreSQL Performance Optimization: Why Connection Pooling Is Critical at Scale Cómo construí un SaaS multi-rubro para gestionar expensas en Argentina con FastAPI + Vue 3 🚀 I Built an Ethical Hacking Scanner Tool – Open Source Project I Replaced /usage and /context in Claude Code With a Single Statusline A Pythonic Way to Handle Emails (IMAP/SMTP) with Auto-Discovery and AI-Ready Design I Collected 8.9 Million Polymarket Price Points — Here's What I Found About How Markets Really Move EcoTrack AI — Carbon Footprint Tracker & Dashboard Everyone's Using AI. No One Agrees How. 5 self-hosted ebook managers worth trying in 2026 Building Your First AI Agent with LangChain: From Chatbot to Autonomous Assistant Common SOC 2 Failures (Real World) Stop Vibe-Checking Your AI App: A Practical Guide to Evals How to Use SonarQube and SonarScanner Locally to Level Up Your Code Quality Your Next To-Do App Is Dead — I Replaced Mine with an OpenClaw AI Sign a Nostr event in 60 lines of Python using coincurve — no nostr-sdk, no nbxplorer, no rust toolchain ITGC Audit Explained Like You’re in Big 4 Patch Tuesday abril 2026: Microsoft parcha 163 vulnerabilidades y un zero-day en SharePoint Stop scraping everything: a better way to track competitor price changes Listing on MCPize + the Official MCP Registry while routing payments OUTSIDE the marketplace — how I kept 100% of my x402 revenue Building an AI-Powered Risk Intelligence System Using Serverless Architecture Why We Ripped Function Overloading Out of Our AI Toolchain Testing AI-Generated Code: How to Actually Know If It Works SaaS Churn Is Killing Your Business. Here Is What to Do About It (Without a Support Team) The Speed of AI Is No Longer Linear - And Self-Improving Models Are Why How to Implement RBAC for MCP Tools: A Practical Guide for Engineering Teams From Standard Quote to Persuasive Proposal: AI Automation for Arborists I built a CLI that scaffolds complete multi-tenant SaaS apps Axios CVE-2025–62718: The Silent SSRF Bug That Could Be Hiding in Your Node.js App Right Now The dashboard that ended our friendship Data Pipelines Explained Simply (and How to Build Them with Python) The Hidden Cost of AI Systems Nobody Talks About. undefined vs undeclared, and how typeof behaves Switching from file-based jobs to NATS/Kafka in Rust without changing code io_uring Adventures: Rust Servers That Love Syscalls Why Agentic AI is Killing the Traditional Database The POUR principles of web accessibility for developers and designers Quantum Neural Network 3D — A Deep Dive into Interactive WebGL Visualization How To Install Caveman In Codex On macOS And Windows Automation Pipeline Reliability: Why Your Workflow Breaks When Nobody Is Watching I Built an 'Open World' AI Coding Agent — It Works From ANY Folder From Freelancing to Product: A Tech Service Company's SaaS Transformation China's AI Giants: Adding Tencent Hunyuan & ByteDance Doubao to AI University (74 Providers) On the Vibe Coders and Their Lies clerk: Auto-Summarize Your Claude Code Sessions AI Weekly — 2026/04/10–04/17 | The Model Lockdown Is Here, but the Toolchain Is the Real Battleground AI 週報 — 2026/04/10–2026/04/17 模型封鎖潮來了,但工具鏈才是真戰場 Maybe this is how Open-Source apps are born... 🚀 Fine-Tune LLMs with LoRA and QLoRA: 2026 Guide tRPC v11 + Next.js App Router: End-to-End Type Safety Without the Boilerplate ShadCN UI in 2026: Why I Stopped Installing Component Libraries and Started Owning My Components SaaS Billing in React Server Components: Stripe + Supabase Without a Single `useEffect` Join our DEV Weekend Challenge — $1,000 in Prizes Across TEN winners! Submissions Due April 20 at 6:59 AM UTC. Implementing FSRS Spaced Repetition in Flutter + Supabase — Adding Memory Science to an AI Learning App "I Texted My Localhost From the Train — Claude Code Fixed the Bug Before I Got Home" I Built a Sales Prep AI and It Went Deeper Than Expected Design to Code #2: One JSON, Eleven Outputs Solving the 100M-Row Problem: A Summary Table Pattern for High-Volume Push Notification Logs Flutter Web With Wasm: What Actually Changes For Developers I Built 50 Royalty-Free Soundtracks for My Side Project in a Weekend Using AI Music Generation The Vibe Coding Security Checklist: 7 Things to Check Before You Ship Stop Letting Googlebot Guess Fix Your React App's SEO Right Desconstruindo o Streaming do LinkedIn: Como Criar um Engine de Extração de Vídeo de Alta Performance com HLS e FFmpeg (EDA Part-1) EDA (Exploratory Data Analysis) Explained With Real Life — Why Looking at Your Data Is the Most Important Step in Machine Learning Brand Relationship Management at Scale: Our 4-Touch Outreach System for 200+ Brands Why String.fromEnvironment() Might Return an Empty String in Dart JGuardrails 1.0.0 — Hardening Java LLM Apps Against Jailbreaks, Toxicity, and Prompt Injection Plan and Schedule a Full Week of Threads Content From One Claude Conversation Coding Cat Oran Ep3, Five Tables Changed Everything Updated: BFF Pattern I'm done watching freelancers get buried by 200 proposals. So I'm building the alternative. This is my first post BFS Algorithm in Java Step by Step Tutorial with Examples Tracking LLM Pricing Monthly: An Open Dataset for 22 AI Models How We Measure Content ROI on a Comparison Site: Revenue Attribution Without Perfect Data Introducing Nova AI Ops: The AI-Native Operating System for SRE Teams I built a free desktop video downloader for Windows — Grabbit How Talkie OCR Helps Vision-Impaired & Dyslexic Users Read the World Around Them VRCFaceTracking安装和iPhone面捕配置教程,有bug Even CrowdStrike Can't See Your Agents The Automation Gold Rush: What n8n Workflows and Claude Are Opening Up for Developers Right Now
Gates Earned From Failure: a cost test for agent guardrails
Vasyl Tretiakov · 2026-06-15 · via DEV Community

Every guardrail in my agent-built project was earned from a real failure, not designed up front. A cost test for when to build one, when to wait, and when to retire it.

On a Wednesday in late May I caught a bug by reading. The project's glossary — the
canonical list of the domain terms my coding agent is required to use — had drifted
from the domain model I actually carried in my head. Nothing flagged it. No test
failed, no check fired, no compiler complained. I noticed because I happened to be
reading the file and the words were wrong.

What I typed next is the whole argument in one line. Not "fix the glossary," not
"I'll be more careful," but a question aimed at the toolchain instead of the error:

the GLOSSARY already drifted away from my domain model vision and I would like
to prevent this in future refactors

That reframes the task. The drift stops being an error to correct once
and becomes a signal about the toolchain: if I caught a class of drift by
reading, an enforcement axis is missing.
The fix isn't to read harder next time.
Reading harder is just minting another rule in your head, another wish. The fix is
to add the check that couples the two things that drifted, and then never spend
that attention again.

I did not sit down and design a governance system, and that's the most transferable
thing here. Every check in this project — a couple dozen now, wired as pre-commit
hooks — was born from a particular drift I'd already been bitten by. The ordering is
the point. The failure came first; the gate came second, shaped to the exact failure.

Renames stopped being discipline

The clearest place this showed up was renaming. (I told the measurement side of
this story in a companion piece, Rails, Not Rules:
the first time I scripted a terminology check it found 737 violations in code I'd
thought I was keeping clean. This is the other half — not the count, but the habit
that grew from it.) Early on, retiring a domain term meant what it means
everywhere: search, replace, move on. Then a retired term came back. Not because
anyone re-typed it, but because a rename had swept the code and missed the surfaces
a compiler never reads — a spec document, a generated config, a comment that
mattered. Weeks later the old word resurfaced through a path I hadn't thought to
look at.

The detail that stung was where the residue hid. In one rename, the leftover
instances of the old term weren't in some forgotten corner of the codebase. They
were in the specification documents, the very files being rewritten to drive the
rename in the first place. The agent, under my direction, was editing the spec to
declare the new vocabulary while leaving the old vocabulary scattered through the
same spec. The document asserting the rename was itself out of compliance with it.
You cannot out-discipline that, and watching the diff doesn't save you either: no
amount of review reliably catches the contradiction being introduced in the very
pass that's concentrating on the change. Only a check that reads the document back
to you after the fact will, and that's what eventually went in.

It helps to be clear about why renames recur at all, because it isn't sloppiness.
The glossary is the written home of what Domain-Driven Design (DDD) calls the
ubiquitous language (UL): the single vocabulary that runs from a domain expert's
sentence to a class name. Eric Evans is emphatic that this language is not written
once and frozen; it is continuously distilled, sharpened every time the team's
understanding improves. Each sharpening is a rename. So the renames are the system
working as designed, not a mess to stamp out — which is exactly why the residue
problem is permanent and worth a gate rather than a resolution to be tidier.

After that, a rename was never just "use the new word now." It became three things
that ship together: rename the term, add a glossary entry recording the old word as
retired, and add a check keyed to that retired word so it can't silently return.
The rename and the rail land in the same commit. Skip the rail and you're trusting
that every future sweep will be perfect, which is the assumption that just failed.

So far this is a tidy story: let each gate be earned by a real, observed drift,
and your instruction file never bloats into the 200-line document nobody, human or
model, can hold in their head. A rule you add from imagination is a guess about a
failure that may never come. A rule you add from a failure you actually hit is paid
for. It has a body behind it. The failure log is the design document.

I believed that cleanly for about a month. Then I had to argue with my own agent
about it, and the rule came out more interesting than it went in.

The exceptions that earn a gate early

The trouble with "wait for the drift" is that some drifts you cannot afford to
observe even once. If the first occurrence is a deleted production table, or
customer data in a log line, or a fabricated citation in something already
published, "let it go wrong cheaply once" is a contradiction. There is no cheap
once.

I put this to the agent as a proposed amendment: build a gate proactively when the
failure mode is already well-attested and the check is cheap and
low-false-positive, or when the first occurrence is expensive or irreversible. Stay
reactive when the rule is judgment-heavy, the convention is still in flux, or the
drift is cheap to catch and fix once. The reasoning is mundane risk math, not
ideology: you pre-pay for a gate exactly when the expected cost of waiting exceeds
the standing cost of the gate.

This repo runs on that amendment, and I can point at two receipts. The writing
project, the one these essays live in, has a check that blocks any cited URL not
logged in an evidence file. I built it before it had earned itself in the usual
reactive way, because the failure it guards is the expensive kind and it had
already happened once: a fabricated citation reached a draft. A published
fabrication doesn't get a cheap first occurrence. So the gate went in proactively,
licensed by the cost test: is the failure high enough cost, and is the check free
of false positives?

The second receipt is the other half of the amendment, the well-attested-and-cheap
case rather than the irreversible one. The same writing project runs a prose check
on these very drafts, flagging the small set of mechanical tells that mark
machine-generated text. That failure mode wasn't hypothetical and it wasn't
expensive-once; it was a known, recurring, deterministic pattern I'd have to
correct on every essay forever. Well-attested, cheap to detect, low false-positive:
that's the second carve-out, and it's the reason this paragraph can't lean on more
than two em-dashes without the gate stopping the commit. The check is the essay's
own argument applied to the essay. The rule "gates are earned" survives both times,
with named carve-outs for failures you can't afford to rehearse and failures you've
already seen enough times to name precisely.

When a rule won't gate, find its shadow

Between "earn it from a failure" and "stay reactive" sits the move that does most
of the real work, and it's the one I'd most want a reader to steal. A lot of rules
look un-gateable because the thing they protect is a meaning, and a script can't
read meaning. "Don't restate the protocol prose in two places" isn't grep-able.
Neither is "don't leave a dangling reference." The reflex is to give up and write
the rule into an instructions file as a wish. The better move is to stop checking
the rule and check a structural proxy for it: a syntactic shadow the real rule
casts, one a script can see and that almost never flags an innocent.

It works more often than it has any right to. "Don't duplicate the protocol in
prose" became "no narrative text in column zero of these files," because the
protocol lived in indented blocks and unindented prose was the tell. "No dangling
references" became "every TODO( carries a live task slug." The em-dash rule
guarding these very drafts is the same trick: I can't gate "don't write like a
language model," but I can gate "no paragraph carries more than two em-dashes,"
which is a measurable shadow of the thing I actually want. Each time, a rule I'd
have sworn was judgment-only turned out to cast a shape a grep could catch.

The proxy is never the rule, though. A structural shadow is an approximation by
construction, so every proxy gate ships with a built-in gap between what it checks
and what you wish it proved. That gap is the next failure.

The failure that hides inside a green checkmark

Here is the part I got wrong, and the agent caught.

I had been treating "cheap to write" as most of what makes a gate worth building.
The agent pushed back on two fronts. First, cheap has to mean cheap to maintain,
not cheap to write. A gate is a standing liability, not a one-time cost. Every
legitimate evolution of the convention now has to route through the check, and you
pay an update tax and a false-positive-triage tax for the life of the gate. The
authoring cost is the small number.

The second point is the one that changed how I think. The agent put it more
sharply than I had:

Once a green gate exists, people stop eyeballing the thing the gate appears to
cover. So a cheap-but-approximate gate over a real invariant can be worse than
no gate — it converts active vigilance into misplaced trust.

The dominant hidden failure of a cheap gate isn't noise. It's false confidence. A
cheap check usually only approximates the invariant you care about. A text
search proves a token is
absent; it does not prove a concept was actually renamed everywhere it lives. A
fast compile passes against a stale cache. A sub-agent reports "tests passed" and
you don't re-run them. In every case the green checkmark covers less than it
appears to.

An outside read lands in the same place. Birgitta Böckeler, whose harness
vocabulary I lean on below, notes that test feedback is weaker than it looks once
"the agent also generated the tests." A verifier the generator authored is not
independent of the thing it checks, so a passing suite certifies less than it
seems to — the same gap, one layer up.

The rename gate from earlier is a perfect specimen — a structural proxy of exactly
the kind I just praised, which makes it the more humbling, because it's one I'm
proud of. It greps for the retired word and passes when the word is gone. But "the
old word is absent" and "the concept was correctly migrated" are not the same claim. You can delete every instance of the old term and still
have left the idea it named half-translated, split across two new words that
should have been one, or attached to the wrong entity. The gate is green and the
migration is wrong, and now nobody's reading the diff with the old suspicion,
because the check says it's handled. The check is real and worth having. It just
proves a narrower thing than its green checkmark advertises, and the discipline is
to keep knowing the difference instead of letting the green absolve you.

That's the trap in one move. You were watching the thing carefully when nothing
claimed to watch it for you. Now something claims to, so you look away, and the gap
between what the check proves and what it looks like it proves is exactly where
the next bug lives. When those two diverge and the invariant matters, that argues
against the cheap gate, not for it.

This isn't an argument against gates. It's an argument for knowing which of your
green checkmarks are load-bearing and which are decorative.

It's a ladder, not a switch

The other thing I had been collapsing was the choice itself. I'd been treating it
as binary: gate the thing, or stay reactive and fix it by hand when it breaks.
There are at least four rungs, and most failures want one of the middle two.

The distinction isn't mine — manufacturing got there decades ago. Shigeo Shingo's
poka-yoke, the mistake-proofing of the
Toyota Production System, already splits a control that makes the error
impossible from a warning that only signals it. The two middle rungs below are
those two ideas wearing software clothes. Birgitta Böckeler's
harness engineering
gives the orthogonal axis: guides that steer the agent before it acts, sensors
that observe after. A blocking gate is a sensor with teeth; a default path is a
guide that leaves nothing to sense.

The strongest rung isn't a gate at all. It's the default path — Shingo's
control type: make the wrong artifact hard to author in the first place. A
template, a generator, a structured section that only has slots for the right
shape. There's nothing to false-positive on, because you're not checking after the
fact, you're removing the way to get it wrong. When a failure is common but a
precise gate would be noisy, this beats the gate.

Below the blocking gate sits the tripwire — Shingo's warning type: warn without
blocking, or ask for confirmation before an irreversible step. This is the right
reach for failures that are catastrophic on first occurrence but genuinely hard to
gate cleanly — the data deletion, the leak, the fabrication. You don't need a perfect detector. A loud "are
you sure, here's what you're about to overwrite" buys most of the protection at a
fraction of the false-positive cost. The mistake is letting "we can't build a clean
gate for this" collapse into "so we stay fully reactive" on a failure you can't
take twice.

The rung that doesn't work is the one that looks like discipline: a script everyone
is supposed to run but nothing enforces. It's neither a gate nor a default path,
so under any real deadline it just gets skipped. A rule that lives only in prose is
a preference, however firmly phrased.

So the ladder, top to bottom: blocking gate, default path, tripwire, stay reactive.
"Should I gate this" was always the wrong question. The question is which rung the
failure earns.

The cost test cuts both ways

Because a gate is a standing liability, "earned" is a real filter, and a filter
that only ever says yes isn't filtering. The same cost test that licenses a
proactive gate also tells you when to leave something ungated, and I find I reach
for it in that direction about as often.

A concrete one: I recently changed a documentation convention across the project. A
pure convention change, no behavior, no domain term retired. The reflex in a repo
that leans this hard on enforcement is to add a check for the new convention. What
I wrote instead was an instruction to not:

Don't add a gate or script — this is a convention change only; the enforcement
hook stays deferred per the cost test.

The convention isn't load-bearing, a check on it would generate churn every time
the convention itself evolves, and the failure if someone gets it wrong is
cosmetic. No gate. Not yet, maybe not ever.

The same logic runs in reverse, against gates that already exist. A check family
only grows: one iteration of mine added six at once, and nothing in the workflow
ever pushes the other way. The tidy instinct is a recurring "prune the gates"
ritual at every close. That instinct is right about the pressure and wrong about
the mechanism, for two reasons that separate a gate from a stale paragraph of
prose. A doc paragraph taxes every session, so doc-bloat is pure, constant rent; a
redundant gate taxes only when it runs, so its bloat is mostly latent. And a stale
doc is just noise, while a gate is a load-bearing invariant whose removal risks
silently dropping a correctness check no one notices is gone. So pruning earns more
care and less frequency than tidying prose, not the same recurring slot. A full
portfolio review every close would usually find nothing, and a ritual that usually
finds nothing is the skipped-discipline trap again, dressed as housekeeping.

Two corollaries fall out. "Merge these two similar gates" defaults to no: two
precise checks beat one fuzzy merged one, and a merge is a design act with its own
failure mode, not janitorial cleanup. And the counter-pressure that is warranted
should itself be mechanized rather than left to willpower — a redundancy meta-check
that reads the coupling each gate already declares in its header and flags real
overlap, fired on a threshold or a phase boundary, never once per commit. Even
pruning gates wants a rung, not a resolution to be tidier.

A good gate even pays a rent rebate. Every rule you load into the agent's
always-present instructions is standing context it has to carry, and that context
isn't free. A deterministic check lets you delete the paragraph of prose that
used to beg the model to remember a rule and replace it with a script plus a
reference read only when it fires. The gate does the remembering, so the context
window doesn't have to — fewer gates can mean a heavier prompt, not a lighter one.

The rule about gates is itself a gate

The recursion is the part I like best: the system diagnosed the bias on itself.

Everything in this project leans toward enforcement over discipline. The agent
named the hazard better than I had:

the whole check-* script family leans hard toward enforce-over-discipline, so
a fresh session inherits a "when in doubt, add a check" texture with no stated
boundary. That absence is a latent gap.

A bias toward gating with nothing written down about when not to. The
honest response, by this essay's own logic, is not to immediately write the
boundary rule into the always-loaded instructions. That would be exactly the
proactive over-gating the rule warns against. The boundary rule hasn't failed yet.

So we left it ungated, with a concrete trigger: the first time any session ships a
gate that turns out high-false-positive or merely approximate, that's the
observed failure, and the boundary rule gets promoted then — and into a
read-on-demand reference, not the always-loaded file, because it's judgment-grade
material rather than a per-session directive. The meta-rule about earning gates is
itself being made to earn its place. The failure log is the design document, all
the way up.

Honest limitations

This worked for one person, on one roughly 150,000-line codebase, where I own the
glossary and the domain model is a single coherent thing in a single head. "Earned
from failure" is cheap when the failure is yours and you hit it the same day you
build the gate. At team scale the picture is harder: the standing context cost
compounds, the false-positive triage falls on people who didn't feel the original
pain, and "let it go wrong cheaply once" is a different proposition when the once is
someone else's afternoon. I don't have evidence for the team case. I have evidence
for the solo case and a suspicion the principles survive the move, which is not the
same thing.

One tempting claim I'll resist. I had a hunch, watching the sessions, that the agent
resists proposing gates proactively — that it fixes the immediate issue and only
designs the enforcement when pushed. There's a clean-looking receipt for it. Asked
to prevent a recurrence after I'd approved a fix, the agent replied:

Approved. Let me make the fix first, then design the enforcement check.

Fix first, gate second, only once prompted. But when I went looking for the
pattern, the support was mixed. In the flow of a task it does tend to fix first and
gate when asked. Hand it the same question as a matter of policy, though, and it
argues for more proaction than I'd proposed, not less. So I won't dress that up as
a clean behavioral finding. It's a real tension I don't yet understand well enough
to assert.

Why this generalizes

None of this is specific to my domain. Any agent operating on a governed system —
a customer-service platform, a billing pipeline, anything where words have to mean
the same thing across many surfaces — accumulates the same class of drift, and the
same four rungs apply.

Picture the Contact Center case concretely, since it's the domain I know best. A
single concept gets named in a routing rule, a reporting dashboard, an agent-facing
script, and the schema of the system that ties them together. Rename it in three of those
four and the fourth keeps quietly emitting the old word, so the dashboard and the
router disagree about what they're counting, and nothing crashes. That's the drift,
and it's invisible to every test that checks behavior rather than vocabulary. The
gate that couples the surfaces is the only thing that catches a disagreement no
single component is wrong about. That coupling pattern, applied in both directions,
is the subject of a companion essay, Couple Both Ways.
An agent makes this worse, not better, because it will cheerfully and fluently use
whichever term it last saw, in whichever surface it's editing.

The transferable move is to stop treating your instruction file as the place where
correctness lives. Prose is where preferences live. Correctness lives in the gates,
default paths, and tripwires you build, each one shaped to a failure specific
enough to point at. You don't govern an agent by
predicting how it'll go wrong. You let it go wrong cheaply where you can, and you
convert each real miss into the cheapest rung that makes it impossible to miss that
way again.


This essay was written by directing a coding agent over the project it describes;
I direct and judge, the agent drafts and argues back. The argument-back, in this
case, is most of beats four and five.

I build governed agent systems at the intersection of Contact Center software and
AI. If that's a problem you're chewing on, I'm reachable on LinkedIn.


References


Published at vasyltretiakov.dev.