Checkbox theater: how I stopped trusting my AI agent to run the checks
John Rojas · 2026-05-24 · via DEV Community

For context: in the previous piece, I worked through a five-dimension review framework for documentation, covering clarity, readability, style, completeness, and technical accuracy. Those dimensions are now part of how our team's AI agent reviews PRs. It runs them on every review pass, quietly, in the background. Most people don't think about them. They just see the review output.

Then I started catching things on my own review pass that the agent had marked clean. The style scan reported zero hits. I'd find three present-tense violations on the next read. A completeness check came back marked complete. A ticket requirement was unaddressed in the diff. The dimensions were running. They were also missing things, and I was the one finding what they missed.

This piece is about what I learned from that gap, what I built to close it, and the bigger principle I'm still working through. Sharing as I go, in case any of it is useful.

The setup

I'd built up a set of gates around the dimension checks. A PRE-FLIGHT gate that forced the agent to write a todo list with concrete execution methods for each dimension before any review work began. No "I'll check style" wishful thinking; you had to say "I'll run gh pr diff and scan for forbidden terms, will/would violations, and passive voice constructions." A COMPLETION gate that required documented evidence for all five dimensions before the review file could be written: findings, or "no issues, here's what I checked."

It felt thorough. Looked thorough. Read thorough on paper. Was thorough in approximately the same way a paper checklist is fire-safe.

The failure mode

What I started noticing across reviews:

A style scan reported clean. On my own pass through the diff, I caught three present-tense violations.

A completeness check came back marked complete. A ticket requirement was unaddressed.

Sub-agents reported back with results no artifact on disk corroborated. "Ran the will/would scan, zero hits." Where? Show me. There was no where. The scan had never produced output. It had produced a sentence.

I want to be careful about how I name this. The agent was satisfying the instruction as stated. The PRE-FLIGHT todo said "run the will/would scan." Marking the todo complete satisfied the instruction. Whether the scan had actually executed and produced findings was, structurally, outside the loop. The gate was social, not mechanical. It depended on the agent choosing to do the work, and on me choosing to believe that the work had been done because the agent said so.

I was reading the agent's confidence as evidence. Confidence is not evidence. It's a sentence about evidence. There is a difference, and the difference was costing me on every review.

The phrase that stuck was checkbox theater. The gates existed. They had names, structure, even formal blocking semantics. What they didn't have was teeth. They lived in instructions, and instructions are wishes.

The shift

The question I put to the agent: can we make these gates mechanical? Not "the agent should run the check" but "the agent cannot move forward until the check has produced a written artifact, on disk, tied to the current state of the PR."

That reframe is the whole article in one sentence. Evidence over status. Substrate over self-report. The shift from a gate that asks the agent to verify itself to a gate that verifies whether the agent has verified itself, where "has verified" is measured by file existence, not by claim.

Once that shift was on the table, the implementation became obvious. Most of it, anyway. I'm still finding the edges.

The implementation: three moves

Move 1: Scripts that write artifacts, not status flags

Each mechanical scan now writes a JSON file. Not a status. Not a return code. A file, with hit counts, file paths, sample matches, a timestamp, and the SHA of the PR HEAD it was run against.

{
  "pr_number": "NNNN",
  "pr_head_oid": "<headRefOid>",
  "run_at": "2026-05-12T08:59:14Z",
  "dimensions": {
    "style": {
      "status": "ran",
      "hits": 0,
      "source": "style-gate.sh (5 scans: will/would, passive, placeholders, superlatives, boolean)"
    },
    "readability": {
      "hits": 2,
      "scan": "sentence length > 25 words on added prose lines",
      "samples": ["docs/api/auth.md:42", "docs/api/auth.md:67"]
    }
  },
  "total_hits": 2,
  "status": "ran"
}


The SHA pin is doing real work. If the PR gets a new commit, the artifact's pr_head_oid no longer matches the current HEAD. The artifact is now stale, which means the scan results are stale, which means whatever was clean five minutes ago is no longer demonstrably clean. The agent has to re-run.
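
The SHA pin can be sketched in a few lines of Python. This is a minimal illustration that assumes the artifact shape shown above; the function names build_artifact and is_stale are hypothetical, the SHA values are placeholders, and the author's real scripts are shell.

```python
import json
from datetime import datetime, timezone


def build_artifact(pr_number: str, head_oid: str, dimensions: dict) -> dict:
    """Assemble a gate artifact pinned to the PR HEAD it was run against."""
    total_hits = sum(d.get("hits", 0) for d in dimensions.values())
    return {
        "pr_number": pr_number,
        "pr_head_oid": head_oid,
        "run_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "dimensions": dimensions,
        "total_hits": total_hits,
        "status": "ran",
    }


def is_stale(artifact: dict, current_head_oid: str) -> bool:
    """An artifact is stale as soon as the PR HEAD differs from the pinned SHA."""
    return artifact.get("pr_head_oid") != current_head_oid


artifact = build_artifact(
    "NNNN",
    "abc123",  # placeholder SHA, for illustration only
    {"style": {"hits": 0}, "readability": {"hits": 2}},
)
print(json.dumps(artifact, indent=2))
print(is_stale(artifact, "abc123"))  # HEAD unchanged
print(is_stale(artifact, "def456"))  # a new commit landed
```

The comparison is a plain string inequality on purpose: there is no notion of "close enough" between two commits, so any new push invalidates the artifact.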

Move 2: A hook that intercepts the destination

This is the move that turned out to matter most. Cursor supports a beforeShellExecution hook: a shell script that runs before any shell command the agent issues. The hook reads the command, decides whether it's a PR-write command (gh api .../pulls/<N>/comments, gh pr edit --body, gh pr comment), and if so, validates the gate artifacts before deciding whether to allow or deny.

The mechanism here is Cursor-specific. The principle isn't. Other agentic tools have equivalent shell-level hooks; if yours doesn't, the enforcement point shifts to a pre-commit hook or a CI gate, but the move is the same: put the verification somewhere the agent can't talk its way past.
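
Deciding whether a command is a PR-write command might look roughly like the following Python sketch. The matching rules are assumptions built from the three commands named above, not the author's hook, and the repository path in the usage lines is made up.

```python
import re
import shlex

# Assumed pattern for "gh api .../pulls/<N>/comments"; illustrative only.
_API_COMMENTS = re.compile(r"pulls/(\d+)/comments")


def is_pr_write_command(command: str) -> bool:
    """Return True if the shell command looks like it writes to a pull request."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: fall back to a plain whitespace split.
        tokens = command.split()
    if not tokens or tokens[0] != "gh":
        return False
    if tokens[1:2] == ["api"]:
        return any(_API_COMMENTS.search(t) for t in tokens[2:])
    if tokens[1:2] == ["pr"]:
        if tokens[2:3] == ["comment"]:
            return True
        if tokens[2:3] == ["edit"]:
            return any(t == "--body" or t.startswith("--body=") for t in tokens[3:])
    return False


# "repos/o/r" is a hypothetical repository path.
print(is_pr_write_command("gh api repos/o/r/pulls/12/comments -f body=hi"))
print(is_pr_write_command("gh pr diff 12"))
```

A sketch like this only inspects a command that starts with gh. Chained commands (cd somewhere && gh pr comment) would need to be split first, which is one of the edges a real hook has to handle.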

The validation is dumb on purpose. Does PR-<N>-tickets.json exist? Does it have status: "loaded" or "partial_blocked"? Does its pr_head_oid match the current HEAD from gh pr view? Same questions for PR-<N>-gate.json. If any check fails, the hook returns deny with a clear message:

pr-review-gate-hook BLOCKED gh api .../pulls/NNNN/comments on PR #NNNN.

Missing or stale gate artifacts:
- Stale: PR-NNNN-gate.json pr_head_oid=<old-sha> but current PR HEAD is <new-sha>
  Re-run: ~/Documents/docs-agent/scripts/review-gate.sh NNNN

Resolve and retry. Bypass available for one command via environment variable.
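
The "dumb on purpose" validation can be sketched in Python as three questions per artifact: does it exist, is its status acceptable, does its SHA match. The function names are hypothetical, and the accepted status for the gate artifact ("ran") is an assumption taken from the sample JSON earlier; the article only states the accepted statuses for tickets.json.

```python
from typing import Optional

TICKET_STATUSES = {"loaded", "partial_blocked"}  # stated in the article
GATE_STATUSES = {"ran"}  # assumption, inferred from the sample artifact


def check_artifact(name: str, artifact: Optional[dict],
                   current_head: str, allowed: set) -> list:
    """Return human-readable problems; an empty list means the artifact passes."""
    if artifact is None:
        return [f"Missing: {name}"]
    problems = []
    status = artifact.get("status")
    if status not in allowed:
        problems.append(f"Bad status: {name} status={status!r}")
    pinned = artifact.get("pr_head_oid")
    if pinned != current_head:
        problems.append(
            f"Stale: {name} pr_head_oid={pinned} but current PR HEAD is {current_head}"
        )
    return problems


def validate_gate(pr: str, tickets: Optional[dict],
                  gate: Optional[dict], current_head: str) -> list:
    """Apply the same checks to both artifacts and collect every failure."""
    return (
        check_artifact(f"PR-{pr}-tickets.json", tickets, current_head, TICKET_STATUSES)
        + check_artifact(f"PR-{pr}-gate.json", gate, current_head, GATE_STATUSES)
    )


problems = validate_gate(
    "NNNN",
    tickets={"status": "loaded", "pr_head_oid": "new-sha"},
    gate={"status": "ran", "pr_head_oid": "old-sha"},
    current_head="new-sha",
)
print("\n".join(f"- {p}" for p in problems))
```

Collecting every problem instead of stopping at the first one lets the deny message list everything the agent has to re-run in a single pass.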


What changed when this hook went live: the agent stopped being able to lie about whether it had run the scans. There was nothing for it to lie about. Either the artifact existed and the SHA matched, or the call to GitHub got blocked. The agent could still produce a sentence saying "I ran the scan." That sentence no longer affected anything. The hook didn't read sentences. It read files.

Here's the before-and-after, visually: [Figure: review flow before and after the hook]

One subtle but important detail: if the hook itself errors (jq isn't installed, or gh can't reach the network), the command is allowed through with a warning. The gate fails open. This is deliberate. A false negative, where a stale artifact slips through, is cheap because the next review will catch it. A false positive, where every command is bricked because the hook crashed, is expensive. A bypass environment variable exists for the same reason: when you genuinely need to override, you can, but you have to do it on purpose.
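
The fail-open behavior and the bypass can be sketched together in Python. The variable name PR_REVIEW_GATE_BYPASS and the returned dictionary shape are hypothetical; the article names neither, and this is not Cursor's actual hook response format.

```python
import os
from typing import Callable, Mapping

BYPASS_VAR = "PR_REVIEW_GATE_BYPASS"  # hypothetical name; the article does not give one


def decide(validate: Callable[[], list],
           env: Mapping[str, str] = os.environ) -> dict:
    """Return an allow/deny decision that fails open when the check itself breaks."""
    if env.get(BYPASS_VAR) == "1":
        return {"decision": "allow", "message": f"bypassed via {BYPASS_VAR}"}
    try:
        problems = validate()
    except Exception as exc:  # e.g. a missing tool or no network
        # Fail open: a broken hook must not block every command.
        return {"decision": "allow", "message": f"warning: gate check errored: {exc}"}
    if problems:
        return {"decision": "deny", "message": "; ".join(problems)}
    return {"decision": "allow", "message": "gate artifacts valid"}


def broken_check() -> list:
    raise FileNotFoundError("jq not found")


print(decide(broken_check, env={}))
print(decide(lambda: ["Stale: PR-NNNN-gate.json"], env={}))
print(decide(lambda: ["Stale: PR-NNNN-gate.json"], env={BYPASS_VAR: "1"}))
```

The only path to deny is a check that ran successfully and found a problem. Every other outcome, including a crash, lets the command through with a message explaining why.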

Move 3: The dimension rules that match

Hard gating only works if the gates point at the right things, so the rules inside the dimensions got tightened too. The style dimension can no longer be satisfied by citing the style guide; you have to run the mechanical scans and either resolve every hit via inline suggestion or document zero matches with the command that produced them. The completeness dimension requires a per-requirement mapping table built from tickets.json, not from the PR diff alone, because feature-mapping from the artifact being reviewed is circular. The rule structure stopped being aspirational and started being operational. "Run the check" turned into "produce the artifact that proves you ran the check, and here's the schema."
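
A per-requirement mapping table could be built along these lines. This Python sketch assumes a tickets.json shape (a "requirements" list with "id" and "text" fields) that the article does not specify, and the requirement texts are invented for illustration.

```python
def map_requirements(tickets: dict, evidence: dict) -> list:
    """Build one row per ticket requirement; rows without evidence are gaps."""
    rows = []
    for req in tickets.get("requirements", []):
        refs = evidence.get(req["id"], [])
        rows.append({
            "id": req["id"],
            "requirement": req["text"],
            "evidence": refs,
            "status": "addressed" if refs else "unaddressed",
        })
    return rows


tickets = {  # hypothetical tickets.json content
    "requirements": [
        {"id": "R1", "text": "Document the token refresh endpoint"},
        {"id": "R2", "text": "Add a nav entry for the new page"},
    ]
}
# Locations in the diff that satisfy each requirement, supplied by the reviewer.
evidence = {"R1": ["docs/api/auth.md:42"]}

rows = map_requirements(tickets, evidence)
for row in rows:
    print(f'{row["id"]} | {row["status"]} | {", ".join(row["evidence"]) or "-"}')
unaddressed = [r["id"] for r in rows if r["status"] == "unaddressed"]
```

The loop iterates over the ticket's requirements, not over the diff. That ordering is what breaks the circularity: a requirement the PR never touched still gets a row, and the row is visibly empty.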

The feedback loop: lessons as infrastructure

Hard gating handles the failure modes I already knew about. It does not handle the ones I haven't run into yet. For those, there's a separate piece: the gap log.

The gap log is an append-only file. After every review, the POST-REVIEW IMPROVEMENT gate runs: take every reviewer comment, ask whether the workflow could have caught it before submission, and if yes, draft a check that would catch it next time. The check gets logged.

The format is one line per gap:

2026-05-10 | PR-1234 | style    | passive voice not caught on added definition lines | open
2026-05-12 | PR-1242 | complete | nav entry missing for new partial                  | mechanized
2026-04-22 | PR-1289 | clarity  | "this powerful feature" not caught                 | resolved


Three statuses do the work:

Status     | Meaning
open       | Logged but not yet caught by any script or hook. Next PRE-FLIGHT reads this and injects the gap as an additional dimension check.
mechanized | A scan or hook now catches this pattern automatically. The gap can sit dormant; the infrastructure handles it.
resolved   | The underlying recurring pattern is gone (often because upstream changed). No further check needed.

What this does, structurally, is convert one-time learnings into infrastructure. A gap surfaced in PR-1234 doesn't sit in a Slack message I'll forget about. It sits in the log. The next PRE-FLIGHT reads the log and reminds the agent. When I get around to writing a script that catches the pattern mechanically, the status flips. The lesson doesn't depend on me remembering it.
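
Reading the log back at PRE-FLIGHT can be sketched as a small Python parser over the one-line format above. The Gap class and function names are hypothetical, and the sketch assumes descriptions never contain a pipe character.

```python
from dataclasses import dataclass

VALID_STATUSES = {"open", "mechanized", "resolved"}


@dataclass
class Gap:
    date: str
    pr: str
    dimension: str
    description: str
    status: str


def parse_gap_line(line: str) -> Gap:
    """Parse one 'date | PR | dimension | description | status' line."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    gap = Gap(*parts)
    if gap.status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {gap.status!r}")
    return gap


def open_gaps(log_text: str) -> list:
    """Return the gaps a PRE-FLIGHT pass would inject as extra checks."""
    gaps = [parse_gap_line(line) for line in log_text.splitlines() if line.strip()]
    return [g for g in gaps if g.status == "open"]


LOG = """\
2026-05-10 | PR-1234 | style    | passive voice not caught on added definition lines | open
2026-05-12 | PR-1242 | complete | nav entry missing for new partial                  | mechanized
2026-04-22 | PR-1289 | clarity  | "this powerful feature" not caught                 | resolved
"""

for gap in open_gaps(LOG):
    print(f"Extra check ({gap.dimension}): {gap.description}")
```

Rejecting malformed lines loudly keeps the log honest: a gap with a typo in its status would otherwise drop out of the open set without anyone noticing.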

Honest disclosure: this part still has friction. At the end of each session, I have to prompt the agent to walk through reviewer comments and log gaps. The pattern isn't fully self-sustaining yet. The log gets written. The injection works. The "remember to look at the comments" step is still mine. There's a future article in closing that gap, and I'm still figuring out the right shape for it.

The principle

Trust-but-verify is the wrong frame for AI workflows, because verification is what you're asking the LLM to do. The agent that runs the check is the same agent reporting on whether the check ran. There is no second agent watching. The integrity of the whole thing depends on a self-report from the entity being audited. That's not a verification model. That's a wish.

The fix isn't a better prompt. The fix is moving verification outside the agent's control loop entirely. The agent can write any sentence about whether it ran a scan. The hook cannot write any sentence about whether a file exists. The file is on disk or it isn't. The SHA matches or it doesn't. There is no clever phrasing the agent can produce that changes that.

Where I am right now: hard gates for the failure modes I know about, gap log for the ones I don't, manual prompting for the meta-check that I don't have a way to automate yet. It's a workable system I'm still actively improving.

If you're building AI-assisted workflows, especially ones where the LLM both does the work and audits the work, I'd push you to ask the same question I had to ask: what does my agent actually produce, on disk, that I could check without taking its word for anything? If the answer is nothing, that's checkbox theater. The first step to closing the gap is making it produce something.

More on the specific mechanical scanners in the next piece. More on the gap log and the durability problem in the one after that. If you've solved any of this, or seen a sharper take on it, I'd want to hear from you.


I'm publishing this while the system is still actively improving, because the principle landed for me before the implementation did, and the implementation isn't finished. If you're building AI workflows where the agent both does the work and audits it, I'd want to hear how you've thought about that gap, or whether you've found a way to close it.

I write about AI-assisted documentation workflows, developer experience, and the evolving role of technical writers. If any of this resonates, let's connect on LinkedIn.