The Minimum Viable Test Suite for Working with Agents
Ian Johnson · 2026-05-25 · via DEV Community

The advice "you need more tests" is correct in roughly the same way "you should eat better" is correct: technically true, infinitely deferrable, and not actionable until someone makes it specific.

Teams adopting agents tend to hear the same advice in slightly different form: "your test coverage needs to be high before you let agents work on real code." This is also correct in roughly the same way. It is not wrong. It is just not the question. The question is: which tests, where, and in what order, to get the most agent reliability out of the least investment.

Most teams have nowhere near the coverage they think they need. Most teams also do not need the coverage they think they do. The middle ground, strategic coverage at the seams that matter, is small enough to actually build and big enough to actually help.

Why "more tests" is the wrong frame

Test coverage is a stock, not a flow. The total percentage tells you how much code is exercised; it tells you almost nothing about whether the tests would catch the failures you care about. A codebase with 90% coverage made of weak assertions ("the function returned without throwing") is less safe than a codebase with 40% coverage made of strong ones ("the function returned the correct value, given these inputs, with these side effects").
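To make the contrast concrete, here is a minimal sketch in Python. The `apply_discount` function and its rounding rule are invented for illustration and are not from any real codebase. Both tests execute the same lines, so both contribute identical coverage, but only the second would fail if the formula were wrong.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical function under test: discounted price in cents, rounded down."""
    return price_cents * (100 - percent) // 100


def test_weak_runs_without_throwing():
    # Weak: passes for almost any implementation that does not raise,
    # including one that ignores the discount entirely.
    apply_discount(1000, 25)


def test_strong_returns_correct_value():
    # Strong: pins the actual values, so a wrong formula fails.
    assert apply_discount(1000, 25) == 750
    assert apply_discount(999, 10) == 899  # 899.1 rounds down
    assert apply_discount(1000, 0) == 1000
```

If an agent rewrote `apply_discount` to return `price_cents` unchanged, the weak test would still pass and coverage would not move.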

Agents amplify this distinction. An agent can ship code that passes a weak test as readily as code that passes a strong one. The weak test was producing false confidence in human-authored work too; agents just produce that work faster, so the false confidence compounds faster.

The frame that works is not "more tests" but "tests that would fail when the agent gets it wrong." Coverage is a side effect of having those. Chasing coverage directly produces code that exercises lines without verifying behavior.

The seams that matter

Some parts of your codebase are higher-leverage to test than others. The high-leverage seams have three properties: they are boundaries where one part of the system meets another, they encode business rules the rest of the system depends on, and they fail in ways that are hard to detect by inspection.

API contracts are the canonical example. The boundary between your service and its callers. If the contract changes silently (the response shape shifts, a field becomes optional, an error code changes) every caller can break in subtle ways. Agents are particularly likely to make this kind of change while "cleaning up" a handler. A contract test catches it on the PR. Without one, the change surfaces in someone else's outage.
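A contract test of this kind can be sketched in a few lines of Python. Everything named here is hypothetical: `get_user_handler` stands in for a real endpoint, and `USER_CONTRACT` is an assumed response shape. In a real suite the test would call the running service rather than a local function.

```python
# Hypothetical handler standing in for a real endpoint; in practice the test
# would call the running service over HTTP instead.
def get_user_handler(user_id: int) -> dict:
    return {"id": user_id, "email": "a@example.com", "is_active": True}


# The contract: exact field names and the type of each field (assumed shape).
USER_CONTRACT = {"id": int, "email": str, "is_active": bool}


def contract_violations(payload: dict, contract: dict) -> list:
    """Return readable contract violations; an empty list means the shape holds."""
    problems = []
    for field in contract.keys() - payload.keys():
        problems.append(f"missing field: {field}")
    for field in payload.keys() - contract.keys():
        problems.append(f"unexpected field: {field}")
    for field, expected in contract.items():
        # type(...) is compared directly so that True does not pass as an int.
        if field in payload and type(payload[field]) is not expected:
            actual = type(payload[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return sorted(problems)


def test_get_user_response_shape():
    assert contract_violations(get_user_handler(42), USER_CONTRACT) == []
```

Checking for unexpected fields as well as missing ones is a deliberate choice in this sketch: a "cleanup" that renames a field shows up as one missing and one unexpected entry, which makes the failure message easy to read on the PR.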

Data layer boundaries are the second class. The seam where your code meets a database, a message queue, an external service. Most agent-introduced bugs at this seam are not the obvious "it does not work" kind; they are the "it works but does the wrong thing under load" kind. A few good integration tests at this boundary catch a disproportionate share of those.
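As a rough illustration of a test at this seam, the sketch below uses Python's standard `sqlite3` module with an in-memory database as a stand-in for the real one. The `users` table and the `create_user` write path are assumptions made up for the example. It does not exercise load; it only shows the habit of asserting on what was actually stored rather than on what the function returned.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    # In-memory SQLite stands in for the real database in this sketch.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
    )
    return conn


def create_user(conn: sqlite3.Connection, email: str) -> int:
    """Hypothetical write path: insert a user and return the new row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("INSERT INTO users (email) VALUES (?)", (email.lower(),))
    return cur.lastrowid


def test_create_user_persists_normalized_email():
    conn = open_db()
    user_id = create_user(conn, "Ada@Example.com")
    # Assert on what is actually stored, not just on the return value.
    rows = conn.execute("SELECT id, email FROM users").fetchall()
    assert rows == [(user_id, "ada@example.com")]


def test_duplicate_email_is_rejected_and_leaves_one_row():
    conn = open_db()
    create_user(conn, "ada@example.com")
    try:
        create_user(conn, "ADA@example.com")
        raise AssertionError("expected the duplicate insert to fail")
    except sqlite3.IntegrityError:
        pass
    assert conn.execute("SELECT COUNT(*) FROM users").fetchone() == (1,)
```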

Critical business rules are the third. The pricing logic, the permission checks, the rules that determine who can do what to which resource. These are the rules where being subtly wrong is worse than being obviously broken. They earn dedicated tests because nothing else in the codebase will catch their failures, and because the cost of getting them wrong is high.

If you have nothing else, having these three categories tested well is enough to make working with agents meaningfully safer.

The boring 80%

Most of your code is not at a seam. It is internal: helpers, formatters, view logic, glue. This code benefits from tests, but it does not benefit enough to be the first investment.

The reason is twofold. First, internal code fails in ways the next layer of code will catch. A formatting helper that produces the wrong output makes the rendered page look wrong, which the visual test or the e2e test or the user catches. Second, internal code is changed and refactored often, which means tests at this layer carry a higher maintenance burden per bug caught. The ROI is real but lower.

The implication: do not start by trying to test everything. Start by testing the seams. Get to a place where the agent cannot ship a change at a boundary without proving the contract still holds. The internal code can be tested as it matures, when patterns stabilize, when the cost of writing the test is amortized over a long-lived function rather than a function that will be deleted next week.

This is a deliberate inversion of the "test as you go" advice that works for human-authored code. With agents, the throughput is high enough that "as you go" produces a lot of tests of code that will not survive. Front-load tests where they matter; back-load tests where they do not.

The first three tests to write

If you are starting from a codebase with effectively no tests and you want to add the minimum viable set this week, these are the three to write first:

A contract test for your most-called API endpoint. Send a request that exercises the happy path. Assert on the full response shape, including types of fields. If your agent ever produces a handler change that breaks the shape, this test fails before the change merges.

An integration test for your most-critical write path. The signup flow, the checkout, the place where the wrong outcome would be visible to a user or an auditor. Drive it end-to-end against a real-enough environment. Assert on the resulting state, not just the response.

A unit test for the trickiest business rule you can think of, with at least three cases: the happy case, the edge case that broke once in the past, and the case the team always argues about during code review. Naming the test after the rule itself is more useful than naming it after the function under test.
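The third of these, the business-rule test, might look like the following Python sketch. The rule itself (owners and admins may delete a document, suspended users may not) and the `User` type are invented for illustration, as is the claim that the suspended case once broke. The point is the shape: three cases, each named after the rule it protects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: int
    is_admin: bool = False
    is_suspended: bool = False


def can_delete_document(user: User, owner_id: int) -> bool:
    """Hypothetical rule: owners and admins may delete, suspended users never may."""
    if user.is_suspended:
        return False
    return user.is_admin or user.id == owner_id


# Tests are named after the rule, not after can_delete_document.
def test_owner_can_delete_own_document():  # the happy case
    assert can_delete_document(User(id=1), owner_id=1)


def test_suspended_owner_cannot_delete():  # the edge case that broke once
    assert not can_delete_document(User(id=1, is_suspended=True), owner_id=1)


def test_admin_can_delete_documents_they_do_not_own():  # the case people argue about
    assert can_delete_document(User(id=2, is_admin=True), owner_id=1)
```

When one of these fails, the test name alone tells the reviewer which rule the change violated, without opening the function.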

Three tests. An afternoon of work. These will not give you coverage; they will give you a disproportionate share of the protection that coverage is supposed to represent. The rest of the suite is built outward from there.

The weekly ritual

Once you have the seams covered, the practice is small: every time the agent ships something wrong that should have been caught, write the test that would have caught it. Add it before the fix lands. The fix proves the test was right; the test prevents the regression.
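One turn of this loop could look like the Python sketch below. The scenario is hypothetical: assume an agent shipped a `page_count` helper that used plain floor division and silently dropped the final partial page. The first test is the one written while the bug was still present; the second guards the fix against overcorrecting.

```python
def page_count(total_items: int, page_size: int) -> int:
    """Number of pages needed to show total_items.

    The hypothetical bug was `total_items // page_size`, which dropped the
    final partial page. Ceiling division is the fix.
    """
    return -(-total_items // page_size)


def test_partial_last_page_is_counted():
    # Written first, against the buggy version: 21 // 10 == 2, so this failed.
    assert page_count(21, 10) == 3


def test_exact_multiple_adds_no_extra_page():
    # Guards the fix itself against an unconditional "+ 1".
    assert page_count(20, 10) == 2
    assert page_count(0, 10) == 0
```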

This is the same loop that runs in the post-mortem ritual, just shorter. The agent's mistakes are smaller and more frequent than incidents, but the lesson is the same: a one-time bug becomes permanent knowledge only when it is encoded as a test.

A team that runs this loop for six months has a test suite shaped to the actual failure modes the agent produces in their codebase. That suite is more valuable than any amount of bulk coverage built ahead of time. It is grown rather than written.

The minimum viable suite is the starting point. The discipline is what grows it.