Your test suite is the only thing that makes AI agents useful

A company called Reco converted JSONata from JavaScript to Go in 7 hours using AI. Seven. Hours. Not seven sprints. Not seven engineers over seven weeks. Seven hours.

And the internet immediately argued about the wrong thing.

Everyone wanted to talk about which model they used, which agent framework, which prompt magic made it happen. But the skeptics nailed the real story: this only worked because JSONata already had a rock-solid test suite.

The AI didn't understand the code. It executed against a spec.

The test suite is the spec

Think about what an AI agent actually does during a porting task. It generates code, runs it, checks the output, and iterates. That loop is only as good as the signal it gets back.
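
Here is a minimal sketch of that loop, in Python for brevity. Everything in it is hypothetical. `propose` stands in for the model and replays a fixed sequence of attempts; a real agent would use the reported failures to decide what to change. The spec is a handful of input/expected pairs for "round half away from zero". The only signal the loop ever sees is which cases failed.

```python
import math
from typing import Callable, Optional

Case = tuple[float, int]  # (input, expected output)

# Hypothetical spec: round half away from zero.
SPEC: list[Case] = [(2.5, 3), (-2.5, -3), (1.4, 1), (-1.4, -1), (0.5, 1)]


def run_spec(fn: Callable[[float], int], spec: list[Case]) -> list[Case]:
    """Return the failing cases; an empty list means every test is green."""
    return [(x, want) for x, want in spec if fn(x) != want]


def half_away_from_zero(x: float) -> int:
    n = math.floor(abs(x) + 0.5)
    return -n if x < 0 else n


def propose(attempt: int, failures: list[Case]) -> Callable[[float], int]:
    """Hypothetical stand-in for the model. A real agent would read
    `failures`; this sketch ignores them and replays fixed attempts."""
    attempts: list[Callable[[float], int]] = [
        lambda x: round(x),      # Python's round() rounds half to even
        lambda x: int(x + 0.5),  # int() truncates toward zero: wrong for negatives
        half_away_from_zero,     # matches the spec
    ]
    return attempts[min(attempt, len(attempts) - 1)]


def agent_loop(spec: list[Case], max_attempts: int = 5) -> Optional[Callable[[float], int]]:
    failures: list[Case] = []
    for attempt in range(max_attempts):
        candidate = propose(attempt, failures)  # generate
        failures = run_spec(candidate, spec)    # run and check
        print(f"attempt {attempt + 1}: {len(spec) - len(failures)}/{len(spec)} passing")
        if not failures:
            return candidate                    # all green: done
    return None                                 # budget exhausted


if __name__ == "__main__":
    agent_loop(SPEC)
```

Running it prints 2/5, then 3/5, then 5/5 passing. Neither wrong attempt looks unreasonable on its own; only the spec tells them apart.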

No tests? The agent is flying blind. It produces reasonable-looking Go code and has no way to tell whether it's correct. You, the human, become the test suite, and you're much slower than a machine.

With tests? Every test returns a binary verdict: pass or fail. The agent can brute-force its way to correctness. It doesn't need to "understand" JSONata's expression language. It just needs to turn the red dots green. 🟢
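
The same idea works with the spec written as data, in the style of a conformance suite. The cases and the toy `eval_path` evaluator below are hypothetical: it only does dotted field lookup and is nothing like JSONata's full semantics. What matters is that each case yields a pass/fail verdict, and because the cases are plain data rather than code, the same list could check both an original implementation and its port.

```python
from typing import Any, Callable

# Hypothetical conformance cases, written as data: (expression, input, expected).
CASES: list[tuple[str, dict[str, Any], Any]] = [
    ("Account.Name", {"Account": {"Name": "Firefly"}}, "Firefly"),
    ("Account.Order.Qty", {"Account": {"Order": {"Qty": 2}}}, 2),
    ("Account.Missing", {"Account": {}}, None),
]


def eval_path(expr: str, data: Any) -> Any:
    """Toy evaluator (hypothetical): dotted field lookup only.
    A missing field yields None."""
    current = data
    for field in expr.split("."):
        if not isinstance(current, dict) or field not in current:
            return None
        current = current[field]
    return current


def report(evaluate: Callable[[str, Any], Any], cases) -> list[bool]:
    """Run every case and print one verdict per line."""
    results = []
    for expr, data, expected in cases:
        ok = evaluate(expr, data) == expected
        results.append(ok)
        print("pass" if ok else "FAIL", expr)
    return results


if __name__ == "__main__":
    report(eval_path, CASES)
```

Swapping in a different `evaluate` function, such as a port under construction, reuses the exact same cases.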

The model isn't the magic

We keep having the wrong conversation about AI-assisted engineering. Everyone asks, "Which model is best for coding?" But the model was never the bottleneck.

→ The bottleneck is whether your codebase gives the agent something to verify against.
→ A mediocre model with great tests will outperform a frontier model with no tests.
→ Tests turn AI from a suggestion engine into an execution engine.

Reco said the port would save them roughly $500k per year in infra costs. That's a fat number. But the investment that made it possible wasn't the AI bill. It was every hour engineers spent writing tests for JSONata over the last few years. Those tests were the capital.

Most codebases aren't ready

Here's the uncomfortable part. Most teams I've talked to — and I include past versions of my own team — don't have test suites that could support this workflow.

We have tests that are flaky. Tests that depend on environment state. Tests that cover the happy path and nothing else. Tests that exist to make a coverage metric look good in a dashboard nobody checks.

That's not a spec. That's decoration. 🎭
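
Tests that depend on environment state are often the easiest of these to fix. Here is a minimal sketch of the usual fix, using a hypothetical `is_expired` check: instead of reading the wall clock inside the function, take `now` as a parameter so the test can pin it. The same move works for environment variables, random seeds, and network calls.

```python
from datetime import datetime, timezone


# Hard to test: the result depends on when the code happens to run.
def is_expired_clock(expires_at: datetime) -> bool:
    return datetime.now(timezone.utc) > expires_at


# Deterministic: the caller supplies "now", so a test controls it.
def is_expired(expires_at: datetime, now: datetime) -> bool:
    return now > expires_at


def test_is_expired() -> None:
    deadline = datetime(2024, 1, 1, tzinfo=timezone.utc)
    assert is_expired(deadline, now=datetime(2024, 1, 2, tzinfo=timezone.utc))
    assert not is_expired(deadline, now=datetime(2023, 12, 31, tzinfo=timezone.utc))
    # Edge case pinned by the spec: exactly at the deadline is not yet expired.
    assert not is_expired(deadline, now=deadline)


if __name__ == "__main__":
    test_is_expired()
```

The clock-reading version still exists for production callers; the point is that the logic worth testing no longer hides a dependency the test can't see.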

If you handed an AI agent your repo right now and said "port this to Rust," what would happen? Be honest. If the answer is "it would generate something that compiles but is subtly wrong in forty places," your test suite is the problem. Not the model.

The discipline was always the point

There's a beautiful irony here. For years, writing tests was the boring part. The thing senior engineers preached about and junior engineers skipped. The chore that slowed you down when you were trying to ship.

Now tests are the unlock. They're the thing that makes AI agents actually useful instead of just impressive in demos.

→ Tests are a machine-readable contract for correctness.
→ AI agents are machines that can execute against that contract at inhuman speed.
→ Without the contract, the speed is meaningless.

The engineers who were "wasting time" writing thorough test suites for the last decade just accidentally built the infrastructure for the AI-assisted future. The ones who shipped fast and skipped tests are now stuck babysitting every line an agent produces.

I find that genuinely funny. 😄

The takeaway

Stop optimizing your AI workflow. Start optimizing your test suite. The model will get better on its own every few months. Your tests won't write themselves — and they're the only thing standing between "AI ported our codebase in 7 hours" and "AI generated 10,000 lines of plausible garbage."

The boring work was always the important work. AI just made that impossible to ignore.

So here's my question: if you pointed an AI agent at your codebase today, would your test suite be good enough to keep it honest? What's actually missing?
