Stop Reviewing Every Line of AI Code - Build the Trust Stack Instead
Sagiv ben gi · 2026-05-23 · via DEV Community

AI-generated code should be treated as third-party code. Same mental model we already use for libraries and dependencies. We don't review every line of lodash, fastapi, or chi. We shouldn't expect to review every line of AI-generated code either.

I argued this in my previous post. The natural follow-up question: okay, but what does that actually require? You can't tell people "trust it like you trust open-source" without explaining what that trust is built on. This post is a first attempt at answering that.

We Already Have A Trust Framework. We Just Don't Use It For This.

We trust open-source code we've never read. Every day, in every codebase. That trust didn't come from any single tool. It came from a stack of agreements built up over decades. Semantic Versioning. Conventional Commits. Lockfiles. Changelogs. Module boundaries. License declarations. Package signing.

None of these are tools. They're primitives. Foundational contracts about how to describe code, change, and intent in a way that humans, and the tools we build, can rely on.

If AI-generated code is just another kind of third-party code, the question is straightforward: which of those primitives carry over, which need new equivalents, and which are missing entirely? Tools will follow once we agree on the shape underneath. They always do. But the shape has to come first, because right now every team trying to build trust for AI-generated code is doing it in private, with their own conventions, and the result is a pile of point solutions that can't compose.

So this is the question I want to look at. Not what tools we should build. What does the trust stack look like once we apply the OSS lens to AI-generated code?

What Made Open-Source Trustworthy

We don't usually interrogate this. We just trust well-maintained libraries. But why?

Strip the open-source trust stack down to its primitives, the underlying contracts and not the tools built on top of them, and you get something like this:

Authorship. Every change has an author, a timestamp, and ideally a reason. Git history isn't just a log, it's an audit trail. You can follow a line of code back to the moment it was written and the person who wrote it.

Versioning as a contract. Semantic versioning isn't just a numbering scheme. It's a promise. A patch bump says "we fixed something, nothing breaks." A minor bump says "we added something, nothing breaks." A major bump says "we changed the contract, you need to adapt." It's a communication primitive between maintainers and consumers.

Conventional commits as intent. Commit messages like fix:, feat:, chore: encode the intent of a change. They're the foundation that changelogs are generated from. The changelog is how you understand what happened without reading every diff.

Behavioral contracts. Type signatures, documented APIs, interface definitions. These define what the code promises to do. They're the spec that tests are written against and the boundary that consumers depend on.

Automated verification. Linters, type checkers, security scanners, coverage gates. Not one-off checks but enforced, repeatable gates that run on every change. The trust isn't in any single test. It's in the habit of verification.

Isolation. Code lives behind package boundaries. You can pin it, swap it, replace it. The blast radius is contained because the boundary is real.

None of these are tools. They're agreements. The reason we can ship third-party code we haven't read is that this stack of agreements is doing the work for us, every time, quietly, in the background.
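
To show how two of these agreements compose without anyone reading a diff, here is a minimal sketch in which Conventional Commits headers decide the next Semantic Versioning bump. The function names (bump_for, next_version) are made up for illustration, and the rules are simplified: no pre-release tags and no special handling of 0.x versions.

```python
import re

# Conventional Commits header: type, optional (scope), optional "!", then ": description".
HEADER = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?(?P<bang>!)?: .+")


def bump_for(messages):
    """Return 'major', 'minor', 'patch', or None for a list of commit messages."""
    level = None
    for msg in messages:
        header = msg.splitlines()[0] if msg else ""
        match = HEADER.match(header)
        if match is None:
            continue  # not a conventional commit, so it declares no intent
        if match.group("bang") or "BREAKING CHANGE:" in msg:
            return "major"  # the contract changed; nothing can outrank this
        if match.group("type") == "feat":
            level = "minor"
        elif match.group("type") == "fix" and level is None:
            level = "patch"
    return level


def next_version(current, messages):
    """Apply the declared intent of the commits to a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in current.split("."))
    bump = bump_for(messages)
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return current  # chore:, docs:, and unparsed messages leave the version alone
```

Under these simplified rules, next_version("1.4.2", ["fix: handle empty input"]) gives "1.4.3", and a single "feat(api)!: drop v1 endpoints" commit gives "2.0.0". The consumer learns what kind of change happened from the number alone, which is the whole point of the agreement.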

The lens I want to apply to AI-generated code is exactly this. For each of these, what's the equivalent for code that an agent generated inside our own repo? Where does it already exist? Where does it need to be built?

The Primitives AI-Generated Code Needs

1. Traceability

The open-source equivalent: Authorship. Author, commit timestamp, git blame.

What AI-generated code needs: Three things. A clear marker that this code was AI-generated. The human who approved it. And a link to the originating request. The originating request is whatever your team uses to describe a unit of work: a ticket, an issue, an RFC, or a spec.

Without those three, nothing else in the stack has anything to attach to. You can't version, audit, isolate, or post-mortem something you can't identify. You can't ask "which AI-generated module is involved in this incident" because the question doesn't have a place to land.

This is also where ownership becomes real. You didn't write it, but you approved it. Your name is on it. The originating request anchors the why, the human approver anchors the who. The marker is what lets all of it be queried as a class. Three items, all of them answerable today if we choose to make them answerable.
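
One way those three items could be made answerable today is with commit trailers, the "Key: value" lines at the end of a commit message. The sketch below parses them into a record and checks that an AI-generated change carries both an approver and a request. The trailer keys (AI-Generated, Approved-by, Request) and the TICKET-123 identifier are assumptions for illustration, not an existing convention.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    ai_generated: bool           # the marker
    approved_by: Optional[str]   # the human who approved it
    request: Optional[str]       # ticket, issue, RFC, or spec

    def is_traceable(self) -> bool:
        # Human-written commits need nothing extra here.
        # AI-generated commits need both an approver and a request.
        return (not self.ai_generated) or bool(self.approved_by and self.request)


def parse_provenance(commit_message: str) -> Provenance:
    """Read hypothetical provenance trailers from the last paragraph of a commit message."""
    paragraphs = commit_message.strip().split("\n\n")
    # A message with a single paragraph has a subject line but no trailer block.
    last = paragraphs[-1] if len(paragraphs) > 1 else ""
    trailers = {}
    for line in last.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            trailers[key.strip().lower()] = value.strip()
    return Provenance(
        ai_generated=trailers.get("ai-generated", "").lower() == "true",
        approved_by=trailers.get("approved-by") or None,
        request=trailers.get("request") or None,
    )
```

A CI step could call is_traceable() on every commit in a PR and fail when an AI-generated change has no approver or request. Because the marker lives in history, "which AI-generated changes touched this module" becomes a query over the log instead of a guess.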

2. The Decision Log

The open-source equivalent: Conventional commits. Changelogs. Release notes.

What AI-generated code needs: A record of why this code was generated. What problem it was solving. What constraints the agent was given. What the intent was.

This is the one I'd argue is most under-served, and the one with the highest leverage. When AI-generated code does something unexpected six months from now, the question won't be "what does this code do." You can read it. The question will be "why was it written this way, and what was it trying to accomplish?" Without a decision log, that answer is gone.

The decision log doesn't need to be elaborate. The original prompt or task description, the key constraints, the intent in plain language. Stored somewhere queryable. Attached to the change, the module, or the PR, not buried in a Slack thread that disappears in 90 days.

The open-source world solved this with conventional commits and changelogs. Imperfect, inconsistent, but present. The primitive exists. For AI-generated code, we haven't even agreed it needs to.
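
As a rough sketch of how small such a log can be, the entry below holds the four things named above and is stored as one JSON object per line, so it stays queryable with ordinary tools. The field names, the DecisionLogEntry type, and the choice of JSON Lines are all assumptions, not a proposed standard.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Iterable, List


@dataclass
class DecisionLogEntry:
    request: str   # ticket, issue, RFC, or spec the work came from
    prompt: str    # original prompt or task description
    intent: str    # what the change is meant to accomplish, in plain language
    constraints: List[str] = field(default_factory=list)  # limits the agent was given


def to_json_line(entry: DecisionLogEntry) -> str:
    """Serialize one entry as a single JSON line, ready to append to a log file."""
    return json.dumps(asdict(entry), sort_keys=True)


def entries_for(request: str, lines: Iterable[str]) -> List[DecisionLogEntry]:
    """Return every logged decision attached to the given originating request."""
    found = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the log
        entry = DecisionLogEntry(**json.loads(line))
        if entry.request == request:
            found.append(entry)
    return found
```

Keeping the file in the repository next to the code it describes means the log is versioned, reviewed, and retained on the same terms as the change itself.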

3. Behavioral Contracts

The open-source equivalent: Type signatures, documented APIs, interface definitions.

What AI-generated code needs: The same thing, but written before the code, not inferred from it.

The direction matters. In the open-source version, the interface is defined first and code implements it. With AI generation, it's easy to end up with code that works and an interface reverse-engineered to match it. That's backwards. The contract is supposed to be the promise. If the implementation defines the promise, the contract isn't doing any work.

The right approach is contract-first generation: you define the behavioral contract, and the AI produces code that satisfies it. The contract becomes the spec, the test anchor, and the documentation, all at once. It's also what shifts human review from line-by-line to contract-level. You're not reviewing the implementation. You're reviewing the contract, and trusting the verification layer to enforce it. That's exactly what we already do for third-party libraries. We don't read lodash. We don't read fastapi. We don't read chi. We trust the contract.
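
A small sketch of what contract-first can look like in practice: the contract for a slug function is written as executable properties before any implementation exists, and any candidate, generated or not, is checked against it. The slugify example, the property set, and all names here are assumptions chosen to keep the sketch short.

```python
import re
from typing import Callable, List

Slugify = Callable[[str], str]

# Written first: inputs the contract must hold for.
CONTRACT_CASES = ["Hello, World!", "  Trust   Stack  ", "already-a-slug", "Ünïcode & symbols"]

# Written first: the shape every output must have.
SLUG_SHAPE = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def check_slugify_contract(impl: Slugify) -> List[str]:
    """Return contract violations; an empty list means the implementation conforms."""
    violations = []
    for title in CONTRACT_CASES:
        slug = impl(title)
        if not SLUG_SHAPE.fullmatch(slug):
            violations.append(f"{title!r} -> {slug!r}: not lowercase words joined by single hyphens")
        elif impl(slug) != slug:
            violations.append(f"{slug!r}: slugifying a slug must return it unchanged")
    return violations


def candidate_slugify(title: str) -> str:
    # Stand-in for a generated implementation. The reviewer reads the contract
    # above, not this body.
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
```

check_slugify_contract(candidate_slugify) returns an empty list, while a naive candidate such as lambda t: t.replace(" ", "-") is rejected. The review effort goes into CONTRACT_CASES and SLUG_SHAPE, which stay stable even if the implementation is regenerated.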

4. Verification

The open-source equivalent: Linters, type checkers, security scanners, tests written by consumers against the public API.

What AI-generated code needs: Verification with three layers.

Deterministic. Linters, type checkers, static analysis, security scanners, schema validators. The parts of the verification stack that don't have a bad day. They run, they pass or fail, no judgment call. For AI-generated code this matters more than for human-written code, not less. The human writer at least had context the linter didn't have. The AI writer didn't have that context either. The deterministic floor is the part of verification you can rely on without trusting the writer or the reviewer.

Contract-anchored. The contract is the source of truth, not the implementation. Tests that describe what the code does instead of what the contract demands aren't really tests. They're a self-portrait. This separation is something we get for free with a third-party library. The test author has only the docs and the public API, so the contract is the only thing the tests can lean on. With AI-generated code, we lose that the moment the implementation starts driving what the tests check.

Org-aware. Conventions. Internal tooling. Naming patterns. The relationships between services. The business invariants the contract never captured. The things human reviewers check today without thinking about it. The implementation gets checked against the org's accumulated standards, not against itself. Without this layer, "we won't review every line" turns into "we don't catch the things humans currently catch."

The point of all three layers, taken together, is that humans don't have to review every line. The verification stack catches what reviewers catch today, deterministically where possible, against the contract where the contract is the spec, against the org's standards everywhere else.
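
The three layers can be sketched as one small pipeline over a piece of generated source. Everything specific here is an assumption for illustration: the contract is reduced to a function signature, the org rule is a made-up ban on importing requests directly, and a real setup would run actual linters, contract tests, and a much richer rule set.

```python
import ast
from typing import Dict, List

# Written before generation (hypothetical contract, signature-level only).
CONTRACT = {"function": "apply_discount", "params": ["price", "percent"]}

# Hypothetical org convention: module name -> what to do instead.
BANNED_IMPORTS = {"requests": "use the internal HTTP client wrapper"}


def deterministic_check(source: str) -> List[str]:
    """Layer 1: checks with no judgment call. Here, only 'does it parse'."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    return []


def contract_check(source: str) -> List[str]:
    """Layer 2: compare the code to the contract, never to itself."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef) and node.name == CONTRACT["function"]:
            params = [arg.arg for arg in node.args.args]
            if params != CONTRACT["params"]:
                return [f"expected params {CONTRACT['params']}, got {params}"]
            return []
    return [f"missing function {CONTRACT['function']}"]


def org_check(source: str) -> List[str]:
    """Layer 3: conventions the contract never captured."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root in BANNED_IMPORTS:
                findings.append(f"import of {root}: {BANNED_IMPORTS[root]}")
    return findings


def verify(source: str) -> Dict[str, List[str]]:
    """Run the layers in order; the later layers need code that parses."""
    report = {"deterministic": deterministic_check(source)}
    if report["deterministic"]:
        return report
    report["contract"] = contract_check(source)
    report["org"] = org_check(source)
    return report
```

A change passes when every list in the report is empty. Keeping the layers separate in the report matters: a contract failure means the code broke its promise, while an org failure means the promise was kept in a way this codebase doesn't allow.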

5. Isolation and Blast Radius

The open-source equivalent: Package boundaries, dependency injection, the principle that you should be able to swap a library without rewriting your codebase.

What AI-generated code needs: The same discipline, applied with intention.

A package boundary isn't just an organizational convenience. It's a blast-radius mechanism. When the package breaks, the rest of the system keeps working. When you want to replace it, you don't have to rewrite everything that touched it. AI-generated code without a boundary is the opposite. It infiltrates. It couples to things it shouldn't. By the time you realize part of it is wrong, it's wrapped around half your codebase.

This isn't really new. It's an existing discipline applied inward. The same care we already apply to third-party code, isolated behind clear interfaces, swappable, replaceable, deletable, should extend to AI-generated modules. With the boundary in place, the operational story falls into place. Feature flags. Staged rollouts. Falling back to a known-good path. The boundary is the primitive. The rest is practice.
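
Here is a minimal sketch of that operational story, assuming the generated module sits behind a single callable with the same interface as an existing known-good path. The guarded helper and its parameter names are hypothetical, and the flag is just a function returning a bool rather than any particular feature-flag service.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("generated_boundary")


def guarded(
    generated: Callable[..., T],
    known_good: Callable[..., T],
    enabled: Callable[[], bool],
) -> Callable[..., T]:
    """Wrap a generated implementation so callers only ever see the boundary."""

    def call(*args, **kwargs):
        if not enabled():
            # Flag off: the generated code is never reached.
            return known_good(*args, **kwargs)
        try:
            return generated(*args, **kwargs)
        except Exception:
            # Contain the blast radius: record the failure, serve the old path.
            log.exception("generated path failed; falling back to known-good path")
            return known_good(*args, **kwargs)

    return call
```

Callers depend on the wrapper, so turning the flag off, swapping the generated function, or deleting it entirely is a one-line change. Note that this only contains failures that raise. A generated path that returns a wrong answer still needs the contract and verification layers to catch it.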

Informed Ownership

You don't read every line of every library you import. But when one breaks, you fix it. You read the docs. You write tests against its behavior. You check the changelog. The level of understanding isn't "I've read the source." It's "I know what this does, what it promises, and what to do when it doesn't."

That's the level we should be aiming for with AI-generated code. You didn't write it. You might not have reviewed every line. But it's in your codebase. It's yours. The trust stack is what makes that ownership real.

Where This Goes Next

This is a starting point, not a finished framework. Five primitives that I think need to exist before we can treat AI-generated code the way we already treat third-party code.

The questions I think are worth working on next:

  • Who defines the convention for marking AI-generated code and linking it back to a request and an approver? This feels like something that needs to be standardized across teams, not invented separately by every tool vendor.
  • What does a decision log actually look like in practice? A structured comment block? A linked spec? A generated summary from the prompt session? An entry in a separate metadata file?
  • How do we write behavioral contracts that are useful both as input to AI generation and as anchors for verification? Are existing type systems enough? Do we need something new?
  • What does the org-aware layer of verification actually look like? Does it live in CI? In the agent's context? Both? Today's verification stack stops at deterministic checks and contract tests. The layer that catches business invariants and org conventions doesn't really exist yet.
  • What's the right level of isolation? Module-level? Service-level? Function-level? When is it overkill?

If you have thoughts on any of these, or think I'm wrong about the list, I'd love to hear from you. Find me at @sag1v.

Open-source gave us a trust framework we use every day without thinking about it. AI-generated code deserves the same.


Originally published on debuggr.io.

I write about software engineering, AI, and the things that interest me along the way. If this resonated with you, come visit debuggr.io for more.