Stop Reviewing Every Line of AI Code - Build the Trust Stack Instead
Sagiv ben gi · 2026-05-23 · via DEV Community

AI-generated code should be treated as third-party code. Same mental model we already use for libraries and dependencies. We don't review every line of lodash, fastapi, or chi. We shouldn't expect to review every line of AI-generated code either.

I argued this in my previous post. The natural follow-up question: okay, but what does that actually require? You can't tell people "trust it like you trust open-source" without explaining what that trust is built on. This post is a first attempt at answering that.

We Already Have A Trust Framework. We Just Don't Use It For This.

We trust open-source code we've never read. Every day, in every codebase. That trust didn't come from any single tool. It came from a stack of agreements built up over decades. Semantic Versioning. Conventional Commits. Lockfiles. Changelogs. Module boundaries. License declarations. Package signing.

None of these are tools. They're primitives. Foundational contracts about how to describe code, change, and intent in a way that humans, and the tools we build, can rely on.

If AI-generated code is just another kind of third-party code, the question is straightforward: which of those primitives carry over, which need new equivalents, and which are missing entirely? Tools will follow once we agree on the shape underneath. They always do. But the shape has to come first, because right now every team trying to build trust for AI-generated code is doing it in private, with their own conventions, and the result is a pile of point solutions that can't compose.

So this is the question I want to look at. Not what tools we should build. What does the trust stack look like once we apply the OSS lens to AI-generated code?

What Made Open-Source Trustworthy

We don't usually interrogate this. We just trust well-maintained libraries. But why?

Strip the open-source trust stack down to its primitives, the underlying contracts and not the tools built on top of them, and you get something like this:

Authorship. Every change has an author, a timestamp, and ideally a reason. Git history isn't just a log, it's an audit trail. You can follow a line of code back to the moment it was written and the person who wrote it.

Versioning as a contract. Semantic versioning isn't just a numbering scheme. It's a promise. A patch bump says "we fixed something, nothing breaks." A major bump says "we changed the contract, you need to adapt." It's a communication primitive between maintainers and consumers.

Conventional commits as intent. Commit messages like fix:, feat:, chore: encode the intent of a change. They're the foundation that changelogs are generated from. The changelog is how you understand what happened without reading every diff.

Behavioral contracts. Type signatures, documented APIs, interface definitions. These define what the code promises to do. They're the spec that tests are written against and the boundary that consumers depend on.

Automated verification. Linters, type checkers, security scanners, coverage gates. Not one-off checks but enforced, repeatable gates that run on every change. The trust isn't in any single test. It's in the habit of verification.

Isolation. Code lives behind package boundaries. You can pin it, swap it, replace it. The blast radius is contained because the boundary is real.

None of these are tools. They're agreements. The reason we can ship third-party code we haven't read is that this stack of agreements is doing the work for us, every time, quietly, in the background.
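To make "agreements, not tools" a little more concrete, here is a minimal sketch of how two of them compose: conventional commit headers declare intent, and the semantic version bump is derived from that intent. The function names `required_bump` and `apply_bump` are made up for illustration, and the mapping (fix to patch, feat to minor, `!` or a `BREAKING CHANGE:` footer to major) follows the published Conventional Commits and SemVer conventions rather than any particular release tool.

```python
import re

# Conventional Commits header: type, optional (scope), optional "!" for breaking.
HEADER = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?(?P<breaking>!)?: (?P<summary>.+)$")


def required_bump(commit_messages):
    """Return the smallest SemVer bump that keeps the version promise honest."""
    bump = "none"
    for message in commit_messages:
        lines = message.splitlines()
        if not lines:
            continue
        match = HEADER.match(lines[0])
        if match is None:
            continue  # not a conventional commit, so it declares no intent
        if match["breaking"] or "BREAKING CHANGE:" in message:
            return "major"  # the contract changed; nothing can outrank this
        if match["type"] == "feat":
            bump = "minor"
        elif match["type"] == "fix" and bump == "none":
            bump = "patch"
    return bump


def apply_bump(version, bump):
    """Apply a bump name to a plain MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return version
```

Nothing here reads a diff. The version number is computed entirely from what the authors said they were doing, which is the point: the primitive is the agreement, and the tooling is a thin layer on top.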

The lens I want to apply to AI-generated code is exactly this. For each of these, what's the equivalent for code that an agent generated inside our own repo? Where does it already exist? Where does it need to be built?

The Primitives AI-Generated Code Needs

1. Traceability

The open-source equivalent: Authorship. Author, commit timestamp, git blame.

What AI-generated code needs: Three things. A clear marker that this code was AI-generated. The human who approved it. And a link to the originating request. The originating request is whatever your team uses to describe a unit of work: a ticket, an issue, an RFC, or a spec.

Without those three, nothing else in the stack has anything to attach to. You can't version, audit, isolate, or post-mortem something you can't identify. You can't ask "which AI-generated module is involved in this incident" because the question doesn't have a place to land.

This is also where ownership becomes real. You didn't write it, but you approved it. Your name is on it. The originating request anchors the why, the human approver anchors the who. The marker is what lets all of it be queried as a class. Three items, all of them answerable today if we choose to make them answerable.
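One way the three items could be made answerable today is as trailer lines at the end of a commit message, in the same "Key: value" style git already uses for things like Signed-off-by. This is a sketch under assumptions: the trailer names `AI-Generated`, `Approved-By`, and `Request` are hypothetical, not an existing standard, and the parser below is a simplified reading of the trailer format, not a replacement for git's own handling.

```python
# Hypothetical trailer names; no standard for these exists yet.
REQUIRED_TRAILERS = ("AI-Generated", "Approved-By", "Request")


def parse_trailers(commit_message):
    """Collect 'Key: value' lines from the last paragraph of a commit message."""
    paragraphs = commit_message.strip().split("\n\n")
    if len(paragraphs) < 2:
        return {}  # a lone subject line is not a trailer block
    trailers = {}
    for line in paragraphs[-1].splitlines():
        key, separator, value = line.partition(": ")
        if separator and " " not in key:
            trailers[key] = value.strip()
    return trailers


def missing_traceability(commit_message):
    """Return the required trailers that are absent or empty."""
    trailers = parse_trailers(commit_message)
    return [name for name in REQUIRED_TRAILERS if not trailers.get(name)]


EXAMPLE_MESSAGE = (
    "feat: add retry to payment client\n"
    "\n"
    "Generated from the task description in the linked request.\n"
    "\n"
    "AI-Generated: true\n"
    "Approved-By: dana@example.com\n"
    "Request: TICKET-123"
)
```

A check like `missing_traceability(EXAMPLE_MESSAGE)` could run as a commit hook or CI step. Once the marker is in history, "which AI-generated changes touched this module" becomes a query over commits instead of a guess.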

2. The Decision Log

The open-source equivalent: Conventional commits. Changelogs. Release notes.

What AI-generated code needs: A record of why this code was generated. What problem it was solving. What constraints the agent was given. What the intent was.

This is the one I'd argue is most under-served, and the one with the highest leverage. When AI-generated code does something unexpected six months from now, the question won't be "what does this code do." You can read it. The question will be "why was it written this way, and what was it trying to accomplish?" Without a decision log, that answer is gone.

The decision log doesn't need to be elaborate. The original prompt or task description, the key constraints, the intent in plain language. Stored somewhere queryable. Attached to the change, the module, or the PR, not buried in a Slack thread that disappears in 90 days.

The open-source world solved this with conventional commits and changelogs. Imperfect, inconsistent, but present. The primitive exists. For AI-generated code, we haven't even agreed it needs to.
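As a sketch of how small this can be, here is one possible shape for a decision log entry: the prompt, the constraints, the intent, and the files it produced, stored as one JSON object per line so it can be queried later. Every field name here is an assumption made for illustration, as is the choice of a JSON Lines file; a PR description template or a metadata file next to the module would work just as well.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecisionLogEntry:
    """One record per generated change. All field names are illustrative."""
    request_id: str     # the originating ticket, issue, RFC, or spec
    prompt: str         # the original prompt or task description
    constraints: tuple  # key constraints the agent was given
    intent: str         # what the change was trying to accomplish, in plain language
    paths: tuple        # files the change produced or modified


def to_json_line(entry):
    """Serialize an entry as one line of JSON, suitable for an append-only log."""
    return json.dumps(asdict(entry), sort_keys=True)


def entries_touching(log_lines, path):
    """Return the logged decisions whose recorded paths include the given file."""
    entries = [json.loads(line) for line in log_lines if line.strip()]
    return [entry for entry in entries if path in entry["paths"]]
```

Six months later, the incident question "why was `billing/retry.py` written this way" turns into `entries_touching(log_lines, "billing/retry.py")`, and the answer is the prompt, the constraints, and the intent that were recorded at the time.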

3. Behavioral Contracts

The open-source equivalent: Type signatures, documented APIs, interface definitions.

What AI-generated code needs: The same thing, but written before the code, not inferred from it.

The direction matters. In the open-source version, the interface is defined first and code implements it. With AI generation, it's easy to end up with code that works and an interface reverse-engineered to match it. That's backwards. The contract is supposed to be the promise. If the implementation defines the promise, the contract isn't doing any work.

The right approach is contract-first generation: you define the behavioral contract, and the AI produces code that satisfies it. The contract becomes the spec, the test anchor, and the documentation, all at once. It's also what shifts human review from line-by-line to contract-level. You're not reviewing the implementation. You're reviewing the contract, and trusting the verification layer to enforce it. That's exactly what we already do for third-party libraries. We don't read lodash. We don't read fastapi. We don't read chi. We trust the contract.
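Here is a minimal sketch of what contract-first can look like in code. The contract is a typed interface plus a check derived only from its documented promises, both written before any implementation exists. The slug example, `SlugGenerator`, `check_slug_contract`, and `GeneratedSlugger` are all invented for illustration; `GeneratedSlugger` stands in for whatever the agent produces.

```python
import re
from typing import Protocol


class SlugGenerator(Protocol):
    """The contract, written first."""

    def slugify(self, title: str) -> str:
        """Return lowercase ASCII letters, digits, and hyphens only,
        with no leading, trailing, or doubled hyphens."""
        ...


def check_slug_contract(impl):
    """Return contract violations. Every check comes from the contract's
    wording, never from reading an implementation."""
    violations = []
    for title in ["Hello, World!", "  spaced   out  ", "Already-Slugged"]:
        slug = impl.slugify(title)
        if slug != slug.lower():
            violations.append(f"{title!r}: not lowercase")
        if not all(ch.isascii() and (ch.isalnum() or ch == "-") for ch in slug):
            violations.append(f"{title!r}: character outside [a-z0-9-]")
        if slug.startswith("-") or slug.endswith("-") or "--" in slug:
            violations.append(f"{title!r}: stray hyphen")
    return violations


class GeneratedSlugger:
    """Stand-in for an AI-generated implementation of the contract."""

    def slugify(self, title):
        return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
```

The human review happens on `SlugGenerator` and `check_slug_contract`. Whether `GeneratedSlugger` uses a regex or a loop stops mattering as long as `check_slug_contract(GeneratedSlugger())` comes back empty.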

4. Verification

The open-source equivalent: Linters, type checkers, security scanners, tests written by consumers against the public API.

What AI-generated code needs: Verification with three layers.

Deterministic. Linters, type checkers, static analysis, security scanners, schema validators. The parts of the verification stack that don't have a bad day. They run, they pass or fail, no judgment call. For AI-generated code this matters more than for human-written code, not less. The human writer at least had context the linter didn't have. The AI writer didn't have that context either. The deterministic floor is the part of verification you can rely on without trusting the writer or the reviewer.

Contract-anchored. The contract is the source of truth, not the implementation. Tests that describe what the code does instead of what the contract demands aren't really tests; they're a self-portrait. This separation is something we get for free with a third-party library: the test author has only the docs and the public API, so the contract is the only thing the tests can lean on. With AI-generated code, we lose that the moment the implementation starts driving what the tests check.

Org-aware. Conventions. Internal tooling. Naming patterns. The relationships between services. The business invariants the contract never captured. The things human reviewers check today without thinking about it. The implementation gets checked against the org's accumulated standards, not against itself. Without this layer, "we won't review every line" turns into "we don't catch the things humans currently catch."

The point of all three layers, taken together, is that humans don't have to review every line. The verification stack catches what reviewers catch today, deterministically where possible, against the contract where the contract is the spec, against the org's standards everywhere else.
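The three layers can be sketched as one small gate. This is a toy under stated assumptions, not a real pipeline: the deterministic layer only checks that the source parses, the contract layer runs input/output pairs taken from a hypothetical contract for a function named `apply_discount`, and the org-aware layer enforces a single made-up house rule (no bare `print` calls). The use of `exec` is only to keep the example short; a real gate would run generated code in an isolated test runner.

```python
import ast


def deterministic_layer(source):
    """No judgment call: the source either parses or it does not."""
    try:
        ast.parse(source)
    except SyntaxError as error:
        return [f"syntax error on line {error.lineno}"]
    return []


def contract_layer(source, contract_cases):
    """Run the code against (args, expected) pairs taken from the contract."""
    namespace = {}
    exec(source, namespace)  # sketch only; sandbox this in practice
    func = namespace["apply_discount"]  # name fixed by the hypothetical contract
    return [
        f"apply_discount{args} != {expected}"
        for args, expected in contract_cases
        if func(*args) != expected
    ]


def org_layer(source):
    """A hypothetical house rule: no bare print calls in shipped code."""
    return [
        f"print() call on line {node.lineno}"
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "print"
    ]


def verify(source, contract_cases):
    """Run the layers in order and report findings per layer."""
    report = {"deterministic": deterministic_layer(source)}
    if report["deterministic"]:
        return report  # the later layers need code that parses
    report["contract"] = contract_layer(source, contract_cases)
    report["org"] = org_layer(source)
    return report


GENERATED = (
    "def apply_discount(price, percent):\n"
    "    print('discounting')\n"
    "    return price - price * percent / 100\n"
)
CONTRACT_CASES = [((100, 10), 90), ((80, 0), 80)]
```

For the sample above, `verify(GENERATED, CONTRACT_CASES)` passes the first two layers and fails only the third: the code parses and satisfies the contract, and it still breaks a convention a human reviewer would have flagged without thinking about it.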

5. Isolation and Blast Radius

The open-source equivalent: Package boundaries, dependency injection, the principle that you should be able to swap a library without rewriting your codebase.

What AI-generated code needs: The same discipline, applied with intention.

A package boundary isn't just an organizational convenience. It's a blast-radius mechanism. When the package breaks, the rest of the system keeps working. When you want to replace it, you don't have to rewrite everything that touched it. AI-generated code without a boundary is the opposite. It infiltrates. It couples to things it shouldn't. By the time you realize part of it is wrong, it's wrapped around half your codebase.

This isn't really new. It's an existing discipline applied inward. The same care we already apply to third-party code, isolated behind clear interfaces, swappable, replaceable, deletable, should extend to AI-generated modules. With the boundary in place, the operational story falls into place. Feature flags. Staged rollouts. Falling back to a known-good path. The boundary is the primitive. The rest is practice.
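The "falling back to a known-good path" part can be sketched in a few lines, assuming the generated code sits behind a function-shaped boundary. The names `with_fallback`, `legacy_total`, `generated_total`, and the in-memory `flags` dict are all hypothetical; in a real system the flag would come from whatever feature-flag service the team already uses.

```python
def with_fallback(generated, known_good, enabled):
    """Wrap two implementations of the same boundary.

    Calls go to the generated path only while the flag is on, and fall
    back to the known-good path if the generated path raises.
    """
    def call(*args, **kwargs):
        if not enabled():
            return known_good(*args, **kwargs)
        try:
            return generated(*args, **kwargs)
        except Exception:
            return known_good(*args, **kwargs)
    return call


used = []  # records which path served each call, for demonstration only


def legacy_total(prices):
    used.append("legacy")
    return sum(prices)


def generated_total(prices):
    used.append("generated")
    if not prices:
        raise ValueError("empty cart")  # a failure mode the old path never had
    return sum(prices)


flags = {"generated_total": True}  # stand-in for a real feature-flag lookup
cart_total = with_fallback(
    generated_total, legacy_total, lambda: flags["generated_total"]
)
```

Callers only ever see `cart_total`. Turning the flag off, or deleting `generated_total` entirely, changes nothing outside the boundary, which is the property that keeps the blast radius small.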

Informed Ownership

You don't read every line of every library you import. But when one breaks, you fix it. You read the docs. You write tests against its behavior. You check the changelog. The level of understanding isn't "I've read the source." It's "I know what this does, what it promises, and what to do when it doesn't."

That's the level we should be aiming for with AI-generated code. You didn't write it. You might not have reviewed every line. But it's in your codebase. It's yours. The trust stack is what makes that ownership real.

Where This Goes Next

This is a starting point, not a finished framework. Five primitives that I think need to exist before we can treat AI-generated code the way we already treat third-party code.

The questions I think are worth working on next:

  • Who defines the convention for marking AI-generated code and linking it back to a request and an approver? This feels like something that needs to be standardized across teams, not invented separately by every tool vendor.
  • What does a decision log actually look like in practice? A structured comment block? A linked spec? A generated summary from the prompt session? An entry in a separate metadata file?
  • How do we write behavioral contracts that are useful both as input to AI generation and as anchors for verification? Are existing type systems enough? Do we need something new?
  • What does the org-aware layer of verification actually look like? Does it live in CI? In the agent's context? Both? Today's verification stack stops at deterministic checks and contract tests. The layer that catches business invariants and org conventions doesn't really exist yet.
  • What's the right level of isolation? Module-level? Service-level? Function-level? When is it overkill?

If you have thoughts on any of these, or think I'm wrong about the list, I'd love to hear from you. Find me at @sag1v.

Open-source gave us a trust framework we use every day without thinking about it. AI-generated code deserves the same.


Originally published on debuggr.io.

I write about software engineering, AI, and the things that interest me along the way. If this resonated with you, come visit debuggr.io for more.