Audit-trail-by-construction: a thesis for spec-driven AI coding
Masroor Ahma · 2026-05-27 · via DEV Community


TL;DR. Trail is a multi-agent framework for Claude Code that uses Plane work-items as the audit bus. Requirements get stable IDs that thread all the way down into test-code annotations, so every line of AI-generated code can be traced back to a signed-off intent. Built for regulated work and security-critical systems, not for general velocity-first coding.

Most agentic frameworks for coding are built for velocity. They wire up some agents — a planner, an architect, a coder, a reviewer — and let them collaborate on a feature. What comes out is code, often working code, in less time than a human would need.

That is fine, until you have to defend the code.

A regulator asks: who signed off on the threat model that justifies this auth shortcut? A customer asks: which acceptance criterion does this test actually prove? An incident review asks: when this requirement got added, what was the original intent — was the implementation true to it, or did the agent improvise? In a velocity-first framework, the trail goes cold quickly. The agent did it. A dev approved the PR. The "why" lives in a chat transcript that got compacted twice and was partly summarised.

Trail is a multi-agent framework that takes the opposite bet: discipline first, velocity second. The thesis: for software you eventually have to defend — regulated industries, security-critical systems, anything that gets reviewed by an auditor — the audit trail is not an afterthought. It is the primitive.

The closest cousin to this approach is BMAD-METHOD, which makes the same bet: partition AI agents by SDLC role under explicit human direction. The load-bearing difference is where the collaboration bus lives. BMAD uses Git plus markdown files in the repo, while Trail uses Plane work-items with one ticket-system account per persona. That is what makes identity attribution mechanically enforced rather than a matter of convention.

The discipline, in three rules

These three rules already carry most of the weight.

Description-once. A requirement is written once into a ticket body and then never edited again. Refinements travel as comments. No version-skew on what was actually agreed.
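To make the rule concrete, here is a minimal in-memory sketch in Python. The `Ticket` and `Comment` types are hypothetical illustrations, not Plane's data model or API. The body is set once at construction and exposed read-only, and the only write path is appending a comment.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Comment:
    author: str  # account that wrote the comment
    text: str


@dataclass
class Ticket:
    # Hypothetical model for illustration; not Plane's data model.
    key: str
    _body: str
    _comments: List[Comment] = field(default_factory=list)

    @property
    def body(self) -> str:
        # Read-only property with no setter: the agreed text cannot be reassigned.
        return self._body

    @property
    def comments(self) -> Tuple[Comment, ...]:
        # Return a tuple copy so callers cannot rewrite the comment history.
        return tuple(self._comments)

    def add_comment(self, author: str, text: str) -> None:
        # Refinements travel as appended comments; nothing is overwritten.
        self._comments.append(Comment(author, text))


story = Ticket("STORY-1", "Customer places an order")
story.add_comment("re", "Refine SC-1 into AC-1.1 and AC-1.2")
```

Python cannot stop a determined caller from touching `_body`, so treat this as an illustration of the rule's shape rather than an enforcement mechanism.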

Stable per-criterion IDs. Every success criterion gets a SC-N. Every acceptance criterion gets an AC-N.M. Edge cases get EC-N.M.x. Non-functional requirements get NFR-N. Architectural invariants live in a Control Manifest as CM-N. These IDs are append-only — once they have been issued, they never move.
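The ID scheme is regular enough to check mechanically. The sketch below assumes a grammar inferred from the examples in this article: N and M are numeric, and the edge-case suffix is a single lowercase letter, as in EC-1.1.a. `IdRegistry` is a hypothetical helper that refuses to reissue an ID, which is one way to honour the append-only rule.

```python
import re

# Assumed grammar, inferred from the examples in the text.
ID_PATTERN = re.compile(
    r"(?:SC-\d+|AC-\d+\.\d+|EC-\d+\.\d+\.[a-z]|NFR-\d+|CM-\d+)"
)


def is_valid_id(candidate: str) -> bool:
    # fullmatch requires the whole string to match, so "AC-1" and "SC-1x" fail.
    return ID_PATTERN.fullmatch(candidate) is not None


class IdRegistry:
    """Append-only store: an issued ID is never reassigned or removed."""

    def __init__(self) -> None:
        self._issued = {}  # ID -> criterion text

    def issue(self, criterion_id: str, text: str) -> None:
        if not is_valid_id(criterion_id):
            raise ValueError(f"malformed ID: {criterion_id}")
        if criterion_id in self._issued:
            # Reissuing would silently change what an existing ID means.
            raise ValueError(f"{criterion_id} was already issued")
        self._issued[criterion_id] = text

    def lookup(self, criterion_id: str) -> str:
        return self._issued[criterion_id]
```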

Per-role identity in the ticket bus. Each agent persona has its own ticket-system account, and writes are attributed accordingly. The board is the audit log: open it, scan a column, see which named role designed, reviewed, implemented, or tested every change.
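One way to picture per-role attribution is a log in which every write carries the account it was made through. The persona-to-account mapping and the account names below are hypothetical. The `AuditBus` class stands in for the ticket system and does not reproduce Plane's API.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical mapping: one ticket-system account per persona.
PERSONA_ACCOUNTS = {
    "ba": "trail-ba",
    "re": "trail-re",
    "sa": "trail-sa",
    "sr": "trail-sr",
    "bd": "trail-bd",
}


@dataclass(frozen=True)
class Write:
    work_item: str
    account: str
    action: str  # e.g. "designed", "reviewed", "implemented"


class AuditBus:
    def __init__(self) -> None:
        self._log: List[Write] = []

    def write(self, persona: str, work_item: str, action: str) -> None:
        # A persona can only write through its own account (KeyError if unknown).
        account = PERSONA_ACCOUNTS[persona]
        self._log.append(Write(work_item, account, action))

    def who(self, work_item: str, action: str) -> List[str]:
        # "Scan a column": which accounts performed this action on this item?
        return [w.account for w in self._log
                if w.work_item == work_item and w.action == action]


bus = AuditBus()
bus.write("sa", "STORY-1", "designed")
bus.write("sr", "STORY-1", "reviewed")
```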

The IDs are the connective tissue. They thread from the Business Analyst's intent down through the Software Architect's slices into the implementor's test code — and they stay legible whether you read them forward (intent → code) or backward (code → why).

How it threads, visually

Diagram: threading

The Business Analyst writes a Story body once: "Customer places an order", with two success criteria: SC-1 (the customer receives a confirmation) and SC-2 (the order is visible in the customer's account). The Requirements Engineer then adds a comment that refines SC-1 into testable acceptance criteria, AC-1.1 (email arrives within 60 seconds) and AC-1.2 (the order carries a unique and stable number), plus an edge case EC-1.1.a for the payment-provider timeout. None of this overwrites anything; it is all appended.
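Because each ID encodes its parent, the refinement chain in this example can be recovered from the IDs alone. The sketch assumes that convention: EC-N.M.x refines AC-N.M, which refines SC-N. NFR and CM IDs are treated as roots here.

```python
from typing import List, Optional

# The worked example from the text, keyed by ID.
story = {
    "SC-1": "the customer receives a confirmation",
    "SC-2": "the order is visible in the customer's account",
    "AC-1.1": "email arrives within 60 seconds",
    "AC-1.2": "the order carries a unique and stable number",
    "EC-1.1.a": "payment-provider timeout",
}


def parent(criterion_id: str) -> Optional[str]:
    """Derive the upstream ID from the shape of the ID (assumed convention)."""
    prefix, _, number = criterion_id.partition("-")
    parts = number.split(".")
    if prefix == "EC" and len(parts) == 3:
        return f"AC-{parts[0]}.{parts[1]}"  # EC-1.1.a refines AC-1.1
    if prefix == "AC" and len(parts) == 2:
        return f"SC-{parts[0]}"  # AC-1.1 refines SC-1
    return None  # SC, NFR and CM IDs are roots in this sketch


def chain(criterion_id: str) -> List[str]:
    """Walk from a leaf ID back up to the criterion it ultimately serves."""
    path = [criterion_id]
    while (up := parent(path[-1])) is not None:
        path.append(up)
    return path


for step in chain("EC-1.1.a"):
    print(f"{step}: {story[step]}")
```

Run as-is, the loop walks from the timeout edge case back to the BA's success criterion SC-1 via AC-1.1.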

When the Backend Developer implements, every test carries an inline comment that names the upstream ID it satisfies: // AC-1.1, // EC-1.1.a. A grep for // AC- in the codebase enumerates the acceptance criteria that already have proof. A grep for AC-1.1 traces a single criterion from BA intent down to the line of code that proves it.
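The grep can be sketched as a small scanner. To stay self-contained, this version works on an in-memory mapping of file paths to source text. The file name and test bodies are hypothetical, and the pattern assumes annotations of the form `// AC-N.M` or `// EC-N.M.x`.

```python
import re
from typing import Dict, Iterable, Set

# Matches inline annotations such as "// AC-1.1" or "// EC-1.1.a".
ANNOTATION = re.compile(r"//\s*(AC-\d+\.\d+|EC-\d+\.\d+\.[a-z])\b")


def proven_ids(files: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each annotated ID to the set of files whose tests carry it."""
    found: Dict[str, Set[str]] = {}
    for path, text in files.items():
        for criterion_id in ANNOTATION.findall(text):
            found.setdefault(criterion_id, set()).add(path)
    return found


def unproven(required: Iterable[str], files: Dict[str, str]) -> Set[str]:
    """Criteria that no test annotation points at yet."""
    return set(required) - set(proven_ids(files))


# Hypothetical test file for the worked example.
sources = {
    "order_test.ts": (
        "test('confirmation email', () => {  // AC-1.1\n"
        "});\n"
        "test('provider timeout', () => {  // EC-1.1.a\n"
        "});\n"
    ),
}
```

To scan a real checkout, you could build the mapping with `pathlib.Path(root).rglob(...)` and `read_text()`. `unproven()` then lists the acceptance criteria that still lack a test.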

That is the audit trail you can show to anyone — auditor, customer, incident reviewer — without having to interpret it. The IDs do not need an explanation; the chain is the explanation.

How a feature flows

Diagram: flow

Ten persona subagents collaborate through Plane work-items. (Plane is an open-source ticket system — think of Jira's mental model on self-hostable infrastructure.) The lifecycle is a state spine — Backlog → To Do → In Progress → In Review → Done — and at every transition a human pulls the trigger. There is no ticket-driven autopilot. The user issues a slash command (/ba, /re, /sa, /sr, /bd, /ud, /tm, /tw, /rm); the framework loads the persona's role into Claude Code's main loop for that turn; the persona writes the artefact, transitions the state, hands the work-item to the next named role, and gives control back.
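The state spine reads naturally as a small state machine. In this sketch the states come from the text. The rework edge from In Review back to In Progress is an assumption, based on USER being able to reassign for rework. `triggered_by_human` is a hypothetical stand-in for a slash command issued by the operator.

```python
from enum import Enum
from typing import Dict, Set


class State(Enum):
    BACKLOG = "Backlog"
    TODO = "To Do"
    IN_PROGRESS = "In Progress"
    IN_REVIEW = "In Review"
    DONE = "Done"


# Forward spine plus one assumed rework edge (In Review -> In Progress).
ALLOWED: Dict[State, Set[State]] = {
    State.BACKLOG: {State.TODO},
    State.TODO: {State.IN_PROGRESS},
    State.IN_PROGRESS: {State.IN_REVIEW},
    State.IN_REVIEW: {State.DONE, State.IN_PROGRESS},
    State.DONE: set(),
}


def transition(current: State, target: State, triggered_by_human: bool) -> State:
    # No autopilot: every move needs an explicit human trigger.
    if not triggered_by_human:
        raise PermissionError("transitions require a human-issued command")
    # Only edges on the spine (or the assumed rework edge) are legal.
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```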

This is by design, not by accident. The slash-command rhythm forces the human to read the ticket body, the comments, and the current state before triggering the next persona. It removes the temptation to wave everything through with one global "OK". Every turn is a deliberate hand-off — one that the user has to actually engage with before deciding to accept, reject, or send back for rework.

The handover is structural. BA hands to RE. RE hands to SA. SA cuts the work into 1–4 sub-work-items, each in exactly one module (frontend / backend / testing / documentation), and hands them to SR. SR then posts security-review comments and hands back to USER, who afterwards dispatches each sub-work-item to its module's implementor. The implementors write code, post Implementation notes, and put the sub-work-item into In Review. USER closes — or reassigns for rework.
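Both the Story-level handover order and the SA's slicing constraint can be checked mechanically. This is an illustrative sketch. The role labels mirror the text, while `next_role`, `SubWorkItem`, and `validate_slices` are hypothetical names, and the dispatch from USER to the module implementors is left out.

```python
from dataclasses import dataclass
from typing import List, Optional

MODULES = {"frontend", "backend", "testing", "documentation"}

# Story-level handover order described in the text; USER is the human.
HANDOVER = ["BA", "RE", "SA", "SR", "USER"]


def next_role(current: str) -> Optional[str]:
    """Return the role that receives the work-item next, or None at the end."""
    i = HANDOVER.index(current)
    return HANDOVER[i + 1] if i + 1 < len(HANDOVER) else None


@dataclass(frozen=True)
class SubWorkItem:
    title: str
    modules: frozenset  # must contain exactly one module


def validate_slices(items: List[SubWorkItem]) -> None:
    """Check the SA's cut: 1-4 sub-work-items, each in exactly one known module."""
    if not 1 <= len(items) <= 4:
        raise ValueError(f"expected 1-4 sub-work-items, got {len(items)}")
    for item in items:
        if len(item.modules) != 1 or not item.modules <= MODULES:
            raise ValueError(f"{item.title!r} must sit in exactly one module")
```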

Every transition leaves a fingerprint in Plane: a state change, an assignee change, a comment, a commenter. None of it needs interpretation. All of it is queryable.

What it costs

Spec-driven development has a known weakness, and the framework does not paper over it. At the start of a Story, you can never cover every use case and every eventuality — every spec is a snapshot of what the author understood at that moment. Reality finds the gaps later.

Because the description-once rule is taken seriously, you do not go back and re-edit the Story body once those gaps surface. You cut a follow-up Story instead. Each follow-up carries its own SC/AC/EC IDs, its own audit chain, its own state spine. That is intentional — it keeps every signed-off intent immutable — but it also means that one feature can fan out into three or four tickets over its lifetime as edge cases turn up. The board grows. Operators should expect Story-fanout, not Story-condensation.

The other cost is throughput. The slash-command rhythm and the per-turn engagement both slow things down considerably compared to a velocity-first framework. That is the entire point — but you should be honest with yourself about whether the trade-off makes sense for what you are building.

Try it — and an honest caveat

The framework lives at github.com/mahmadhuebsch/trail-aiac. The shortest path:

```shell
git clone https://github.com/mahmadhuebsch/trail-aiac
cd trail-aiac
claude
# then, at the Claude Code prompt, run:
#   /trail-install-helper
```


The install-helper is a meta-agent that walks you through three scenarios — greenfield (Ansible provisions a Plane host for you), existing Plane without agents, existing Plane with agents already provisioned — and lands a working consumer project with the ten personas wired in.

One operational note. The framework assumes a Claude Max 5x subscription as the practical ceiling. That is roughly the level at which a human can still read every ticket the agents are producing. If you find yourself burning through significantly more, you are not really reviewing any more — you are vibe-coding. No human can process that much input consciously, which defeats the entire point of the human-in-the-loop discipline.

Honest caveat: this is not for every team. Most teams do not need that much rigour — they need velocity, and they should pick a velocity-first framework. Trail is for the cases where someone, eventually, will ask you to defend your code: regulated industries, security-critical systems, agencies whose deliverables get reviewed by auditors. In those settings, the discipline is not overhead. It is the only thing that makes AI-generated code defendable.

Trail v0.1.0 is early beta. PRs and design feedback are welcome at github.com/mahmadhuebsch/trail-aiac/issues.


Author's note: this article was drafted with Claude — the same agent runtime that the framework wraps — and edited by hand from there. The thesis, the worked example, the trade-offs section, and the vibe-coding caveat are mine; structure and phrasing had AI assistance throughout. Given the topic, disclosure felt appropriate.