Spec-Driven Development: Structure Beats Vibes
Remy B. · 2026-05-12 · via DEV Community

Key Takeaways

  • Spec-driven development (SDD) makes a machine-readable specification the primary artifact; code, tests, and docs are derived from it
  • GitHub released Spec Kit in September 2025; by April 2026 it had roughly 90,000 stars and supported 20+ coding agents
  • 66% of developers say their top AI frustration is code that's "almost right, but not quite" — the failure mode specs are designed to catch
  • Birgitta Boeckeler identifies three SDD maturity levels: spec-first, spec-anchored, and spec-as-source
  • Specs have failure modes too: Thoughtworks Radar rated SDD "Assess, not Adopt" in November 2025 and Marmelab documented a 1,300-line spec for a one-feature date display

45% of AI-generated code samples introduced OWASP Top 10 vulnerabilities across 100+ tested models (Cloud Security Alliance, April 2026). 66% of developers say their top AI frustration is output that's "almost right, but not quite" (Stack Overflow 2025 Developer Survey). The models keep improving. The failure mode hasn't changed.

The first time I tried to vibe code a billing dashboard for my SaaS, Claude Code burned 40 minutes producing three different layouts that all looked plausible and all missed the auth boundary. I closed the chat, wrote a one-page PRD — goals, non-goals, the four tables it touched, the two roles that read it — and pasted it back. Fifteen minutes later the dashboard was right on the first try. Specs aren't waterfall. They're the difference between three rewrites and one.

The gap between those two attempts was the spec. Spec-driven development closes it by making the specification, rather than the prompt or the code, the source of truth your tools and agents build from.

What Is Spec-Driven Development?

Wikipedia's definition is the cleanest: "Spec-driven development is a software engineering methodology where a formal, machine-readable specification serves as the primary artifact from which implementation, testing, and documentation are derived" (Wikipedia, 2026).

The practitioner framing from GitHub's Den Delimarsky is more operational: "Instead of coding first and writing docs later, in spec-driven development, you start with a spec. This is a contract for how your code should behave and becomes the source of truth your tools and AI agents use to generate, test, and validate code" (GitHub Blog, September 2, 2025).

Both definitions share one idea: the spec is upstream of everything. Code is a compilation target. Tests are a consistency check. Documentation is a projection. The spec is what you author, review, and version.

The Term Is Older Than It Looks

Spec-driven development didn't arrive with AI. Wikipedia traces it to 1960s NASA workflows and a formal academic treatment by Ostroff, Makalsky, and Paige at the XP 2004 conference. Formal methods, contract programming, and model-driven engineering all sit in the same lineage. What changed in 2025 is that large language models made the cost of "write the spec first" collapse: the spec itself can be drafted, refined, and turned into code by the same agent, as long as the spec is the artifact everyone argues about.

The Problem Vibe Coding Created

Vibe coding made it possible to describe a feature in plain English and get working code back in seconds. That's the upside. The downside shows up at scale, and the data from the last twelve months is unambiguous.

A Veracode study cited in the Cloud Security Alliance's April 4, 2026 research note found 45% of AI-generated code introduced OWASP Top 10 vulnerabilities across 100+ tested LLMs; Java samples failed 72% of the time, and 88% were vulnerable to log injection (CSA Research Note). Apiiro's enterprise telemetry in the same note showed AI-assisted developers produced commits at 3–4x the rate of peers, while security findings rose roughly tenfold and privilege-escalation paths climbed 322% over six months.

Productivity data is just as stark. A July 2025 METR randomized controlled trial found experienced open-source developers were 19% slower when using AI coding tools, despite predicting a 24% speedup (METR RCT, July 2025). The Stack Overflow 2025 Developer Survey (n = 48,945) found 84% of developers use or plan to use AI, but only 33% trust AI accuracy while 46% actively distrust it.

The "almost right" tax

66% of developers cite "AI solutions that are almost right, but not quite" as their top AI frustration (Stack Overflow 2025). Debugging plausible-looking wrong code is often slower than writing it yourself. Specs exist to prevent "almost right" from ever leaving the planning phase.

The pattern is consistent: AI writes fast, generates superficially plausible code, and leaves you to clean up architectural drift and security gaps. The Stack Overflow team connected the dots explicitly in their 2025 write-up, calling out "spec-driven development" by name as the structural response. I covered the full scaling picture in Vibe Coding Has a Scaling Problem.

How Spec-Driven Development Works

GitHub's Spec Kit is the clearest reference implementation. It formalizes a four-phase workflow, and the phases are the same whether you're using Claude Code, Cursor, Copilot, Gemini CLI, or any of the 20+ other agents Spec Kit targets.

The Four Phases

  1. Constitution. Project-wide invariants. Your stack, your conventions, the things every feature inherits. This is the document every downstream spec references.
  2. Specify. A feature-level spec: goals, non-goals, constraints, acceptance criteria. This is what the agent reads before it starts planning.
  3. Plan. The agent decomposes the spec into architectural decisions and task breakdowns, then hands the plan back for human review.
  4. Tasks / Implement. Only now does code get written. Each task traces back to an acceptance criterion in the spec, which means divergence is visible rather than silent.

An optional Clarify phase sits between Specify and Plan; the agent asks the questions a human reviewer would ask before committing to an approach. The Spec Kit repo is open source, MIT-licensed, and sat at roughly 90,000 stars with active v0.7.x releases as of April 2026 (github.com/github/spec-kit).
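The traceability in phase 4 can be sketched in a few lines. This is a minimal illustration, not Spec Kit's own data model: the `Criterion` and `Task` types, the `AC-1` style IDs, and `trace_report` are all hypothetical names chosen for the example. It shows the property that matters, which is that a task with no acceptance criterion behind it, or a criterion with no task, shows up in a report instead of slipping through.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    id: str    # hypothetical ID scheme, e.g. "AC-1"
    text: str


@dataclass(frozen=True)
class Task:
    title: str
    criterion_ids: tuple[str, ...] = ()  # criteria this task claims to satisfy


def trace_report(criteria: list[Criterion], tasks: list[Task]) -> dict[str, list[str]]:
    """Return criteria no task covers, and tasks that trace to nothing in the spec."""
    known = {c.id for c in criteria}
    covered = {cid for t in tasks for cid in t.criterion_ids}
    return {
        "uncovered_criteria": sorted(known - covered),
        "untraced_tasks": [
            t.title
            for t in tasks
            if not t.criterion_ids or not set(t.criterion_ids) <= known
        ],
    }


criteria = [
    Criterion("AC-1", "Only the owner and admin roles can open the billing dashboard"),
    Criterion("AC-2", "Invoices are listed newest first"),
]
tasks = [
    Task("Add role check to dashboard route", ("AC-1",)),
    Task("Add dark-mode toggle"),  # not asked for anywhere in the spec
]

report = trace_report(criteria, tasks)
print(report)
```

Here the report flags `AC-2` as uncovered and the dark-mode task as untraced, which is the kind of divergence that would otherwise only surface in code review.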

The Three Maturity Levels

Birgitta Boeckeler's October 2025 article on martinfowler.com breaks spec-driven development into three ascending levels of commitment (Boeckeler, October 2025):

  • Spec-first. You write a spec before prompting. The spec informs the AI but isn't regenerated as code changes. Simplest, lightest, most teams start here.
  • Spec-anchored. Spec and code stay in sync. When code drifts, the spec is updated; when the spec changes, code is regenerated. This is where Spec Kit and Amazon Kiro live.
  • Spec-as-source. The spec is the only thing humans author. Code is fully derived output, closer to how Terraform generates infrastructure from HCL. Tessl Framework is the most public example.

Most teams don't need level three. Moving from unstructured prompting to spec-first captures most of the reliability gain.
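One way to picture the step from spec-first to spec-anchored is a drift check. The sketch below is an assumption-laden illustration, not how Spec Kit or Kiro track sync: the `# spec-sha256:` header and the function names are invented for the example. It stamps generated code with a digest of the spec it was built from, so a later spec change is detectable. It only catches spec edits made after generation, not hand edits to the code.

```python
import hashlib
import re

# Hypothetical stamp format: a comment line carrying the spec's SHA-256 digest.
STAMP = re.compile(r"^# spec-sha256: ([0-9a-f]{64})$", re.MULTILINE)


def spec_digest(spec_text: str) -> str:
    return hashlib.sha256(spec_text.encode("utf-8")).hexdigest()


def stamp_code(code: str, spec_text: str) -> str:
    """Prefix generated code with the digest of the spec it was built from."""
    return f"# spec-sha256: {spec_digest(spec_text)}\n{code}"


def is_in_sync(code: str, spec_text: str) -> bool:
    """True only if the code carries a stamp matching the current spec."""
    match = STAMP.search(code)
    return match is not None and match.group(1) == spec_digest(spec_text)


spec_v1 = "Goal: show invoices to owner and admin roles only.\n"
generated = stamp_code("def list_invoices(user): ...\n", spec_v1)

# The spec is revised after the code was generated.
spec_v2 = spec_v1 + "Constraint: newest invoices first.\n"

print(is_in_sync(generated, spec_v1))  # stamp matches the spec it was built from
print(is_in_sync(generated, spec_v2))  # spec changed, so regeneration is due
```

A check like this could run in CI to fail the build whenever the spec has moved and the code has not been regenerated.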

Spec-Driven Development vs. Vibe Coding

Spec-driven development doesn't replace vibe coding; it constrains it. The two answer different questions at different points in the workflow.

|                  | Vibe Coding                          | Spec-Driven Development                |
|------------------|--------------------------------------|----------------------------------------|
| Primary artifact | The prompt                           | The specification                      |
| Source of truth  | Generated code                       | The spec                               |
| Best for         | Exploration, prototypes, UI tweaks   | Anything touching auth, payments, data |
| Failure mode     | Pattern drift, "almost right" output | Over-specification, review overload    |
| Iteration loop   | Re-prompt until code works           | Revise spec, regenerate code           |
| Review target    | Generated code diff                  | Spec diff first, then code diff        |

The healthy version of the two is layered: vibe-code inside a well-written spec. The spec bounds what the AI is allowed to do; the prompt fills in the how. When the output drifts, you fix the spec, not the prompt.

Context Engineering — The Layer Below Specs

A spec tells the AI what to build. Context engineering tells it what it already knows. The term was popularized in parallel by Shopify CEO Tobi Lütke and Andrej Karpathy in late June 2025, within two days of each other.

"Context engineering is the delicate art and science of filling the context window with just the right information for the next step." (Andrej Karpathy, June 25, 2025)

Lütke's framing, two days earlier, was more practical: "the art of providing all the context for the task to be plausibly solvable by the LLM" (@tobi on X, June 23, 2025). Simon Willison collected both quotes and argued the term better reflects what production LLM work actually looks like (Willison, June 27, 2025).

The relationship to specs is directional: context engineering feeds the spec, and the spec feeds the task. A spec with no context produces code that's technically correct but violates every convention in your repo. A context without a spec produces code that fits the repo but does the wrong thing. You need both.
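That ordering (context, then spec, then task) can be made concrete with a small prompt builder. This is a sketch under stated assumptions: `build_prompt`, the section names, and the sample strings are hypothetical and not taken from any tool mentioned here. The only idea it carries is that the builder refuses to produce a prompt when one of the three layers is missing.

```python
def build_prompt(context: str, spec: str, task: str) -> str:
    """Assemble one prompt: repo context first, then the spec, then the task."""
    sections = [
        ("Project context", context),
        ("Specification", spec),
        ("Task", task),
    ]
    missing = [name for name, body in sections if not body.strip()]
    if missing:
        # A spec without context, or context without a spec, is rejected up front.
        raise ValueError(f"refusing to prompt without: {', '.join(missing)}")
    return "\n\n".join(f"## {name}\n{body.strip()}" for name, body in sections)


prompt = build_prompt(
    context="Stack: Next.js and Postgres. All queries go through the data-access layer.",
    spec="Goal: billing dashboard readable by the owner and admin roles only.",
    task="Implement the route guard for the billing dashboard.",
)
print(prompt)
```

Calling `build_prompt` with an empty `spec` raises `ValueError`, which turns "I forgot to paste the spec" into an immediate failure instead of a plausible-looking wrong answer.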

I treat them as two of three layers in a structured vibe coding framework — context engineering, AI coding guardrails, and spec-driven workflows — that together form a complete harness. Specs without context, or context without enforcement, fail in predictable ways.

The Tools Shipping Spec-Driven Workflows

Three tools define the current state of spec-driven development. Each takes a different position on the Boeckeler maturity ladder.

  • GitHub Spec Kit. Open source, MIT-licensed, roughly 90,000 stars as of April 2026. Supports Claude Code, Copilot, Cursor CLI, Gemini CLI, Codex CLI, Qwen, opencode, and more. Lives at the spec-anchored level: specs and code evolve together through the Constitution/Specify/Plan/Tasks flow.
  • Amazon Kiro. Commercial AWS offering, same spec-anchored tier. Kiro emphasizes tight AWS integration and specification reuse across services.
  • Tessl Framework. Commercial, the most aggressive of the three. Pushes toward spec-as-source: humans author specs, everything else is generated.

Thoughtworks' Technology Radar flagged all three by name when it placed spec-driven development in its "Assess" ring in November 2025 (Thoughtworks Radar Vol. 33).

The tools handle generation. They don't handle enforcement. That's where harness engineering picks up — the tests, type checks, and quality gates that verify the generated code actually matches the spec. Specs and harnesses are complements: the spec is what you wanted, the harness proves you got it.
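A harness can be as small as the spec's acceptance criteria written as executable examples. The sketch below assumes a hypothetical billing spec (the `AC-1` and `AC-2` labels and role names are invented) and uses a deliberately flawed `can_view_billing` as a stand-in for generated code that is "almost right".

```python
# Hypothetical acceptance criteria, written as (role, expected access) examples.
SPEC_EXAMPLES = {
    "AC-1 owner can view billing": ("owner", True),
    "AC-1 admin can view billing": ("admin", True),
    "AC-2 member cannot view billing": ("member", False),
    "AC-2 anonymous cannot view billing": ("", False),
}


def can_view_billing(role: str) -> bool:
    """Stand-in for AI-generated code; deliberately too permissive."""
    return role != ""


def run_harness(impl) -> list[str]:
    """Return the name of every acceptance example the implementation fails."""
    return [
        name
        for name, (role, expected) in SPEC_EXAMPLES.items()
        if impl(role) != expected
    ]


failures = run_harness(can_view_billing)
print(failures)
```

The permissive implementation passes three of four examples and fails only the member case, which is exactly the auth-boundary miss that looks fine in a demo. In a real project the same examples would more likely live in the team's existing test framework.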

When Spec-Driven Development Backfires

Spec-driven development has a credible set of critics. Ignoring them produces the exact overhead they warn about.

François Zaninotto at Marmelab documented the most concrete example in November 2025: a single feature to display the current date required 8 files and roughly 1,300 lines of specification using Spec Kit (Marmelab, November 12, 2025). His argument is that SDD is a rebranded waterfall optimized for removing developers from the loop.

"SDD is a step in the wrong direction. It tries to solve a faulty challenge: 'How do we remove developers from software development?'" (François Zaninotto, Marmelab)

Thoughtworks' Technology Radar was more measured but still cautious, placing SDD in "Assess" rather than "Trial" or "Adopt" and warning the workflows are "elaborate and opinionated" and may represent "a bitter lesson — that handcrafting detailed rules for AI ultimately doesn't scale." Boeckeler, a qualified supporter, has flagged the same failure modes: review overload for small features and non-deterministic LLM output undermining the promised control.

The practical heuristic: spec-driven development is pure overhead for anything smaller than a full feature. Use it where the cost of architectural drift is high (auth, billing, multi-tenant data, API contracts) and skip it where the cost of being wrong is a page refresh.

How to Start Without Rewriting Everything

You don't need Spec Kit, a Constitution document, or a four-phase workflow to practice spec-driven development. You need a one-page spec and the discipline to hand it to the AI before you prompt.

  1. Write a one-page PRD before prompting. Goals, non-goals, constraints, acceptance criteria. Fifteen minutes. This single step is the biggest reliability gain most teams will see, and it costs nothing.
  2. Use AGENTS.md as your Constitution. Stack choices, conventions, architectural rules, forbidden patterns. Next.js 16.2 now ships AGENTS.md in create-next-app by default; I walk through a full AGENTS.md-first workflow in a step-by-step tutorial on vibeready.sh.
  3. Treat the spec as the diff target. When the AI produces something wrong, revise the spec first, then regenerate the code. Don't re-prompt your way around a spec gap — that's the vibe-coding failure mode.
  4. Pair the spec with a harness. Specs without automated tests and type checks drift silently. The spec says what you want; the harness proves the code matches. Harness engineering is the enforcement layer.
  5. Graduate to Spec Kit when the overhead earns itself. Once you have a handful of features that share a Constitution, formalizing with Spec Kit or Kiro starts paying back. Before that, a directory of markdown specs works fine.
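If the specs live as a directory of markdown files, step 1 can be enforced with a tiny lint. This is a sketch, assuming each spec uses markdown headings named after the four parts listed in step 1; `REQUIRED_SECTIONS` and `missing_sections` are hypothetical names, not part of Spec Kit or Kiro.

```python
import re

REQUIRED_SECTIONS = ("Goals", "Non-goals", "Constraints", "Acceptance criteria")


def missing_sections(spec_markdown: str) -> list[str]:
    """List required headings that are absent or have no content under them."""
    # Splitting on heading lines yields [preamble, title, body, title, body, ...].
    parts = re.split(r"^#{1,6}[ \t]+(.+?)[ \t]*$", spec_markdown, flags=re.MULTILINE)
    bodies = {
        title.strip().lower(): body.strip()
        for title, body in zip(parts[1::2], parts[2::2])
    }
    return [s for s in REQUIRED_SECTIONS if not bodies.get(s.lower())]


draft = """# Billing dashboard

## Goals
Show invoices to the owner and admin roles.

## Non-goals

## Acceptance criteria
- Members get a 403.
"""

print(missing_sections(draft))
```

For this draft the lint reports the empty Non-goals section and the absent Constraints section, so the gaps are caught before the spec is handed to an agent.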

The spec is the upstream half of this. The downstream half is a harness — tests, type checks, lint rules — that catches when the AI ignored the spec. I keep both layered: spec defines intent, harness verifies execution.

The point of spec-driven development isn't specs. It's getting AI to build the thing you actually wanted, the first time, at the architectural level your future self will have to maintain. A one-page PRD beats a four-hour debugging session. Every time.

Originally published on VibeReady. Republished here for the dev.to community.