Spec-Driven Development: Structure Beats Vibes

Key Takeaways

Spec-driven development (SDD) makes a machine-readable specification the primary artifact; code, tests, and docs are derived from it

GitHub released Spec Kit in September 2025; by April 2026 it had over 90,000 stars and supported 20+ coding agents

66% of developers say their top AI frustration is code that's "almost right, but not quite" — the failure mode specs are designed to catch

Birgitta Boeckeler identifies three SDD maturity levels: spec-first, spec-anchored, and spec-as-source

Specs have failure modes too: Thoughtworks Radar rated SDD "Assess, not Adopt" in November 2025, and Marmelab documented roughly 1,300 lines of spec for a single feature that displays the current date

45% of AI-generated code samples introduced OWASP Top 10 vulnerabilities across 100+ tested models (Cloud Security Alliance, April 2026). 66% of developers say their top AI frustration is output that's "almost right, but not quite" (Stack Overflow 2025 Developer Survey). The models keep improving. The failure mode hasn't changed.

The first time I tried to vibe code a billing dashboard for my SaaS, Claude Code burned 40 minutes producing three different layouts that all looked plausible and all missed the auth boundary. I closed the chat, wrote a one-page PRD — goals, non-goals, the four tables it touched, the two roles that read it — and pasted it back. Fifteen minutes later the dashboard was right on the first try. Specs aren't waterfall. They're the difference between three rewrites and one.

The missing piece is the spec. Spec-driven development supplies it by making the specification (not the prompt, not the code) the source of truth your tools and agents build from.

What Is Spec-Driven Development?

Wikipedia's definition is the cleanest: "Spec-driven development is a software engineering methodology where a formal, machine-readable specification serves as the primary artifact from which implementation, testing, and documentation are derived" (Wikipedia, 2026).

The practitioner framing from GitHub's Den Delimarsky is more operational: "Instead of coding first and writing docs later, in spec-driven development, you start with a spec. This is a contract for how your code should behave and becomes the source of truth your tools and AI agents use to generate, test, and validate code" (GitHub Blog, September 2, 2025).

Both definitions share one idea: the spec is upstream of everything. Code is a compilation target. Tests are a consistency check. Documentation is a projection. The spec is what you author, review, and version.
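
To make "upstream" concrete, here is a minimal Python sketch. It assumes a feature spec can be held as a plain data object. `FeatureSpec`, `to_docs`, and `expected_tests` are hypothetical names, not part of any SDD tool. The docs and the test checklist are both computed from the one object you edit, so neither can quietly disagree with it.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureSpec:
    """Hypothetical feature spec: the one artifact humans author and review."""
    name: str
    goals: list[str]
    non_goals: list[str]
    acceptance: dict[str, str] = field(default_factory=dict)  # criterion ID -> statement


def to_docs(spec: FeatureSpec) -> str:
    """Documentation as a projection: rendered from the spec, never hand-edited."""
    lines = [f"# {spec.name}", "", "## Goals"]
    lines += [f"- {goal}" for goal in spec.goals]
    lines += ["", "## Non-goals"]
    lines += [f"- {item}" for item in spec.non_goals]
    lines += ["", "## Acceptance criteria"]
    lines += [f"- {cid}: {text}" for cid, text in spec.acceptance.items()]
    return "\n".join(lines)


def expected_tests(spec: FeatureSpec) -> list[str]:
    """Tests as a consistency check: one expected test name per acceptance criterion."""
    return [f"test_{cid.lower().replace('-', '_')}" for cid in spec.acceptance]


# Illustrative spec loosely modeled on the billing-dashboard example above.
spec = FeatureSpec(
    name="Billing dashboard",
    goals=["Show current-month usage per workspace"],
    non_goals=["Editing invoices"],
    acceptance={"AC-1": "Only owners and billing admins can view the page"},
)
print(expected_tests(spec))  # ['test_ac_1']
```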

The Term Is Older Than It Looks

Spec-driven development didn't arrive with AI. Wikipedia traces it to 1960s NASA workflows and a formal academic treatment by Ostroff, Makalsky, and Paige at the XP 2004 conference. Formal methods, contract programming, and model-driven engineering all sit in the same lineage. What changed in 2025 is that large language models made the cost of "write the spec first" collapse: the spec itself can be drafted, refined, and turned into code by the same agent, as long as the spec is the artifact everyone argues about.

The Problem Vibe Coding Created

Vibe coding made it possible to describe a feature in plain English and get working code back in seconds. That's the upside. The downside shows up at scale, and the data from the last twelve months is unambiguous.

A Veracode study cited in the Cloud Security Alliance's April 4, 2026 research note found 45% of AI-generated code introduced OWASP Top 10 vulnerabilities across 100+ tested LLMs; Java samples failed 72% of the time, and 88% of samples tested for log injection were vulnerable (CSA Research Note). Apiiro's enterprise telemetry in the same note showed AI-assisted developers produced commits at 3–4x the rate of peers, while security findings rose roughly tenfold and privilege-escalation paths climbed 322% over six months.

Productivity data is just as stark. A July 2025 METR randomized controlled trial found experienced open-source developers were 19% slower when using AI coding tools, despite predicting a 24% speedup (METR RCT, July 2025). The Stack Overflow 2025 Developer Survey (n = 48,945) found 84% of developers use or plan to use AI, but only 33% trust AI accuracy while 46% actively distrust it.

The "almost right" tax

66% of developers cite "AI solutions that are almost right, but not quite" as their top AI frustration (Stack Overflow 2025). Debugging plausible-looking wrong code is often slower than writing it yourself. Specs exist to prevent "almost right" from ever leaving the planning phase.

The pattern is consistent: AI writes fast, generates superficially plausible code, and leaves you to clean up architectural drift and security gaps. The Stack Overflow team connected the dots explicitly in their 2025 write-up, calling out "spec-driven development" by name as the structural response. I covered the full scaling picture in "Vibe Coding Has a Scaling Problem."

How Spec-Driven Development Works

GitHub's Spec Kit is the clearest reference implementation. It formalizes a four-phase workflow. The phases work the same whether you're using Claude Code, Cursor, Copilot, Gemini CLI, or any of the other agents among the 20+ that Spec Kit targets.

The Four Phases

Constitution. Project-wide invariants. Your stack, your conventions, the things every feature inherits. This is the document every downstream spec references.
Specify. A feature-level spec: goals, non-goals, constraints, acceptance criteria. This is what the agent reads before it starts planning.
Plan. The agent decomposes the spec into architectural decisions and task breakdowns, then hands the plan back for human review.
Tasks / Implement. Only now does code get written. Each task traces back to an acceptance criterion in the spec, which means divergence is visible rather than silent.
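
The traceability rule in the last phase is what makes divergence visible, and it is simple to check mechanically. Below is a minimal Python sketch. It assumes each task records the acceptance-criterion IDs it implements. The `Task` record and `trace` function are hypothetical, not Spec Kit's task format. Tasks that map to no criterion suggest scope creep; criteria with no task are gaps in the plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical task record; `criteria` lists the acceptance-criterion IDs it implements.
    title: str
    criteria: tuple[str, ...]


def trace(criteria_ids: set[str], tasks: list[Task]) -> tuple[list[str], list[str]]:
    """Return (orphan tasks, uncovered criteria).

    An orphan task implements no known criterion (possible scope creep).
    An uncovered criterion has no task (a silent gap in the plan).
    """
    orphans = [t.title for t in tasks if not set(t.criteria) & criteria_ids]
    covered = {cid for t in tasks for cid in t.criteria}
    uncovered = sorted(criteria_ids - covered)
    return orphans, uncovered


criteria = {"AC-1", "AC-2", "AC-3"}
plan = [
    Task("Add role check middleware", ("AC-1",)),
    Task("Build usage query", ("AC-2",)),
    Task("Add dark mode toggle", ()),
]
orphans, uncovered = trace(criteria, plan)
print(orphans)    # ['Add dark mode toggle']
print(uncovered)  # ['AC-3']
```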

An optional Clarify phase sits between Specify and Plan; the agent asks the questions a human reviewer would ask before committing to an approach. The Spec Kit repo is open source, MIT-licensed, and sat at roughly 90,000 stars with active v0.7.x releases as of April 2026 (github.com/github/spec-kit).
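
One way to approximate the Clarify gate is to leave open questions as inline markers in the spec and refuse to plan while any remain. The sketch below assumes a `[NEEDS CLARIFICATION: ...]` marker, similar in spirit to the markers in Spec Kit's spec template. Treat the exact format, and the helper names, as assumptions for illustration.

```python
import re

# Assumed marker format: "[NEEDS CLARIFICATION: <question>]".
MARKER = re.compile(r"\[NEEDS CLARIFICATION:\s*(.+?)\]")


def open_questions(spec_text: str) -> list[str]:
    """Collect every unresolved question still left in the spec."""
    return [m.group(1).strip() for m in MARKER.finditer(spec_text)]


def ready_to_plan(spec_text: str) -> bool:
    """Gate for the Plan phase: refuse to plan while questions remain open."""
    return not open_questions(spec_text)


spec_text = """
## Acceptance criteria
- AC-1: Owners can view the dashboard.
- AC-2: Usage resets monthly [NEEDS CLARIFICATION: calendar month or billing cycle?]
"""
print(open_questions(spec_text))  # ['calendar month or billing cycle?']
print(ready_to_plan(spec_text))   # False
```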

The Three Maturity Levels

Birgitta Boeckeler's October 2025 article on martinfowler.com breaks spec-driven development into three ascending levels of commitment (Boeckeler, October 2025):

Spec-first. You write a spec before prompting. The spec informs the AI but isn't regenerated as code changes. Simplest, lightest, most teams start here.
Spec-anchored. Spec and code stay in sync. When code drifts, the spec is updated; when the spec changes, code is regenerated. This is where Spec Kit and Amazon Kiro live.
Spec-as-source. The spec is the only thing humans author. Code is fully derived output, closer to how Terraform provisions infrastructure from declarative HCL. Tessl Framework is the most public example.

Most teams don't need level three. Moving from unstructured prompting to spec-first captures most of the reliability gain.
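
If you do move up to the spec-anchored level, the hard part is noticing when a spec has changed but its code hasn't been regenerated. The minimal sketch below assumes you record a SHA-256 hash of each spec in a lock mapping whenever code is generated from it. The lock is a hypothetical convention of this example, not a documented Spec Kit or Kiro feature.

```python
import hashlib


def fingerprint(spec_text: str) -> str:
    """SHA-256 of the spec text; any edit to the spec changes it."""
    return hashlib.sha256(spec_text.encode("utf-8")).hexdigest()


def stale_features(specs: dict[str, str], lock: dict[str, str]) -> list[str]:
    """Features whose current spec no longer matches the fingerprint in the lock.

    specs: feature name -> current spec text
    lock:  feature name -> fingerprint recorded when code was last generated
    """
    return sorted(
        name for name, text in specs.items()
        if lock.get(name) != fingerprint(text)
    )


specs = {"billing": "AC-1: owners only", "search": "AC-1: prefix match"}
lock = {
    "billing": fingerprint("AC-1: owners only"),
    "search": fingerprint("AC-1: exact match"),  # spec edited since last generation
}
print(stale_features(specs, lock))  # ['search']
```

A feature missing from the lock also counts as stale. This catches specs that have never been built.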

Spec-Driven Development vs. Vibe Coding

Spec-driven development doesn't replace vibe coding; it constrains it. The two answer different questions at different points in the workflow.

	Vibe Coding	Spec-Driven Development
Primary artifact	The prompt	The specification
Source of truth	Generated code	The spec
Best for	Exploration, prototypes, UI tweaks	Anything touching auth, payments, data
Failure mode	Pattern drift, "almost right" output	Over-specification, review overload
Iteration loop	Re-prompt until code works	Revise spec, regenerate code
Review target	Generated code diff	Spec diff first, then code diff

The healthy combination is layered: vibe-code inside a well-written spec. The spec bounds what the AI is allowed to do; the prompt fills in the how. When the output drifts, you fix the spec, not the prompt.
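
"The spec bounds what the AI is allowed to do" can be enforced rather than hoped for. The minimal sketch below makes two assumptions:

- The spec lists the paths a feature may touch as glob patterns (a hypothetical convention).
- You have the list of files the agent changed, for example from `git diff --name-only`.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_files: list[str], allowed: list[str]) -> list[str]:
    """Return changed files that no allowed glob from the spec covers.

    fnmatchcase's '*' also matches '/', so 'app/billing/*' covers subdirectories too.
    """
    return [
        path for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in allowed)
    ]


# Hypothetical scope section from a billing feature spec.
allowed = ["app/billing/*", "lib/billing/*", "tests/billing/*"]
changed = ["app/billing/page.tsx", "lib/auth/session.ts", "tests/billing/usage.test.ts"]
print(out_of_scope(changed, allowed))  # ['lib/auth/session.ts']
```

A non-empty result is the cue to act, not to re-prompt around it. Either revise the spec (if the extra file really belongs in scope) or reject the change.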

Context Engineering — The Layer Below Specs

A spec tells the AI what to build. Context engineering tells it what it already knows. The term was popularized independently by Shopify CEO Tobi Lütke and Andrej Karpathy in late June 2025, within two days of each other.

Context engineering is the delicate art and science of filling the context window with just the right information for the next step. — Andrej Karpathy, June 25, 2025

Lütke's framing, two days earlier, was more practical: "the art of providing all the context for the task to be plausibly solvable by the LLM" (@tobi on X, June 23, 2025). Simon Willison collected both quotes and argued the term better reflects what production LLM work actually looks like (Willison, June 27, 2025).

The relationship to specs is directional: context engineering feeds the spec, and the spec feeds the task. A spec with no context produces code that's technically correct but violates every convention in your repo. Context without a spec produces code that fits the repo but does the wrong thing. You need both.
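
The layering can be sketched as prompt assembly: context first, then the spec, then the task. This Python sketch is illustrative only. It uses a character budget as a crude stand-in for a token budget, and `ContextDoc` and `build_prompt` are hypothetical names. The spec and task are always kept verbatim. Lower-priority context is what gets dropped when space runs out.

```python
from dataclasses import dataclass


@dataclass
class ContextDoc:
    # Hypothetical context source, e.g. AGENTS.md or a conventions file.
    name: str
    text: str
    priority: int  # lower number = more important


def build_prompt(context: list[ContextDoc], spec: str, task: str, budget: int) -> str:
    """Layer context -> spec -> task, keeping the spec and task verbatim.

    Context docs are added in priority order, skipping any that would not fit
    in the character budget left after the spec and task.
    """
    fixed = f"## Spec\n{spec}\n\n## Task\n{task}"
    remaining = budget - len(fixed)
    chosen = []
    for doc in sorted(context, key=lambda d: d.priority):
        block = f"## Context: {doc.name}\n{doc.text}\n\n"
        if len(block) <= remaining:
            chosen.append(block)
            remaining -= len(block)
    return "".join(chosen) + fixed


docs = [
    ContextDoc("AGENTS.md", "Use server components. No raw SQL.", 0),
    ContextDoc("old-notes.md", "x" * 500, 5),
]
prompt = build_prompt(docs, "AC-1: owners only", "Implement AC-1", budget=200)
print(prompt.startswith("## Context: AGENTS.md"))  # True
print("old-notes.md" in prompt)                    # False
```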

I treat them as two of three layers in a structured vibe coding framework — context engineering, AI coding guardrails, and spec-driven workflows — that together form a complete harness. Specs without context, or context without enforcement, fail in predictable ways.

The Tools Shipping Spec-Driven Workflows

Three tools define the current state of spec-driven development. Each takes a different position on the Boeckeler maturity ladder.

GitHub Spec Kit. Open source, MIT-licensed, roughly 90,000 stars as of April 2026. Supports Claude Code, Copilot, Cursor CLI, Gemini CLI, Codex CLI, Qwen, opencode, and more. Lives at the spec-anchored level: specs and code evolve together through the Constitution/Specify/Plan/Tasks flow.
Amazon Kiro. Commercial AWS offering, same spec-anchored tier. Kiro emphasizes tight AWS integration and specification reuse across services.
Tessl Framework. Commercial, the most aggressive of the three. Pushes toward spec-as-source: humans author specs, everything else is generated. Thoughtworks' Technology Radar flagged all three by name when it placed spec-driven development in its "Assess" ring in November 2025 (Thoughtworks Radar Vol. 33).

The tools handle generation. They don't handle enforcement. That's where harness engineering picks up — the tests, type checks, and quality gates that verify the generated code actually matches the spec. Specs and harnesses are complements: the spec is what you wanted, the harness proves you got it.
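
A harness in this sense can be as small as one check per acceptance criterion. The sketch below is a hypothetical Python example, not how any of the tools above work. It reports each criterion as `pass`, `fail`, or `missing`. `missing` is the silent-drift case: a requirement nothing verifies. `can_view` stands in for generated code under test.

```python
from typing import Callable


def run_harness(criteria: list[str], checks: dict[str, Callable[[], bool]]) -> dict[str, str]:
    """Map each acceptance-criterion ID to 'pass', 'fail', or 'missing'."""
    report: dict[str, str] = {}
    for cid in criteria:
        check = checks.get(cid)
        if check is None:
            report[cid] = "missing"  # the spec asks for it, nothing verifies it
            continue
        try:
            report[cid] = "pass" if check() else "fail"
        except Exception:
            report[cid] = "fail"  # a crashing check counts as a failure
    return report


# Hypothetical generated code under test: which roles may view the billing page.
def can_view(role: str) -> bool:
    return role in {"owner", "billing_admin"}


checks = {
    "AC-1": lambda: can_view("owner") and not can_view("member"),
    "AC-2": lambda: can_view("viewer"),  # suppose the spec grants viewers access; the code disagrees
}
print(run_harness(["AC-1", "AC-2", "AC-3"], checks))
# {'AC-1': 'pass', 'AC-2': 'fail', 'AC-3': 'missing'}
```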

When Spec-Driven Development Backfires

Spec-driven development has a credible set of critics. Ignoring them produces the exact overhead they warn about.

François Zaninotto at Marmelab documented the most concrete example in November 2025: a single feature to display the current date required 8 files and roughly 1,300 lines of specification using Spec Kit (Marmelab, November 12, 2025). His argument is that SDD is a rebranded waterfall optimized for removing developers from the loop.

SDD is a step in the wrong direction. It tries to solve a faulty challenge: "How do we remove developers from software development?" — François Zaninotto, Marmelab

Thoughtworks' Technology Radar was more measured but still cautious, placing SDD in "Assess" rather than "Trial" or "Adopt" and warning the workflows are "elaborate and opinionated" and may represent "a bitter lesson — that handcrafting detailed rules for AI ultimately doesn't scale." Boeckeler, a qualified supporter, has flagged the same failure modes: review overload for small features and non-deterministic LLM output undermining the promised control.

The practical heuristic: spec-driven development is overhead for any change too small to need a feature spec. Use it where the cost of architectural drift is high (auth, billing, multi-tenant data, API contracts) and skip it where the cost of being wrong is a page refresh.
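
The heuristic can be encoded as a rough pre-flight check on the files a change will touch. This sketch rests on loud assumptions. The keyword lists are invented for illustration, and plain substring matching will produce false positives (for example, `auth` also matches `author`). Treat the output as a prompt to write a spec, not a verdict.

```python
# Illustrative keywords for the high-drift-cost areas; tune them to your own repo layout.
HIGH_RISK = {
    "auth": ("auth", "session", "permission"),
    "billing": ("billing", "payment", "invoice"),
    "tenant data": ("tenant", "workspace"),
    "API contract": ("api/", "openapi", "schema"),
}


def needs_spec(changed_paths: list[str]) -> list[str]:
    """Return the high-risk areas a change touches; an empty list means skip the spec."""
    hits = set()
    for path in changed_paths:
        lowered = path.lower()
        for area, keywords in HIGH_RISK.items():
            if any(k in lowered for k in keywords):  # crude substring match
                hits.add(area)
    return sorted(hits)


print(needs_spec(["app/marketing/hero.tsx"]))    # []
print(needs_spec(["app/api/billing/route.ts"]))  # ['API contract', 'billing']
```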

How to Start Without Rewriting Everything

You don't need Spec Kit, a Constitution document, or a four-phase workflow to practice spec-driven development. You need a one-page spec and the discipline to hand it to the AI before you prompt.

Write a one-page PRD before prompting. Goals, non-goals, constraints, acceptance criteria. Fifteen minutes. This single step is the biggest reliability gain most teams will see, and it needs no new tooling.
Use AGENTS.md as your Constitution. Stack choices, conventions, architectural rules, forbidden patterns. Next.js 16.2 now ships AGENTS.md in create-next-app by default; I walk through a full AGENTS.md-first workflow in a step-by-step tutorial on vibeready.sh.
Treat the spec as the diff target. When the AI produces something wrong, revise the spec first, then regenerate the code. Don't re-prompt your way around a spec gap — that's the vibe-coding failure mode.
Pair the spec with a harness. Specs without automated tests and type checks drift silently. The spec says what you want; the harness proves the code matches. Harness engineering is the enforcement layer.
Graduate to Spec Kit when the overhead earns itself. Once you have a handful of features that share a Constitution, formalizing with Spec Kit or Kiro starts paying back. Before that, a directory of markdown specs works fine.
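
The first step is also the easiest to enforce automatically, for example in CI or a pre-commit hook. The minimal sketch below assumes each PRD is a markdown file using `## Goals`, `## Non-goals`, `## Constraints`, and `## Acceptance criteria` headings. That convention is chosen for this example, not one any tool mandates. A section counts as missing if its heading is absent or its body is empty.

```python
import re

REQUIRED = ("Goals", "Non-goals", "Constraints", "Acceptance criteria")
# Matches level-two markdown headings such as "## Goals" (assumed PRD layout).
HEADING = re.compile(r"^##[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def missing_sections(prd: str) -> list[str]:
    """Required sections whose heading is absent or whose body is empty."""
    # With one capture group, split returns [preamble, title1, body1, title2, body2, ...].
    parts = HEADING.split(prd)
    sections = {
        title.strip().lower(): body.strip()
        for title, body in zip(parts[1::2], parts[2::2])
    }
    return [name for name in REQUIRED if not sections.get(name.lower())]


prd = """# Billing dashboard
## Goals
Show current-month usage per workspace.
## Non-goals
Editing invoices.
## Constraints
## Acceptance criteria
- AC-1: Only owners and billing admins can view the page.
"""
print(missing_sections(prd))  # ['Constraints']
```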

The spec is the upstream half of this. The downstream half is a harness — tests, type checks, lint rules — that catches when the AI ignored the spec. I keep both layered: spec defines intent, harness verifies execution.

The point of spec-driven development isn't specs. It's getting AI to build the thing you actually wanted, the first time, at the architectural level your future self will have to maintain. A one-page PRD beats a four-hour debugging session. Every time.

Originally published on VibeReady. Republished here for the dev.to community.
