AI dropped my per-feature ship time from 3 days to 3 hours. Here's the actual stack.

I keep getting the same DM:

"Cool, but does AI actually speed up shipping or is this just hype?"

So here's the table from one MVP build that ended last quarter. Numbers measured, not vibed.

Per-feature time, with and without agents

Activity	Traditional senior team	With agentic SDLC	Speedup
Plan a feature (ARCH doc + tasks)	2–4h human discussion	15 min (architect agent + `gate:plan`)	~10×
Code a small feature	1–3 days senior dev	1–2h human review of agent output	~10–15×
Code review	2–4h, async over 1–2 days	30 min (5 reviewers in parallel)	~10×
QA / test suite	1 day	15 min (qa-engineer agent + spot check)	~25×
Deploy (canary + monitoring)	~4h	~10 min (auto-canary)	~25×
End-to-end per feature	~3–5 days	~3–5 hours	~10×

Shipping one feature drops from "we'll have it next week" to "we'll have it after lunch." For a real working developer, that's the metric that matters more than any "55% cost reduction" headline.

The full MVP picture

OK, but a single-feature speedup doesn't necessarily mean the MVP ships faster. Sometimes you just spend the saving on more reviews. So here's the end-to-end:

Work area	Traditional (1 PM + 4 eng, ~3 months)	With agents + voice-pack (1 PM + 2 eng + agents, ~6–8 weeks)
Architecture + ADRs	~$20K	~$10K
Backend (Twilio, OpenAI, call routing)	~$80K	~$30K
Frontend (operator dashboard)	~$40K	~$15K
Database + migrations	~$15K	~$5K
Test suite + QA	~$25K	~$10K
Security review + pen test	~$20K	~$15K (external pen test still required)
Compliance (voice-pack)	~$42K	~$22K
Deployment + CI/CD	~$15K	~$8K
Documentation	~$10K	~$3K
PM + buffer	~$20K	~$10K
Total	~$287K	~$128K
LLM compute	$0	~$500–$1,500
Wall-clock	~3 months	~6–8 weeks
Headcount	1 PM + 4 engineers	1 PM + 2 engineers + agents

Cost saving: ~55%. Time saving: ~40–50%. Headcount: 4 → 2 (not 0).
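For anyone who wants to check the headline percentage, here is a minimal sketch of the arithmetic using the rounded table rows (in $K). The variable names are mine, and the last line is an extra check of my own that adds the upper LLM compute figure back in, since the table lists it separately from the total.

```typescript
// Cost rows from the MVP table, in thousands of USD: [work area, traditional, with agents].
const rows: Array<[string, number, number]> = [
  ["Architecture + ADRs", 20, 10],
  ["Backend", 80, 30],
  ["Frontend", 40, 15],
  ["Database + migrations", 15, 5],
  ["Test suite + QA", 25, 10],
  ["Security review + pen test", 20, 15],
  ["Compliance (voice-pack)", 42, 22],
  ["Deployment + CI/CD", 15, 8],
  ["Documentation", 10, 3],
  ["PM + buffer", 20, 10],
];

const traditionalTotal = rows.reduce((sum, [, traditional]) => sum + traditional, 0);
const agentTotal = rows.reduce((sum, [, , withAgents]) => sum + withAgents, 0);

// Relative saving on the labour totals.
const savingPct = ((traditionalTotal - agentTotal) / traditionalTotal) * 100;

// Same saving with the upper LLM compute estimate ($1.5K) added to the agent side.
const llmMaxK = 1.5;
const savingWithLlmPct =
  ((traditionalTotal - (agentTotal + llmMaxK)) / traditionalTotal) * 100;

console.log(traditionalTotal, agentTotal, savingPct.toFixed(1), savingWithLlmPct.toFixed(1));
```

This prints `287 128 55.4 54.9`, so even at the top of the LLM range the saving still rounds to ~55%.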

Two honest details that matter for working devs:

LLM cost across the whole MVP is $500–$1,500. That's not a few cents – it's three- to four-figure money burned across architecture drafting, code generation, parallel reviewers, deployment automation, and the memory feedback loop. Don't compare a single agent prompt to the full build.
You still need engineers. "2 engineers + agents" means real humans operating the pipeline, reviewing agent output, fixing the bugs agents create, integrating Twilio (or whatever), and shipping the code. The startup that ships an MVP with zero humans in 2026 doesn't exist.

What are "the agents" actually doing?

This is the part where most posts wave their hands. The reality: thirty-four specialist agents, eight stages, two human gates per feature. Architecture diagram here: greatcto.systems/architecture – every box on the SVG links to that agent's source on GitHub.

Daily-driver agents you'll see fire most:

architect – drafts ARCH.md + ADR + cost estimate, before gate:plan
pm – decomposes into beads tasks with explicit dependencies, parallel-friendly
senior-dev (×N) – claims a task, TDD, isolated worktree, ships diff
qa-engineer – type-check + lint + tests + coverage
security-officer – OWASP, CVE scan, secret detection
code-reviewer – 12-angle review on the final diff
devops – canary + health checks + auto-rollback
l3-support – production triage + postmortem
continuous-learner – extracts lessons → .great_cto/lessons.md

Plus 26 archetype-specific reviewers that fire only when their domain triggers – voice-AI, healthcare, fintech, robotics, etc. The point isn't 34 always-on agents. The point is that 5–7 fire on any given PR, and which ones depends on what your repo looks like.

The compliance packs (10 of them)

If you ship into a regulated industry, agentic SDLC alone isn't enough – you also need the right reviewer agents to know which gates to wire. Hence: packs.

A pack triggers on industry signals in your repo (e.g. twilio in package.json → voice). It attaches a specialist reviewer agent, generates a threat model, and wires named human gates. One line each:

voice-pack – twilio, livekit, deepgram, elevenlabs → TCPA + state recording consent + STIR/SHAKEN + PCI redaction
clinical-pack – clinical, PHI, SaMD, CDS → FDA SaMD classification + HIPAA + 21 CFR Part 11
hr-ai-pack – recruit, candidate, ATS → NYC LL 144 AEDT bias audit + EEOC + EU AI Act Annex III
api-platform-pack – REST, GraphQL, webhook, OpenAPI → OAuth 2.1 + RFC 8594 Sunset + HMAC webhook signing + idempotency
lending-pack – loan, BNPL, credit, FCRA, ECOA → ECOA Reg B adverse-action + BISG fair-lending + NMLS state matrix
clinical-trials-pack – CTMS, EDC, eConsent, FHIR, HL7 → ICH-GCP + Part 11 audit trail + CDISC + IRB-ready
robotics-pack – cobot, ROS 2, surgical robot → ISO 10218 + IEC 61508 + HARA + SROS2
em-fintech-pack – RBI, CBN, BSP, UPI, PIX, M-Pesa → India DPDP + cross-border + license strategy
climate-pack – Verra, Gold Standard, Scope 1/2/3, CDP, CSRD → MRV methodology + biosecurity
drug-discovery-pack – binding affinity, ADMET, AlphaFold, LIMS, GLP → applicability domain + IQ/OQ/PQ + ALCOA+

Each pack adds 1–4 reviewer agents, named human gates, eval fixtures, and a required-artefact list. Full breakdown with company catalogues at greatcto.systems/packs.

How detection works (the part HN readers will ask about)

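// Pack definition: what it looks for, and what it attaches on a match.
const voicePack = {
  name: 'voice-pack',
  // Signals scanned in the repo.
  signals: {
    deps:     ['twilio', '@livekit/agents', 'deepgram-sdk'],
    keywords: ['voice agent', 'IVR', 'phone tree'],
    files:    ['twilio.config.*', 'livekit.yaml'],
  },
  // What gets wired in when a signal matches.
  attaches: {
    archetypes: ['ai-system', 'agent-product'],
    reviewer:   'voice-ai-reviewer',
    gates:      ['gate:voice-compliance'],
  },
};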

Exact-match keyword scanning, not fuzzy substring. 'twilio' matches 'twilio' in dependencies, not 'twilio-helpers' in README. That keeps false-positive pack attachment under 1%.
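To make the exact-versus-substring difference concrete, here is a minimal sketch. It is not the code in packs.ts: the `Pack` shape, both function names, and the version strings are my own placeholders, and it only covers the `deps` signal.

```typescript
// Hypothetical shapes: only the fields needed for dependency matching.
interface PackSignals { deps: string[] }
interface Pack { name: string; signals: PackSignals }

// Exact match: the pack fires only when one of its dependency signals
// is literally a key in the repo's dependency map.
function matchesByDeps(pack: Pack, dependencies: Record<string, string>): boolean {
  const installed = new Set(Object.keys(dependencies));
  return pack.signals.deps.some((dep) => installed.has(dep));
}

// Fuzzy substring match over free text, shown only for contrast.
function matchesBySubstring(pack: Pack, text: string): boolean {
  const haystack = text.toLowerCase();
  return pack.signals.deps.some((dep) => haystack.includes(dep.toLowerCase()));
}

const voice: Pack = {
  name: "voice-pack",
  signals: { deps: ["twilio", "@livekit/agents", "deepgram-sdk"] },
};

// Version strings are placeholders.
const exactHit = matchesByDeps(voice, { twilio: "^5.0.0" });
const exactMiss = matchesByDeps(voice, { "twilio-helpers": "^1.0.0" });
const fuzzyFalsePositive = matchesBySubstring(voice, "Formatting via twilio-helpers");

console.log(exactHit, exactMiss, fuzzyFalsePositive);
```

The exact matcher accepts `twilio` and rejects `twilio-helpers`, while the substring matcher fires on a README sentence that merely mentions `twilio-helpers`.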

Confession on that 1%: v0.1 did fuzzy substring matching and voice-pack triggered on a static-site-generator repo whose README said "we explicitly do not use Twilio." Spent an hour wondering why a blog generator was getting a TCPA threat model. Also, I shipped voice-pack without 'phone' in the keyword list for two weeks. Two startups installed it, shipped voice features, the pack sat there politely without firing once. The boilerplate every new pack now starts from has a rule: include the most obvious keyword first, not last.

Packs stack additively. twilio + stripe + livekit → voice-pack + commerce-pack. If two packs name the same gate, the kernel dedupes by name. Reviewers run in parallel on the same PR; verdicts aggregate to one APPROVED / BLOCKED chip at gate:ship.
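A rough sketch of the stacking rules, under stated assumptions: the type and function names are hypothetical, the commerce-pack reviewer and gate names plus the shared `gate:pii-review` are invented for illustration, and "any single BLOCKED blocks the PR" is my guess at the aggregation rule rather than something the kernel source confirms.

```typescript
type Verdict = "APPROVED" | "BLOCKED";

// Hypothetical shape for a pack that has already been attached to a repo.
interface AttachedPack { name: string; reviewer: string; gates: string[] }

// Union the gates of every attached pack; the Set dedupes gates by name.
function mergeGates(packs: AttachedPack[]): string[] {
  const gates = new Set<string>();
  for (const pack of packs) {
    for (const gate of pack.gates) gates.add(gate);
  }
  return Array.from(gates);
}

// Assumption: one BLOCKED verdict from any reviewer blocks the whole PR.
// With no reviewers at all this returns APPROVED.
function aggregateVerdicts(verdicts: Verdict[]): Verdict {
  return verdicts.some((v) => v === "BLOCKED") ? "BLOCKED" : "APPROVED";
}

const attached: AttachedPack[] = [
  // 'gate:pii-review' is an invented shared gate, used to show dedupe.
  { name: "voice-pack", reviewer: "voice-ai-reviewer", gates: ["gate:voice-compliance", "gate:pii-review"] },
  // Reviewer and gate names for commerce-pack are placeholders.
  { name: "commerce-pack", reviewer: "commerce-reviewer", gates: ["gate:commerce-compliance", "gate:pii-review"] },
];

const mergedGates = mergeGates(attached);
const chip = aggregateVerdicts(["APPROVED", "BLOCKED"]);

console.log(mergedGates, chip);
```

Four gate entries collapse to three because the shared gate is counted once, and the mixed verdicts resolve to a single BLOCKED chip.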

Source: skills/great_cto/packs/, packages/cli/src/packs.ts.

Install + try

npx great-cto init

Runs locally. MIT. You pay for your own LLM API usage. Works inside Claude Code, Cursor, OpenAI Codex CLI, Aider, and Continue via AGENTS.md + MCP.

After init:

/start "add a voice agent for restaurant order-taking"

Architect agent drafts ARCH doc. PM decomposes into beads tasks. gate:plan waits for your approval. Then senior-dev agents claim tasks in parallel; 5 reviewer agents fan out on the resulting diff; gate:ship waits for your approval again. Two clicks per feature. The rest runs unattended.
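The "two clicks" flow can be pictured as an ordered list of steps where only the gates wait on a human. This is a simplified sketch of the steps named in the paragraph above, not the project's scheduler: the `Step` type and `waitingOn` function are hypothetical, and it ignores parallelism inside a step.

```typescript
// A step either runs unattended or is a gate that needs human approval.
interface Step { name: string; gate: boolean }

const steps: Step[] = [
  { name: "architect drafts ARCH doc", gate: false },
  { name: "pm decomposes into tasks", gate: false },
  { name: "gate:plan", gate: true },
  { name: "senior-dev agents implement tasks", gate: false },
  { name: "reviewer agents review the diff", gate: false },
  { name: "gate:ship", gate: true },
];

// Walks the steps in order and returns the first gate that has not been
// approved yet, or null once every gate has been approved.
function waitingOn(approved: Set<string>): string | null {
  for (const step of steps) {
    if (step.gate && !approved.has(step.name)) return step.name;
  }
  return null;
}

const first = waitingOn(new Set<string>());
const second = waitingOn(new Set<string>(["gate:plan"]));
const done = waitingOn(new Set<string>(["gate:plan", "gate:ship"]));

console.log(first, second, done);
```

With no approvals the pipeline waits at `gate:plan`, after the first click it waits at `gate:ship`, and after the second there is nothing left to wait on.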

What does NOT speed up

The honest disclaimer, because it matters more than the speedup headline:

External audit cycles still take their natural time (LL 144 auditor ~2–4 weeks, FDA pre-sub 60–90 days)
IRB approval still takes 2–3 months
Regulator meetings still need to be scheduled
Wet-lab validation is still real biology
HARA signoff is a single calendar moment a human owns

Anything requiring another organization to commit time runs at human speed. The LLM accelerates your codebase and your compliance discovery. It doesn't accelerate someone else's calendar.

TL;DR

Per-feature time drops ~10× (3–5 days → 3–5 hours). MVP wall-clock drops ~40–50% (3 months → 6–8 weeks). Cost drops ~55%.
LLM cost across the WHOLE MVP is $500–$1,500. Not free, not trivially cheap.
Headcount drops 4 → 2 engineers + agents. Not 0. You still need humans.
10 compliance packs cover voice-AI, clinical, HR-AI, API platforms, lending, clinical trials, robotics, EM fintech, climate-MRV, drug discovery.
Architecture diagram: greatcto.systems/architecture. One real run walked stage-by-stage: greatcto.systems/proof. MTTR benchmark methodology: docs/benchmarks/MTTR.md.
Try: npx great-cto init. ⭐ if useful: github.com/avelikiy/great_cto.

Full deep-dive with per-pack details + the realistic MVP economics breakdown + the runway math is on Hashnode: Ten compliance packs for ten regulated industries.
