The Grilling
Andrey Kuche · 2026-05-28 · via DEV Community

In Part 1 I argued that every spec-driven AI framework on the market - sixteen of them in my survey - has the same structural blind spot. They all check the implementation against the spec. None of them attack the spec itself before it gets written.

Part 2 is the operational deep dive.

What does the missing phase actually look like when you build it?
How does it run?
What are the agents, the prompts, the termination conditions, the artifacts?
When should you not use it?

This part assumes you’ve read Part 1, or at least bought the premise: the spec needs to be on trial before it becomes gospel.

Why “multi-agent debate” isn’t enough

A few research papers and a couple of frameworks have something they call multi-agent debate. Two agents argue, a third synthesizes. This is a real technique with real research behind it, and it’s a meaningful improvement over single-agent reasoning.

It is not Grilling.

The differences matter, and they’re worth being precise about.

The first difference is grounding.

Most debate setups in current frameworks operate on whatever’s in the prompt - they don’t first survey the codebase, the existing tests, the past failures, the applicable constraints. The result is two LLMs hallucinating at each other politely.

The Advocate invents objections that don’t apply to the actual system; the Proposer defends positions against attacks that wouldn’t matter even if they landed. Without a Recon Dossier in front of both agents, the debate is theater. It produces dialogue, not decisions.

Grilling refuses to start until the ground truth is established and verified. That’s not a stylistic choice - it’s the only way the attacks have weight.

The second difference is the optimization target.

Standard debate optimizes for the best version of a chosen position. Two agents start with opposing views and the synthesizer extracts what’s strongest from each. This is genuinely useful when you’ve already decided to do something and you’re trying to figure out the best way.

Grilling optimizes for a different thing entirely: whether the position should be held at all. The Proposer isn’t defending a position because it was assigned to them; they’re proposing a solution they actually think is correct, and the Advocate is trying to dismantle that proposal.

A legitimate output of Grilling is “kill the idea entirely.” A legitimate output of standard debate is rarely “neither side has a point.”

The third difference is the stopping condition.

And this might be the most important one. Standard debate ends when both sides have made their case - typically after a fixed number of rounds, or when the orchestrator decides the discussion has matured. That’s a procedural ending, not a substantive one. The debate stops because the schedule says it stops, not because the question has been resolved.

Grilling has a structural stopping condition: equilibrium between two opposing pressures. The attacker has nothing left. The Don has nothing left. Both pressures simultaneously exhausted. Until that condition is met, the rounds continue (up to the hard ceiling). After that condition is met, no more rounds - they’d add nothing.

The stopping condition is the whole game.

If your debate stops on “we’re done arguing,” you’re polishing turds - you exit with whatever the agents converged on, regardless of whether what they converged on was correct.

If it stops on “no new valid objection AND no remaining concerns,” you have something stronger: a verdict that survived attack, with the surviving objections explicitly logged.

Multi-agent debate is a useful tool. It’s just a different tool, solving a different problem.

How a Grilling session is structured

Grilling sits as Phase 2 of the Heist Pipeline. It’s not a prompt and it’s not a standalone tool - it’s a phase with hard gates before it (Reconnaissance must complete and produce a Recon Dossier) and after it (the Don must sign off on the verdict before anything moves to the Sit-Down).

Where Grilling sits — Phase 2 of 6 in the Heist Pipeline
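
As a rough sketch of what “a phase with hard gates” could look like in code, the Python below models the six phase names from the pipeline as an ordered enum and refuses to enter a phase until the previous one has produced its artifact. The `Pipeline` class and its `complete` and `enter` methods are hypothetical names chosen for illustration; they are not the Gangsta Agents API, and the Don’s sign-off is left out to keep the sketch short.

```python
from enum import IntEnum


class Phase(IntEnum):
    RECONNAISSANCE = 1
    GRILLING = 2
    SIT_DOWN = 3
    RESOURCE_DEVELOPMENT = 4
    THE_HIT = 5
    LAUNDERING = 6


class GateError(RuntimeError):
    """Raised when a phase is entered before the previous gate is passed."""


class Pipeline:
    def __init__(self) -> None:
        # Maps each finished phase to the artifact it produced.
        self.artifacts: dict[Phase, str] = {}

    def enter(self, phase: Phase) -> None:
        if phase is Phase.RECONNAISSANCE:
            return  # the first phase has nothing in front of it
        previous = Phase(phase - 1)
        if previous not in self.artifacts:
            raise GateError(f"{phase.name} is gated on {previous.name}")

    def complete(self, phase: Phase, artifact: str) -> None:
        self.enter(phase)  # a phase cannot finish unless it was allowed to start
        self.artifacts[phase] = artifact


pipeline = Pipeline()
pipeline.complete(Phase.RECONNAISSANCE, "Recon Dossier")
pipeline.enter(Phase.GRILLING)  # allowed: the Dossier exists
```

Calling `pipeline.enter(Phase.SIT_DOWN)` at this point raises `GateError`, because Grilling has not yet produced a verdict.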

The process runs like a structured interrogation. Three subagents have specific roles, the Don (the user) participates in every round, and the rounds follow a fixed order.

The Proposer opens. It reads the Recon Dossier - the verified findings from Phase 1 - and proposes a solution: architecture, file changes, identified risks, expected behavior.

The Proposer’s job is to put the strongest possible version of the idea on the table. Not the safest version, not the most diplomatic version. The strongest. If the idea is bad, you want it to die fighting, not die mumbling.

The Devil’s Advocate attacks. Architectural flaws. Security gaps. Constitution violations. Performance regressions. Scalability ceilings. Edge cases the Proposer didn’t think about. The Devil’s Advocate’s job - and this is important - is to find the failure mode.

Not to be polite.
Not to suggest improvements.
To attack.

If the Proposer says “we’ll cache this in Redis,” the Devil’s Advocate says:

What happens when Redis is down?
What happens when the cache is poisoned?
Have you measured the actual cache hit rate or are you guessing?

Bad attacks get filtered by the Proposer’s response.
Good attacks force a revision.

The Don - that’s the user, you - weighs in every round. One question at a time. Never bundled. This rule matters more than it sounds.

If the Don asks three questions at once, the agent will answer the easy one fully, the medium one partially, and quietly skip the hard one. One question forces an actual answer. The Don’s questions are usually the most valuable in the whole Grilling, because the Don has context the agents don’t have - about the team, about the business, about the politics, about what’s been tried before that didn’t make it into the codebase.

The Synthesizer closes each round. It incorporates the valid attacks and the Don’s feedback and produces a revised solution. Not a defense of the original - a revision. If nothing valid came up that round, the revision is small. If something hit hard, the revision is structural. Sometimes the revision is “kill this idea entirely and propose a different approach,” and that’s a legitimate outcome.

Then the next round begins.
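
The fixed order of a round can be sketched as a single function. This is a minimal model under stated assumptions: the roles are plain Python callables standing in for real subagents and a real user, and `run_round`, `RoundResult`, and the stub answers are hypothetical names and values, not part of the framework.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoundResult:
    attacks: list[str]      # objections raised by the Devil's Advocate
    don_answers: list[str]  # one answer per question, asked one at a time
    revised: str            # the Synthesizer's revision, not a defense


def run_round(
    solution: str,
    advocate: Callable[[str], list[str]],
    ask_don: Callable[[str], str],
    don_questions: list[str],
    synthesizer: Callable[[str, list[str], list[str]], str],
) -> RoundResult:
    attacks = advocate(solution)
    # Never bundle: each question is put to the Don on its own.
    answers = [ask_don(question) for question in don_questions]
    revised = synthesizer(solution, attacks, answers)
    return RoundResult(attacks, answers, revised)


# Stub roles, for illustration only.
result = run_round(
    solution="cache sessions in Redis",
    advocate=lambda s: ["What happens when Redis is down?"],
    ask_don=lambda q: "Agree fully.",
    don_questions=["Do you agree with the attack?"],
    synthesizer=lambda s, attacks, answers: s + " with a fallback to the database",
)
print(result.revised)
```

The revised solution returned by one round becomes the `solution` argument of the next, so each round attacks the latest revision and not the original proposal.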

The Grilling — round structure, hard limits, and termination conditions.

What a round actually looks like

Theory is cheap.

Here’s a real Round 1, lightly edited for length, from a Grilling session on a small feature: adding a local high-score leaderboard to a browser Tetris game.

The task sounds trivial. Watch how fast “trivial” falls apart under attack.

The Proposer opens. It reads the Recon Dossier - a three-layer pure state machine, zero DOM in the logic layer, 261 passing tests, a constitution whose first commandment is “game logic has zero DOM dependencies” - and puts a solution on the table:

Add a name-entry status to the state machine. Store the typed initials in state.nameEntry = { buffer: '', maxLen: 3 }. Detect game-over by reading the gameOver event from state.events[], then route keystrokes into the buffer. Render the leaderboard as a DOM <div> overlay, mirroring the existing settings.js. A pure isTopTen(score, entries) function decides qualification.

Reasonable on its face. Mirrors existing patterns. Names real files. This is the strongest version, not the safest — exactly what the Proposer is supposed to deliver.

The Devil’s Advocate attacks. It doesn’t suggest improvements. It goes for failure modes, citing the Dossier as ammunition:

Attack 1: state.nameEntry violates Constitution Commandment 1. The name buffer is pure UI interaction state. It has zero effect on scoring, physics, or piece movement. Putting it in state.js pollutes the logic layer that the entire test suite depends on being DOM-free. It belongs in main.js as local orchestration state. Evidence: state.js is the logic layer; 261 tests import it headless.

Attack 2: state.events[] is the wrong trigger. The Dossier describes that array as existing “for renderer/audio coupling.” It does not specify whether it’s drained per-frame or persistent. If the renderer or audio handler consumes it before main.js reads it, the game-over trigger is silently missed. That’s a race condition waiting to happen. Evidence: the Dossier says “Events array for renderer/audio coupling,” with no drain semantics specified.

Attack 3: key suppression is unspecified. S (settings), M (mute), P (pause), R (restart), L (leaderboard) all have live bindings in main.js. Type a name containing “S” and the settings panel opens mid-entry. The Proposer flags this as a risk but offers no mechanism. Evidence: existing main.js key handlers.

Plus a secondary finding the Proposer never mentioned: names rendered from localStorage straight into the DOM are a stored XSS vector unless the write uses textContent instead of innerHTML.

Verdict: CHALLENGE. The idea is sound. The solution has three holes that need closing before anyone writes a spec.

The Don weighs in — one question at a time, never bundled.

Do you agree with the attack? → Agree fully.

Anything the Devil’s Advocate missed? → Nothing missed.

Want to override any part of the proposal before synthesis? → No override.

Three forced answers. No place to hide an evasive “well, sort of.”

The Synthesizer revises. Not a defense of the original - a revision that absorbs every valid hit:

Remove nameEntry from state.js entirely. The buffer becomes let nameBuffer = '' local to main.js. The state machine keeps only the 'name-entry' status string — that genuinely is game state, because it gates input routing and rendering.

Abandon state.events[] for orchestration. main.js tracks prevStatus across ticks; when status flips to game-over, the transition fires. No array drain, no race. The events array keeps its existing renderer/audio role untouched.

Hard early-return guard at the top of the keydown handler: while status === 'name-entry', all keys route to handleNameEntryKey() and nothing else can fire.

All DOM writes use textContent. XSS closed.
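
Two of these revisions are pure control flow, so they can be modeled outside the browser. The sketch below is written in Python as a model of the idea only; the real game is JavaScript, and the `Orchestrator` class, the `"playing"` status, and the settings-key binding are assumptions made for illustration. The three-character cap is carried over from the original proposal’s maxLen: 3.

```python
class Orchestrator:
    """Python model of the revised main.js orchestration (hypothetical shape)."""

    MAX_LEN = 3  # assumed cap, taken from the original proposal's maxLen: 3

    def __init__(self) -> None:
        self.prev_status = "playing"
        self.name_buffer = ""  # UI state lives here, not in the game state
        self.opened_panels: list[str] = []

    def on_tick(self, status: str) -> bool:
        """Return True only on the tick where status flips to game-over."""
        fired = status == "game-over" and self.prev_status != "game-over"
        self.prev_status = status
        return fired

    def on_key(self, status: str, key: str) -> None:
        # Hard early-return guard: during name entry nothing else can fire.
        if status == "name-entry":
            self.handle_name_entry_key(key)
            return
        if key == "s":
            self.opened_panels.append("settings")

    def handle_name_entry_key(self, key: str) -> None:
        if len(key) == 1 and key.isalpha() and len(self.name_buffer) < self.MAX_LEN:
            self.name_buffer += key.upper()


game = Orchestrator()
ticks = [game.on_tick(s) for s in ["playing", "game-over", "game-over"]]
for key in "sam":
    game.on_key("name-entry", key)
```

Because the transition is detected by comparing the current status with the previous one, it does not matter who else reads or drains an events array. Because the guard returns early, typing an “s” during name entry reaches the buffer and never reaches the settings binding.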

One round.

The “trivial” feature now has a clean separation between UI and logic state, a race condition designed out before it existed, and a security hole closed before a line of code was written. That’s the difference between “we agreed” and “it survived attack.”

This particular session ran a second round - the Devil’s Advocate came back with six more attacks (among them: no max-length cap on the buffer, a pure-vs-DOM module split, and JSON schema validation on load) - before reaching equilibrium. The full verdict, with every rejected alternative and every accepted risk, flowed straight into the Sit-Down as the basis for the Contract.

Hard rules on rounds.

Minimum 2 rounds - you can’t grill an idea in a single pass, because the first attack is always shallow. Default maximum 5 rounds - most ideas resolve here, either by surviving or by transforming into something different.

Hard ceiling 7 - the Don can extend past the default of 5, but not beyond 7, because past round 7 returns diminish sharply and you’re usually just rationalizing at that point. Early exit only after round 2, only by explicit Don call - used when convergence is genuinely fast and continuing would be theater.

Termination is not “we agreed.”

Agreement is the easiest thing in the world to manufacture between an LLM and another LLM, and between an LLM and a tired user. Termination is one of three structural conditions:

Nash Equilibrium is the canonical one. The Devil’s Advocate raises no new valid objection AND the Don has no remaining concerns. Both attacking pressures have run out of ammunition simultaneously. The idea has genuinely survived attack - not because the attack stopped, but because the attack hit nothing that wasn’t already accounted for. This is the outcome you want.

Explicit consensus is the fast-path. The Don ends the Grilling after round 2, declaring that the idea has been adequately tested. This is appropriate when the problem is genuinely simple, when the Recon Dossier already addressed the major risks, or when the team has high confidence from prior similar work. It’s a real exit, but it’s the Don’s call to make, not the agents’.

Round limit is the safety valve. If neither equilibrium nor consensus is reached by round 7, the Grilling ends - but the unresolved objections don’t disappear. They get logged into the verdict as accepted risks. The Don is explicitly carrying them forward.

This matters: it means a forced termination doesn’t pretend the idea is clean. It just makes the dirt explicit. Six months later, when something breaks, you can look at the verdict and see exactly which risk was knowingly accepted.
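
The round limits and the three termination conditions can be combined into one check that runs after every round. This is a sketch of one reading of the rules above, not the framework’s implementation: `check_termination` and its parameters are hypothetical names, and the treatment of the default maximum (the session ends at round 5 unless the Don raises `max_rounds`) is an interpretation.

```python
from enum import Enum
from typing import Optional

MIN_ROUNDS = 2
DEFAULT_MAX_ROUNDS = 5
HARD_CEILING = 7


class Termination(Enum):
    NASH_EQUILIBRIUM = "nash_equilibrium"
    EXPLICIT_CONSENSUS = "explicit_consensus"
    ROUND_LIMIT = "round_limit"


def check_termination(
    round_no: int,
    new_valid_objections: int,
    don_has_concerns: bool,
    don_calls_exit: bool = False,
    max_rounds: int = DEFAULT_MAX_ROUNDS,
) -> Optional[Termination]:
    """Return the reason to stop after `round_no`, or None to keep grilling."""
    limit = min(max_rounds, HARD_CEILING)  # the Don may extend, never past 7
    if round_no < MIN_ROUNDS:
        return None  # no exit of any kind before round 2
    if new_valid_objections == 0 and not don_has_concerns:
        return Termination.NASH_EQUILIBRIUM
    if don_calls_exit:
        return Termination.EXPLICIT_CONSENSUS
    if round_no >= limit:
        # Remaining objections are logged as accepted risks, not dropped.
        return Termination.ROUND_LIMIT
    return None


print(check_termination(round_no=2, new_valid_objections=0, don_has_concerns=False))
```

Note what is absent from the signature: there is no “the agents agree” input. The only ways out are exhausted pressure, an explicit Don call, or the ceiling.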

The output of Grilling isn’t a spec. It’s a verdict - Key Decisions (and why), Rejected Alternatives (and why they were rejected), Unresolved Objections (and what risks the Don is carrying forward), and the Termination Reason.

The verdict is held in-context, not written to a file - it flows directly into the next phase, the Sit-Down, where the actual Contract gets drafted.
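
To make the four parts of the verdict concrete, here is one possible shape for it as an in-memory Python object. The class and field names are assumptions for illustration; in the framework the verdict lives in the agent’s context, so treat this as a picture of the structure, not a file format. The example entries are taken from the Round 1 revision described earlier.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    what: str
    why: str


@dataclass
class Verdict:
    key_decisions: list[Decision] = field(default_factory=list)
    rejected_alternatives: list[Decision] = field(default_factory=list)
    unresolved_objections: list[str] = field(default_factory=list)  # accepted risks
    termination_reason: str = ""

    def is_clean(self) -> bool:
        """True when the Don is carrying no risk forward."""
        return not self.unresolved_objections


verdict = Verdict(
    key_decisions=[
        Decision("nameBuffer lives in main.js", "it is UI state, not game state"),
    ],
    rejected_alternatives=[
        Decision("trigger on state.events[]", "drain semantics are unspecified"),
    ],
    termination_reason="nash_equilibrium",
)
```

A session that ends on the round limit would carry a non-empty `unresolved_objections` list, which is exactly the record you want to find six months later.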

Only after Grilling does anything get written down as a Contract.
Only after the Contract gets signed does code get planned.
Only after the plan does code get written.

Five gates before a single line of implementation. That sounds heavy, and on small tasks it is - which is why Grilling has explicit “skip” conditions, which I’ll get to below.

But for anything that’s actually load-bearing, the cost of skipping any of those gates is higher than the cost of running them. Always. The point of the pipeline is that the friction is real friction, not theatrical friction. It catches things.

Yes, this adds tokens. Recon plus Grilling cost real money on every feature, and on a moderate-sized change the overhead is non-trivial - I’ll publish hard numbers from instrumented runs separately. The bet is that the cost of arguing about a bad idea is always smaller than the cost of building one. So far that bet has held.

When NOT to use Grilling

I’m not going to pretend this is universal. It isn’t. Grilling is a serious tool with serious overhead, and it has clear failure modes when applied wrong.

The first failure mode is using Grilling on changes that don’t deserve it. If the task is fixing a typo, bumping a dependency version, or renaming a variable - Recon plus 2 rounds of Grilling is absurd. You’ll spend more tokens debating the change than implementing it, and the agents will start manufacturing fake objections to fill the rounds because there genuinely aren’t real ones to raise.

The Devil’s Advocate will say something like “have we considered backwards compatibility for users who depend on this exact variable name?” and you’ll know the system has descended into theater.

The second failure mode is using Grilling on pure refactors with a verified baseline. If the existing code already works, the tests already pass, and the goal is to clean up structure without changing behavior - the original decision was already grilled (or should have been) when the original code was written.

Re-grilling at refactor time is litigating a settled question. The right thing in that case is a different gate: a behavior-preservation check, not a should-this-exist check.

The third failure mode is using Grilling during exploratory prototyping, where the entire point is to fail fast and learn. If you’re spiking out three different approaches to see which one is even tractable, you don’t want each spike to get a full adversarial review - you want to throw cheap code at the problem and see what survives contact with reality. Grilling here actively kills the exploration.

The fourth failure mode is using Grilling under genuine time pressure when the cost of being wrong is small. Production is on fire, the fix is small, you’re confident in the diagnosis, and the cost of an extra hour of debate is real customer pain. Skip it.
Document what you did. If the fix turns out to be wrong, that’s what the Ledger is for - you log the failure and feed it into Reconnaissance for next time.

So when should you grill?

  • Use Grilling for new features that touch architectural decisions - anything where the structural shape of the change matters, not just its correctness.

  • Use Grilling for changes that introduce a new dependency, a new external integration, a new data model - these are the changes where the cost of getting it wrong propagates for years.

  • Use Grilling for security-relevant changes, where the failure mode is “we shipped a vulnerability” - the Devil’s Advocate role is genuinely valuable here, because security failures are exactly the failures that careful, well-meaning people miss.

  • Use Grilling any time the cost of building the wrong thing is meaningfully larger than the cost of arguing about it for an hour.

The decision rule is brutal but simple: how much will it cost to undo this if you’re wrong? If the answer is more than the Grilling itself, grill it. If the answer is less, don’t.
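
Written as a function, the rule is a single comparison guarded by the skip conditions from the failure modes above. This is only a sketch: `should_grill` and its flags are hypothetical names, both costs are assumed to be estimated in the same unit (hours here), and the example numbers are illustrative, not measurements.

```python
def should_grill(
    cost_to_undo: float,
    cost_of_grilling: float,
    *,
    trivial_change: bool = False,
    pure_refactor: bool = False,
    exploratory_spike: bool = False,
) -> bool:
    """Apply the skip conditions first, then compare the two costs."""
    if trivial_change or pure_refactor or exploratory_spike:
        return False
    # The time-pressure case is covered here: a small cost of being
    # wrong loses the comparison on its own.
    return cost_to_undo > cost_of_grilling


# Illustrative estimates in hours.
print(should_grill(cost_to_undo=40.0, cost_of_grilling=1.0))  # new data model
print(should_grill(cost_to_undo=0.5, cost_of_grilling=1.0))   # small hotfix
```

The function is trivial on purpose. All the difficulty sits in the honesty of the `cost_to_undo` estimate you pass in.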

The hard part isn’t applying the rule.
The hard part is being honest about which side of the rule a given task falls on.

Most engineers underestimate the cost of being wrong, because the cost is mostly invisible - it shows up later, in the form of technical debt, integration headaches, security audits that find old shortcuts, and refactors that take months to unwind.

Grilling is the moment you pay that cost up front, in tokens and minutes, instead of paying it later in engineer-years.

The uncomfortable implication

If your framework doesn’t have a Grilling phase, your framework is a productivity tool for shipping bad ideas faster.

That’s a real product. There’s a market for it. Plenty of people want their bad idea shipped quickly and don’t want to be told it’s bad. Fine. Ship it. Sell it.

To be fair, most existing frameworks aren’t claiming to do this - they’re claiming to enforce rigor in implementation, and they do that genuinely well.

Spec-Kit, MUSUBI, Tessl, the rest - within their scope, they’re honest about what they offer. The problem is the gap between what they offer and what users think they’re getting.

If you read the marketing, “spec-driven” sounds like the spec is the source of truth. It isn’t. The spec is just the input that the rigor machinery operates on. The spec itself was never on trial.

The next generation of AI frameworks won’t be the ones with more agents, longer context, or fancier orchestration. It’ll be the ones brave enough to tell the user “no” before writing a single line of spec.

That’s the bar. Almost everyone is below it. The hole is right there in the middle of every framework, and we’re all stepping around it pretending it isn’t there.

Stop pretending.

Recon the ground. Grill the idea. Kill the bad ones. Build the survivors.

That’s the whole job.

Wrapping the series

Part 1 mapped the landscape and named the gap. Part 2 showed what filling the gap actually looks like - the agents, the rounds, the termination conditions, the failure modes.

The next pieces in this series will go deeper into the rest of the Heist Pipeline: the Sit-Down (where the Contract gets signed), Resource Development (where the plan gets built), the Hit (where code finally gets written), and Laundering (where everything gets verified and logged into the Ledger).

Each of these phases has the same general design philosophy - explicit gates, named artifacts, no phase skippable - but they solve different problems.

If you build agentic systems and you’ve felt the productivity-tool-shipping-bad-ideas-faster problem, Gangsta Agents is open source. It’s a young project (first stable release in April 2026, v1.1.1). Issues, PRs, and adversarial critique of the framework itself are all welcome. Especially the last one - it would be embarrassing to ship a framework about Grilling without grilling the framework.

← Part 1: Sixteen Frameworks. One Blind Spot.

Gangsta Agents is an open-source agentic framework built around a 6-phase Heist Pipeline: Reconnaissance → Grilling → Sit-Down → Resource Development → The Hit → Laundering. Every phase has a gate. No phase is skipped.

github.com/kucherenko/gangsta

gangsta.page