GitHub - AscendyProject/redteam: Adversarial agent-pair harness: one model writes code through a TDD pipeline, a second model reviews it, gated at human checkpoints. Apache-2.0.
rkdgh19 · 2026-06-19 · via Show HN

License: Apache 2.0 · Python 3.11+ · Runtime deps: 0

An adversarial agent-pair harness for shipping code with AI. One model drives a task through a pipeline (plan → implement → review); a different model reviews the work adversarially; the output is a draft PR you review before merge. The collision of two independent model perspectives is the point — automatic self-agreement is what it exists to prevent. (A single-model TDD mode that front-loads write_test → verify_test is also available — see Phases by mode.)

Status: early. redteam was built as one project's internal harness and then extracted into this standalone repo, which owns it going forward — it has driven real, merged pull requests. (Its early git history reflects that origin, including cross-repo coordination from the parent project.) APIs and layout may still move.

Quick install (Claude Code) — two commands:

/plugin marketplace add https://github.com/AscendyProject/redteam
/plugin install redteam@ascendy-redteam

Not on Claude Code? Vendor it into any repo — see Install.

What it does

Given a batch of tasks (each a short input.md brief), the orchestrator walks every task through a fixed pipeline. It persists state.json after each phase, so a run is fully resumable, and it retries implement when the reviewer returns CHANGES_REQUESTED:

flowchart TD
    PO[plan_outcome]:::worker --> PRV[plan_review]:::rev
    PRV --> IMPL[implement]:::worker
    IMPL --> RC[review_code]:::rev
    RC -->|APPROVED| CPR[create_pr → draft PR]:::worker
    RC -->|CHANGES_REQUESTED| IMPL
    RC -. blocker persists .-> RES[rescue]:::rev
    RES --> HGR[human_gate_rescue] --> CPR
    CPR --> DONE([done]):::done

    classDef worker fill:#e3f2fd,stroke:#1976d2,color:#0d47a1;
    classDef rev fill:#fce4ec,stroke:#c2185b,color:#880e4f;
    classDef done fill:#e8f5e9,stroke:#388e3c,color:#1b5e20;

Blue = worker model (writes) · pink = reviewer model (adversarial, fresh).

This is the default agent-pair flow. By design it runs with no human gates in the common path — the adversarial pair plus verification is the trust, and the output is a draft PR (your existing human checkpoint before merge), not an auto-merge. Human gates are something you add back for risky changes, not the default tax on every change — see When to use it.
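The persist-after-each-phase loop can be sketched roughly as follows. This is a minimal illustration, not the orchestrator's code: the state.json fields, the `MAX_REVIEW_ROUNDS` cap, and the `run_phase` callback are all assumptions made for the example; only the phase names and the CHANGES_REQUESTED verdict come from the diagram above.

```python
import json
from pathlib import Path

# Phase names follow the agent-pair diagram; the state.json layout is hypothetical.
PHASES = ["plan_outcome", "plan_review", "implement", "review_code", "create_pr"]
MAX_REVIEW_ROUNDS = 3  # hypothetical cap before the rescue path would take over


def load_state(path: Path) -> dict:
    """Read persisted state, or start a fresh record if none exists."""
    if path.exists():
        return json.loads(path.read_text())
    return {"next_phase": PHASES[0], "review_rounds": 0, "done": False}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file and rename, so an interrupted run never leaves a torn file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2))
    tmp.replace(path)


def run_task(path: Path, run_phase) -> dict:
    """Advance one task phase by phase, persisting after every step."""
    state = load_state(path)
    while not state["done"]:
        phase = state["next_phase"]
        verdict = run_phase(phase)  # hypothetical callback that runs the sub-agent
        if phase == "review_code" and verdict == "CHANGES_REQUESTED":
            state["review_rounds"] += 1
            if state["review_rounds"] >= MAX_REVIEW_ROUNDS:
                state["next_phase"] = "rescue"  # hand-off point; not modelled here
                state["done"] = True
            else:
                state["next_phase"] = "implement"  # retry the worker
        elif phase == PHASES[-1]:
            state["done"] = True
        else:
            state["next_phase"] = PHASES[PHASES.index(phase) + 1]
        save_state(path, state)
    return state
```

Because the state is written after every phase, calling `run_task` again on the same file picks up at `next_phase` instead of starting over.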

Phases by mode

The mode setting (agent-pair by default, or tdd) decides which phases run. The authority is _phase_order() in orchestrator.py (AGENT_PAIR_PHASE_ORDER / TDD_PHASE_ORDER). If you drive the pipeline manually, follow the table row for the declared mode, not the prose:

| Mode | Core phases |
| --- | --- |
| agent-pair (default) | plan_outcome → plan_review → implement → review_code → create_pr |
| tdd | plan_outcome → write_test → verify_test → implement → review_code → create_pr |

The agent-pair worker writes its tests inside implement — there is no separate test-authoring phase; the second perspective is the adversarial reviewer (review_code), and the plan is independently checked by plan_review. The TDD mode instead drops plan_review and front-loads a write_test → verify_test pair before implement. So write_test / verify_test (the test-author / test-verifier sub-agents) run in TDD mode only — inserting them into an agent-pair task runs a phase the mode excludes.

(The table shows the worker + reviewer phases. A rescue slot is entered only if a blocker persists across review rounds — in the untiered default a rescue is then human-reviewed (human_gate_rescue) before the PR. A plan-approval gate is opt-in per tier profile.)
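As a rough illustration of how a mode maps to a phase list, the sketch below copies the two rows of the table into constants. The constant names echo the ones cited above, but `phase_order` and `check_manual_phase` are hypothetical helpers written for this example, not the functions in orchestrator.py.

```python
# Values copied from the table; the real definitions live in orchestrator.py.
AGENT_PAIR_PHASE_ORDER = (
    "plan_outcome", "plan_review", "implement", "review_code", "create_pr",
)
TDD_PHASE_ORDER = (
    "plan_outcome", "write_test", "verify_test", "implement", "review_code", "create_pr",
)


def phase_order(mode: str = "agent-pair") -> tuple[str, ...]:
    """Return the core phases for a mode; unknown modes are rejected."""
    orders = {"agent-pair": AGENT_PAIR_PHASE_ORDER, "tdd": TDD_PHASE_ORDER}
    try:
        return orders[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None


def check_manual_phase(mode: str, phase: str) -> None:
    """Refuse a phase the declared mode excludes (e.g. write_test in agent-pair)."""
    if phase not in phase_order(mode):
        raise ValueError(f"phase {phase!r} is not part of mode {mode!r}")
```

A manual driver could call `check_manual_phase("agent-pair", "write_test")` before running a phase and would get a `ValueError`, which is the mistake the paragraph above warns about.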

Each phase is run by a focused sub-agent with its own prompt and tool scope (.claude/agents/*.md): an outcome-planner, implementer, code-security-reviewer, and pr-author — plus a test-author / test-verifier pair used only in TDD mode. The reviewer is a fresh agent that only sees the diff and the project's security checklist — it never sees the implementer's reasoning.

How it's different

A plain "two-model" setup stops at "a second model takes a second look." redteam makes that separation structural and then acts on it:

  • Findings are tiered, not pass/fail. The reviewer emits findings with a severity (blocker / major / minor), and the orchestrator tracks each one across review rounds (a carry-over count) — a review is not a single thumbs up/down.
  • Persistent problems escalate on a ladder. A blocker that survives multiple rounds climbs: retry the worker → a heavier rescue pass → hand to a human (ask_user). So one rejection doesn't kill a run, and a stubborn real bug doesn't get rubber-stamped after a single retry.
  • The reviewer is blind to the writer. It's a fresh agent — and a configurably different model — that sees only the diff and the security checklist, never the implementer's reasoning, so self-justification can't cross the boundary.
  • The draft PR is the human checkpoint; the default common path has no gates. The pair plus verification is the automated trust, so the output is a draft PR you review before merge — it never autonomously merges. Blocking human gates (plan approval, etc.) are opt-in per tier for risky changes, not the default.
  • Either model on either side, zero runtime deps. See Model freedom and Install.
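The tiered findings and the escalation ladder from the list above might be modelled along these lines. The thresholds (`RESCUE_AFTER`, `ASK_USER_AFTER`), the `key` field used to match a finding across rounds, and the rule that only blockers gate approval are assumptions for illustration; the README names the rungs but not the exact counts.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the ladder's real round counts are not stated in the README.
RESCUE_AFTER = 2    # rounds a blocker may survive before the rescue pass
ASK_USER_AFTER = 3  # rounds before the finding is handed to a human


@dataclass
class Finding:
    key: str             # stable identifier so a finding can be matched across rounds
    severity: str        # "blocker", "major", or "minor"
    carry_over: int = 0  # review rounds this finding has already survived


def merge_round(previous: dict[str, Finding], current: list[Finding]) -> dict[str, Finding]:
    """Carry counts forward for findings that reappear; resolved ones drop out."""
    merged = {}
    for f in current:
        seen = previous.get(f.key)
        merged[f.key] = Finding(f.key, f.severity, seen.carry_over + 1 if seen else 0)
    return merged


def next_action(findings: dict[str, Finding]) -> str:
    """Pick the ladder rung from the most persistent blocker (simplification: only blockers gate)."""
    blockers = [f.carry_over for f in findings.values() if f.severity == "blocker"]
    if not blockers:
        return "approve"
    worst = max(blockers)
    if worst >= ASK_USER_AFTER:
        return "ask_user"
    if worst >= RESCUE_AFTER:
        return "rescue"
    return "retry_worker"
```

Tracking each finding separately is what lets a single stubborn blocker escalate while unrelated minor findings come and go between rounds.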

Model freedom

Roles bind to providers through a small adapter registry, not hardcoded calls. Today Claude and Codex can each take either side:

| role | providers implemented |
| --- | --- |
| worker (planner / implementer) | claude, codex |
| reviewer / rescue | codex, claude |

You choose per role in .redteam/config.toml [models]. A reviewer value that isn't an adapter (e.g. "human") falls back to the manual flow (you paste the review and touch the sentinel). Adding another provider is one adapter file plus one registry line.

When to use it

The goal is minimal human intervention, without losing trust — the adversarial pair is the automated trust, so the common path has no human gates. But not every change needs the same weight: a typo shouldn't pay for a full agent-pair, and an auth change shouldn't ship with only the light path. So you scale the response to the risk of the change:

| change | response |
| --- | --- |
| trivial: non-behavior-changing (rename, comment, formatting) | single-agent, no review |
| routine: small, local, reversible | single-agent loop; review optional |
| guarded: behavior change with real blast radius (auth, storage, concurrency, public API, migrations) | the adversarial pair + verification (the default) |
| strategic / production-critical: architectural, irreversible, or changes prod posture | the pair plus human gates (and a rollback plan you require) |

Tier-aware routing lets the harness apply this automatically (opt-in via config.toml). You define tier profiles as declarative toggles, and a deterministic classifier picks each task's tier:

[tiers.0]                       # trivial
review = false                  # single-agent, no adversarial pair
models = { implementer = "claude-haiku-4-5" }   # cheap model

[tiers.2]                       # guarded (a sensible default)
review = true                   # the adversarial pair; no human gate

[tiers.4]                       # production-critical
review = true
gates = ["outcome", "pr"]       # add human checkpoints back here

[tier_triggers]
"**/auth/**" = 4                # touching auth floors the task at tier 4
default = 2                     # unclassified → safe default

The binding tier is max(declared, path-triggered, default) — a task can be raised but never lowered below what its paths demand, and an unclassified task falls to the mandatory safe default. With no [tiers] section, routing is off and every task takes the default pipeline (fully backward-compatible).

Two levers also work on their own, without tiers:

  • Model per role ([models]) — a cheaper implementer for routine work, a frontier reviewer for guarded work; either provider on either side.
  • The escalation ladder — a blocker finding that survives review rounds climbs retry → rescue, concentrating effort where a problem actually persists.

Trigger globs are git-pathspec-style: * matches within a path segment, ** matches across directories (so **/auth/** matches auth/x at any depth).
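The two wildcard rules can be approximated by translating a glob into a regular expression. This is a sketch of the semantics as described here, not git's own pathspec matcher and not the harness's classifier; `glob_to_regex` and `path_tiers` are hypothetical names.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a trigger glob: '*' stays inside one segment, '**' spans directories."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")        # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")     # anything except a path separator
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def path_tiers(paths: list[str], triggers: dict[str, int]) -> list[int]:
    """Collect the tier of every trigger whose glob fully matches a declared path."""
    return [
        tier
        for glob, tier in triggers.items()
        if any(glob_to_regex(glob).fullmatch(p) for p in paths)
    ]
```

Under this translation `**/auth/**` matches both `auth/x` and `src/api/auth/login.py`, but not `src/oauth/x`, because the directory name must be exactly `auth`.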

Scope note: v1 path triggers match the paths a task declares in its front-matter, and tier profiles vary review/gates/models over the canonical pipeline (not arbitrary phase orders). Re-checking the real committed diff and richer profiles are tracked on issue #13.

Install

As a Claude Code plugin (recommended)

This repo doubles as a single-plugin marketplace, so two commands install it:

/plugin marketplace add https://github.com/AscendyProject/redteam
/plugin install redteam@ascendy-redteam

The HTTPS URL works everywhere, including behind firewalls that block SSH (port 22). The AscendyProject/redteam shorthand also works if you have GitHub SSH keys configured.

That registers the six sub-agents and the /redteam:* commands. Run redteam-install (also exposed as a redteam-install tool on PATH) from your project root to vendor the harness in, then use the others as needed:

/redteam:redteam-install        # vendor .redteam/ into the current repo
/redteam:redteam-new-task       # scaffold the next task-NNN dir + input.md from the template
/redteam:redteam-review         # one-shot cross-model review of the current branch diff
/redteam:redteam-config         # choose the per-role models (writer / reviewer / rescue)
/redteam:redteam-status         # show the pipeline status for a batch

Or vendor directly (any stack, no Claude Code needed)

# from a clone of this repo:
python3 .redteam/scripts/install.py /path/to/your/project

# preview first:
python3 .redteam/scripts/install.py /path/to/your/project --dry-run

Useful flags: --overwrite (refresh harness-owned files; never touches your config.toml / docs/* / batches/), --protect-config (opt-in: add Claude Code Edit/Write deny rules for .redteam/config.toml to the consumer's .claude/settings.json, add-only — the runtime pairing guard is the backstop regardless), and --check (report whether a vendored install is behind this harness version, then exit — writes nothing).

Either way it's the same vendoring model: the harness ships inside your project tree (.redteam/) because the engine resolves your repo root from its own file location. Harness-owned files (workflows/, prompts/, templates/, agent skeletons) are re-vendored on each run (--overwrite to refresh); project-owned files (config.toml, docs/*, verify.sh, your batches/) are seeded once and never overwritten.
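One way to picture the two ownership classes is a small write-decision function. The sketch below is one reading of the rules above, with paths taken relative to .redteam/; the path sets and the `should_write` helper are assumptions for illustration, not the installer's logic.

```python
from pathlib import PurePosixPath

# Project-owned paths relative to .redteam/, taken from the README's lists (layout assumed).
PROJECT_OWNED_FILES = {"config.toml", "scripts/verify.sh"}
PROJECT_OWNED_DIRS = ("docs", "batches")


def is_project_owned(rel_path: str) -> bool:
    """True for files the project owns once they have been seeded."""
    parts = PurePosixPath(rel_path).parts
    return rel_path in PROJECT_OWNED_FILES or (
        len(parts) > 1 and parts[0] in PROJECT_OWNED_DIRS
    )


def should_write(rel_path: str, exists: bool, overwrite: bool) -> bool:
    """Seed missing files; refresh existing ones only if harness-owned and --overwrite is set."""
    if not exists:
        return True
    if is_project_owned(rel_path):
        return False
    return overwrite
```

Here an existing `config.toml` is left alone even with `overwrite=True`, while an existing `workflows/orchestrator.py` is replaced only when `overwrite=True`.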

The installer does not vendor the harness's own unit tests, so a consumer never runs (or maintains) them — your verify.sh runs your tests, not the engine's. The vendored .redteam/ engine follows the harness's own style, so exclude .redteam/ from your project's linter/formatter (e.g. ruff's extend-exclude, an eslint ignore) to avoid it flagging code you don't own.

Requirements

  • Python 3.11+ (stdlib only — zero runtime pip dependencies).
  • The model CLIs you configure, installed and authenticated: claude and/or codex.

Updating

A vendored install is a copy of the engine in your repo, so it doesn't update itself — you re-vendor when a new version ships. --overwrite refreshes only harness-owned trees (workflows/, prompts/, templates/, scripts/install.py, the six agent skeletons, and the .redteam/.redteam-version stamp); your existing project-owned files (config.toml, docs/*, verify.sh) and your task content under batches/ are never overwritten (the installer only ensures an add-only batches/.gitignore rule there, leaving your files intact).

--check compares the source side against your vendored stamp, so it's only meaningful when the source is the newer one — run it from an updated plugin (redteam-install …) or a fresh clone. Running your repo's own vendored .redteam/scripts/install.py against that same repo compares the stamp to itself, so it can't reveal an upstream release (it just echoes the vendored version, or unknown if the stamp is missing). Exit codes: 0 current/ahead · 1 outdated · 2 cannot determine. It writes nothing.

Plugin installs (Claude Code)

The plugin ships the engine and puts redteam-install on PATH, so updating is two layers — refresh the plugin first, then re-vendor the engine it carries:

/plugin marketplace update ascendy-redteam   # refresh the cached marketplace
/plugin update redteam@ascendy-redteam       # update the plugin to the latest
/plugin list                                 # confirm the new version
/reload-plugins                              # apply updated commands/agents (no restart needed)

Then re-vendor the engine into your repo and confirm. Because redteam-install self-locates the plugin's (now-updated) source, its --check meaningfully compares that against your repo's vendored stamp:

redteam-install . --check        # plugin source vs your vendored stamp: 1 = outdated
redteam-install . --overwrite    # re-vendor the new engine into .redteam/
redteam-install . --check        # expect "verdict: up-to-date."
bash .redteam/scripts/verify.sh  # your gate still passes

Direct (vendored) installs

Pull the latest of this repo (your clone), then run the clone's installer against your project so the source side is the updated one:

# from your refreshed clone of this repo:
python3 /path/to/redteam-clone/.redteam/scripts/install.py /path/to/your/project --check
python3 /path/to/redteam-clone/.redteam/scripts/install.py /path/to/your/project --overwrite

Do the update on a branch and open a PR (don't push the engine bump straight to your default branch), and keep .redteam/ excluded from your linter as in Install.

Configure

Edit .redteam/config.toml for your stack (paths, verify_command, branch_prefix, role→model), then fill the three project docs the sub-agents read:

  • .redteam/docs/project-context.md — stack + hard rules
  • .redteam/docs/security-checklist.md — the reviewer's hard lines
  • .redteam/docs/test-conventions.md — how your test suite is wired

Two complete examples to copy the shape from: examples/fastapi-like/ (Python — FastAPI + Celery + Postgres + a vector DB) and examples/nuxt-like/ (JS/TS — Nuxt 3 + Vue + Vitest).

Run

python3 .redteam/workflows/orchestrator.py new    .redteam/batches/<batch> <slug> [--title "..."]
python3 .redteam/workflows/orchestrator.py start  .redteam/batches/<batch>
python3 .redteam/workflows/orchestrator.py resume .redteam/batches/<batch>
python3 .redteam/workflows/orchestrator.py status .redteam/batches/<batch>

A batch is a directory of tasks/<task-id>/input.md briefs. new scaffolds the next task-NNN directory with a template input.md (or use /redteam:redteam-new-task); fill in the brief, then start. The orchestrator creates a per-task branch (<branch_prefix>/<task-id>), runs the pipeline, and stops at each human gate until you touch the sentinel file it names.
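Scaffolding the next task directory and naming its branch could work roughly like this. The three-digit zero padding is inferred from the task-NNN notation, and both helper names are hypothetical; the tasks/ layout and the branch pattern come from the paragraph above.

```python
import re
from pathlib import Path

TASK_DIR = re.compile(r"task-(\d+)")  # tolerates a trailing suffix after the number


def next_task_id(batch_dir: Path) -> str:
    """Return the id after the highest existing task-NNN under <batch>/tasks/."""
    tasks = batch_dir / "tasks"
    numbers = []
    if tasks.is_dir():
        for child in tasks.iterdir():
            match = TASK_DIR.match(child.name)
            if child.is_dir() and match:
                numbers.append(int(match.group(1)))
    next_number = max(numbers, default=0) + 1
    return f"task-{next_number:03d}"


def branch_name(branch_prefix: str, task_id: str) -> str:
    """Build the per-task branch as <branch_prefix>/<task-id>."""
    return f"{branch_prefix}/{task_id}"
```

Taking the maximum rather than counting directories keeps ids from being reused when an earlier task directory has been deleted.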

One-shot review (no batch). To run just the adversarial reviewer over your current branch diff — a different provider than whoever wrote the code, read-only:

python3 .redteam/workflows/orchestrator.py review

It reviews git diff <base>...HEAD and exits 0 / 1 / 2 (approved / changes requested / reviewer failed), so it can gate CI. Exposed as /redteam:redteam-review in Claude Code. Fail-closed: it refuses if the configured reviewer would collapse to the worker's own provider (self-review).
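A CI step could wrap the one-shot review and act on its exit code along these lines. The wrapper itself (`interpret`, `run_review`, `main`) is a hypothetical sketch; only the command and the 0 / 1 / 2 meanings come from the text above. Unexpected codes are treated as failures so the gate stays fail-closed.

```python
import subprocess
import sys

# Exit codes as listed for the one-shot review.
VERDICTS = {0: "approved", 1: "changes requested", 2: "reviewer failed"}


def interpret(returncode: int) -> tuple[str, bool]:
    """Map an exit code to (label, passes); anything unexpected fails closed."""
    label = VERDICTS.get(returncode, f"unexpected exit code {returncode}")
    return label, returncode == 0


def run_review(cmd: list[str]) -> tuple[str, bool]:
    """Run the review command and interpret its exit code."""
    return interpret(subprocess.run(cmd).returncode)


def main() -> int:
    """Entry point a CI job could call; returns the job's own exit status."""
    label, ok = run_review(["python3", ".redteam/workflows/orchestrator.py", "review"])
    print(f"redteam review: {label}")
    return 0 if ok else 1
```

A pipeline would typically end with `sys.exit(main())`, so that both "changes requested" and "reviewer failed" block the merge.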

Contributing

Issues and PRs welcome. See CONTRIBUTING.md for the dev setup and the gate (bash .redteam/scripts/verify.sh), and the Code of Conduct. The engine stays project-agnostic and stdlib-only — those two invariants drive most review feedback. To report a vulnerability, see SECURITY.md (don't open a public issue).

License

Apache License 2.0 (LICENSE). Contributions are accepted under the Contributor License Agreement, which keeps provenance clean and preserves the option of offering the project under other terms.