GitHub - anthony-chaudhary/dos-kernel: Catch your AI agents when they lie about what they shipped — verifies claims against git instead of believing the agent.
anthonysarki · 2026-06-19 · via Show HN

DOS — the Dispatch Operating System

Catch your AI agents when they lie about what they shipped.


📊 See it run on real repos: the scoreboard scores 15 popular AI-built repos (roborev, open-interpreter, crewAI, autogen, …) — how much agents wrote, which ones, and whether each commit's claim is backed by its own diff. Score yours: dos commit-audit --sweep --workspace . BASE..HEAD.

Terminal recording of the caught lie. The agent reports: "Done! Shipped the login endpoint (AUTH1) and the password reset (AUTH2)." git log shows one commit, "AUTH1: ship the login endpoint". dos verify AUTH AUTH1 answers SHIPPED (exit 0); dos verify AUTH AUTH2 answers NOT_SHIPPED via none (exit 1), so the false claim is caught. The exit code is the verdict: gate the agent's "done" on it and a false claim cannot land.
The whole pitch in one recording: the agent claims two features shipped; git backs one. dos verify answers from the commits, the lie exits 1, and a gate on that exit code refuses the false "done". Every line is the real CLI's verbatim output — scripts/build_caught_lie_cast.py re-records it whenever the output changes.

Two agent fleets side by side. Left, no referee: agents all report 'done!', every report is believed, and silent corruption (lies, collisions, spin) piles up into a codebase that 'sorta works' and can't be changed. Right, DOS adjudicates: dos verify reads git and the run branches to SHIPPED (exit 0, land it) or NOT_SHIPPED (exit 1, re-dispatch — caught), and that verdict steers the next step.
Run a fleet of agents on one repo. The left loop just feels like progress; the right one you can steer. The only difference is a verdict DOS reads from the real world — here, git — never the agent's word.

An AI agent will tell you it finished. DOS checks the real world instead of taking its word — and the nearest piece of the real world is your git history. An agent says it shipped the login endpoint; did it? Run one command, dos verify, and it answers from the artifacts the work left behind, not from what the agent typed: a commit backs the claim → SHIPPED, exit 0; nothing landed → NOT_SHIPPED, exit 1. The agent's story never enters into it. (Git is just the first witness DOS reads; the file tree, the clock, a CI status, a test environment's own state are others — anything the agent didn't author.)

dos verify AUTH AUTH1   # → SHIPPED      AUTH AUTH1 e62f74d   (exit 0)
dos verify AUTH AUTH2   # → NOT_SHIPPED  AUTH AUTH2           (exit 1)

That's the smallest version. It scales up, too: point a dozen agents at one repo — in CI, in a fleet, racing on the same files — and DOS also tells you which ones are stepping on each other, which one is spinning in circles, and which claim of "done" is real. Every answer comes from the artifacts (git, the file tree, the clock), never the narration. It works on a plain git repo with zero config and gets smarter the more you tell it, and the only thing you ever install is one small Python package.

Just add it — two commands, zero decisions. From the repo where your agent works:

pip install dos-kernel
dos init --hooks auto   # finds the agent runtime(s) you already use, wires in the checks

From then on: your agent can't tell you "done" unless the work actually landed, two agents can't silently overwrite each other's files, and a run that stalls gets flagged instead of quietly spinning. Nothing about your workflow changes, and you don't need to learn any of the vocabulary below to be covered. dos init prints the one config file it wrote; deleting the dos hook entries there undoes the install. (No runtime detected? It says so and lists the names to pick from; it never guesses.)

v0.28.0 · 5,600+ tests · CI: Python 3.11–3.13 on Linux + a Windows 3.13 smoke run · the only runtime dependency is PyYAML · MIT.

🧭 Where to go next: the why & evidence (plain-words story, the 20-lines-of-bash answer, what's proven), wire it into your stack (MCP · hooks · install), the syscall + CLI reference, or, if you are reading this as an AI agent, AGENTS.md (build/test/check in three lines). The full map is the router just below.

🔤 Five words the rest of this page leans on. A plan is a named goal (AUTH); a phase is one shippable step of it (AUTH1); a lane is the slice of the file tree one agent may touch; the oracle is the part of DOS that reads the evidence and rules; a stamp is the mark a shipped phase leaves in a commit subject (AUTH1: …) — the thing the oracle greps for. That's the whole vocabulary.
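
To make the stamp idea concrete, here is a minimal sketch in Python of what grepping commit subjects for a phase token could look like. It is an illustration only, not DOS's implementation: the function names find_stamp and verdict, the whole-token matching rule, and the sample subjects are assumptions. Only the stamp shape (AUTH1: …) and the SHIPPED / NOT_SHIPPED verdicts with exit codes 0 / 1 come from this page.

```python
import re
from typing import Iterable, Optional, Tuple


def find_stamp(phase: str, subjects: Iterable[str]) -> Optional[str]:
    """Return the first commit subject that mentions `phase` as a whole token.

    Hypothetical helper: the real oracle's matching rules may differ.
    """
    # Guard both sides so AUTH1 does not match inside AUTH10 or XAUTH1.
    pattern = re.compile(
        rf"(?<![A-Za-z0-9_]){re.escape(phase)}(?![A-Za-z0-9_])"
    )
    for subject in subjects:
        if pattern.search(subject):
            return subject
    return None


def verdict(phase: str, subjects: Iterable[str]) -> Tuple[str, int]:
    """Map 'stamp found / not found' to a (verdict, exit code) pair."""
    hit = find_stamp(phase, subjects)
    return ("SHIPPED", 0) if hit is not None else ("NOT_SHIPPED", 1)


# Sample commit subjects (made up for this sketch).
subjects = ["AUTH1: ship the login endpoint", "docs: fix typo"]
print(verdict("AUTH1", subjects))  # ('SHIPPED', 0)
print(verdict("AUTH2", subjects))  # ('NOT_SHIPPED', 1)
```

In a real repository the subjects could be fed from git log --format=%s. The point of the sketch is that the input is commit history, and the agent's own report never appears as an argument.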

In plain words

A coding agent does work, then tells you how it went. Usually the story is true; sometimes it's the cheerful "all work completed!" from a worker that shipped nothing. With one agent you catch that yourself by re-reading its output — a real tax you already pay. Run twenty at once and that tax stops being payable: nobody reads everything, each worker grades its own homework, and the unchecked problems pile up quietly until the codebase sorta works and nobody can safely change it. DOS is the referee that never reads the story — it reads what happened (the commit, the file, the clock) and hands you a verdict no narration can move. It costs about an afternoon, has one runtime dependency, and stays in its lane: it tells you what happened, never whether the code is good — quality stays with your tests and reviews. (The full plain-words version.)

Measured, not asserted

Every number here is scored against a fact the agent can't fake (a test environment's DB state, git history). A DOS gate caught 15 "I shipped it" lies in 258 tasks across two models with zero false alarms; the same referee stopped 6 of 8 silent collisions on one shared record; quitting doomed runs at the right moment saved ~11% of fleet compute with 0 of 1,634 winners wrongly killed; and the reward-set admission label lifted acceptance precision 60% → 100% by purging poison a self-graded collector keeps. The methodology, the two money-moment figures, and the projected-vs-bet honesty gradient are in what's proven and what's still a bet.

Where the rest of the docs are

This page keeps the hook, the demo, and the failure it fixes. Everything deeper lives on a focused page — find the question you arrived with and jump:

| You're asking… | Go to |
| --- | --- |
| "What is this in plain words, and why should my team care? Is it real?" | Why a referee: the plain-words story, the 20-lines-of-bash / Temporal answers, and the full proven/bet evidence |
| "Show me it working, fast." | Try it in 60 seconds, just below (one command) |
| "I already run agents — how do I wire the verdict into my stack?" | Wire it in: MCP, runtime hooks, the exit-code tier, fleet frameworks, and the install matrix |
| "What's the full command / syscall surface?" | The syscall ABI & CLI reference: every verb, the three live screens, the verdict journal |
| "I run a fleet every day — how do I watch it, triage it, debug it?" | Operating a fleet + Debug a stuck fleet |
| "How do I bend it to my org without forking it?" | Extending it: the seven axes, the docs index, the playbooks |
| "What is actually proven, and can I re-run it?" | For researchers: claims → invariants → reproduction |
| "I'm an AI agent orienting in this repo." | AGENTS.md: what DOS is in three lines, build/test/check, the ~5 files worth reading |
| "What surfaces are stable and what's the deprecation window?" | docs/STABILITY.md: the compatibility promise, what the version number means, and what will never break |

Try it in 60 seconds

Got a terminal? This runs the whole thing in a throwaway repo — one command scaffolds it, makes a real commit, verifies it, and cleans up after itself:

pip install dos-kernel      # PyYAML is the only runtime dep
dos quickstart              # → SHIPPED AUTH AUTH1 … then NOT_SHIPPED AUTH AUTH2

One SHIPPED, one NOT_SHIPPED: the first is a claim git can back, the second is a claim nothing landed for. That contrast is the product. The demo closes with a router to wherever you already run agents — a Claude Code / Cursor tab (dos init --hooks), an MCP host, a CI step, or a fleet — so your next move is one line, not a docs dig. (Add --keep ./demo to keep the repo and poke at it. Don't even want the install? uvx --from dos-kernel dos quickstart runs the same demo ephemerally — nothing left behind.) The same thing by hand, in five lines, is docs/QUICKSTART.md.

The dos verify money-moment. Two equally-confident agent claims, checked against git. Left, what the agent claims (forgeable): 'Shipped AUTH1 — the login endpoint is done' and 'AUTH2 is done too — all work completed!'. Right, what git actually records: one real commit e389e8b 'AUTH1: ship the login endpoint', and no commit anywhere mentions AUTH2. The two verdicts: dos verify AUTH AUTH1 finds the token in a real commit subject → SHIPPED, exit 0, via grep-subject; dos verify AUTH AUTH2 finds it nowhere → NOT_SHIPPED, exit 1, via none. The confident AUTH2 claim collapses the instant no commit backs it.
Two equally confident claims, one verdict each — SHIPPED for the one git can back, NOT_SHIPPED for the one nothing landed for. Every string is verbatim output of examples/demo/verify_demo.sh. Step through it locally for the click-through version (it's an HTML file — clone the repo and open it in a browser; GitHub shows its source, not the running page).

The smallest real win: in a CI step or dispatch loop, replace the line that trusts an agent's "done" with dos verify PLAN PHASE and branch on its exit code (0 shipped / 1 not). No parsing, no plan, no config — the CI integration cookbook walks it end-to-end. To run it on a repo shaped like yours, start with Onboard a repo in 10 minutes.
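
For a dispatcher written in Python, the same exit-code branch might look like the sketch below. It shells out to the dos verify PLAN PHASE command described above and reads nothing but the return code. The helper names claim_is_backed and next_step, and the injectable run parameter (there so the logic can be exercised without the CLI installed), are assumptions of this sketch and not part of dos-kernel.

```python
import subprocess
from typing import Callable, Optional, Sequence

# A runner takes an argv list and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def _run_cli(argv: Sequence[str]) -> int:
    """Default runner: execute the real CLI (assumes `dos` is on PATH)."""
    return subprocess.run(list(argv)).returncode


def claim_is_backed(plan: str, phase: str, run: Optional[Runner] = None) -> bool:
    """True only when `dos verify PLAN PHASE` exits 0 (SHIPPED)."""
    runner = run or _run_cli
    return runner(["dos", "verify", plan, phase]) == 0


def next_step(plan: str, phase: str, run: Optional[Runner] = None) -> str:
    """Pick the loop's next move from the verdict, never from the agent's text."""
    return "land" if claim_is_backed(plan, phase, run) else "re-dispatch"


if __name__ == "__main__":
    # Stand-in runner for a dry run: pretends only AUTH1 has landed.
    fake = lambda argv: 0 if argv[-1] == "AUTH1" else 1
    print(next_step("AUTH", "AUTH1", fake))  # land
    print(next_step("AUTH", "AUTH2", fake))  # re-dispatch
```

Dropping the fake argument makes the same two calls run the installed CLI. No output parsing is involved, which matches the "exit code is the verdict" contract.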

Point the same witness at a review queue when commits pile up faster than anyone can read them. Residual review folds commit-audit's per-commit verdict into three bands — CLEARED (the diff witnessed the claim, so spend ~0 attention re-asking "did it do what it said"), RESIDUAL (a claim git couldn't back — the human's 100%), and the no-claim rest. On this repo's own last 200 commits it cleared 170 of 171 checkable claims: that's the re-review you skip, proven by git rather than a model's confidence score. (CLEARED means the change's shape matched its claim — not that the code is correct; correctness review still applies to every commit. The band can only ever ask for more eyes, never fewer.)
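
The three-band fold can be pictured with a small Python sketch. Everything about the data shape is an assumption: each commit is reduced to True (the diff backed its claim), False (it did not), or None (no checkable claim), and the NO_CLAIM label is invented here for "the no-claim rest". The counts reuse the figures quoted above, assuming the 29 commits outside the 171 checkable claims carried no claim.

```python
from collections import Counter
from typing import Iterable, Optional


def band(claim_backed: Optional[bool]) -> str:
    """Fold one per-commit audit result into a review band (labels partly hypothetical)."""
    if claim_backed is None:
        return "NO_CLAIM"  # nothing checkable was claimed
    return "CLEARED" if claim_backed else "RESIDUAL"


def summarize(audits: Iterable[Optional[bool]]) -> Counter:
    """Count how many commits land in each band."""
    return Counter(band(a) for a in audits)


# 200 commits: 170 backed claims, 1 unbacked claim, 29 with no claim (assumed split).
audits = [True] * 170 + [False] * 1 + [None] * 29
counts = summarize(audits)
print(dict(counts))  # {'CLEARED': 170, 'RESIDUAL': 1, 'NO_CLAIM': 29}
```

Under that reading, one commit out of 200 needs the "did it do what it said" question asked by a human; correctness review still applies to all of them.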

Next level up — wire the verdict into your own stack: Wire it in.

What goes wrong in a fleet

Run a pile of agents at once with nobody refereeing, and here's how it goes: each worker reports its own success, and you believe the reports, because what else is there to go on? The unchecked problems pile up quietly — a lie here, two agents clobbering the same file there, a little scope creep, one worker spinning in circles — until the codebase sorta works and nobody can safely change it.

The trouble is you launched the agents and then let them grade their own homework. DOS gives you the missing signal — a verdict from ground truth — so the loop closes. Here is the same fleet under both regimes:

The two regimes as a flowchart — NO REFEREE: you believe the narration; DOS ADJUDICATES: you steer on a verdict
flowchart LR
  subgraph OPEN["NO REFEREE — you believe the narration"]
    direction TB
    A1["agent: 'done!'"] --> B1[["believed"]]
    A2["agent: 'done!'"] --> B1
    A3["agent: 'done!'"] --> B1
    B1 --> C1["silent corruption piles up<br/>(lies · collisions · spin)"]
    C1 --> D1["'sorta works' — can't be changed"]
  end
  subgraph CLOSED["DOS ADJUDICATES — you steer on a verdict"]
    direction TB
    A4["agent: 'done!'"] --> V{{"dos verify<br/>reads git"}}
    V -->|in git ancestry| S["SHIPPED (exit 0)"]
    V -->|found nowhere| N["NOT_SHIPPED (exit 1)"]
    S --> L["land it"]
    N --> R["re-dispatch / flag — caught"]
    R -.verdict steers the loop.-> A4
  end

Here are the failures a fleet actually produces, each next to the ground truth that quietly contradicts the worker's story — and the verdict DOS hands back:

| A worker… | …but the ground truth is | DOS verdict |
| --- | --- | --- |
| says it shipped a unit of work | no commit ever landed | verify → caught lie |
| tried, but the commit silently failed | no commit ever landed | verify (the flake, indistinguishable from a lie without git) |
| edits files another worker owns | two agents, one shared file | arbitrate → refuse the second |
| overruns the file region it claimed | footprint reaches beyond the declared tree | scope-gate → REFUSE (before the write lands) |
| reports "making progress" | 0 commits, only a fresh heartbeat | liveness → SPINNING |

The first row is the most common one. The classic tell is a cheerful one-liner, "all work completed!", from a worker that did little or nothing. DOS never reads that line; it reads the ground truth, so the claim collapses the instant no artifact backs it (more in docs/108). That's also what makes it cheap to adopt: verify needs no plan, no registry, no config, and the exit code is the verdict — any shell or CI step can branch on it without parsing a word.
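
The scope-gate row of the table above is a path-containment check at heart. A minimal sketch in Python, under stated assumptions: lanes are given as repo-relative directory roots, a footprint is the list of repo-relative paths a worker intends to write, and the names out_of_lane, scope_gate, and the ALLOW label are hypothetical (only REFUSE appears on this page). It does not reproduce how DOS computes lanes or footprints.

```python
import posixpath
from pathlib import PurePosixPath
from typing import Iterable, List


def out_of_lane(lane_roots: Iterable[str], footprint: Iterable[str]) -> List[str]:
    """Return the footprint paths that fall outside every declared lane root."""
    roots = [PurePosixPath(posixpath.normpath(r)) for r in lane_roots]
    offenders = []
    for path in footprint:
        # Normalize first so 'src/auth/../billing/x.py' cannot sneak through.
        normalized = PurePosixPath(posixpath.normpath(path))
        if not any(normalized.is_relative_to(root) for root in roots):
            offenders.append(path)
    return offenders


def scope_gate(lane_roots: Iterable[str], footprint: Iterable[str]) -> str:
    """REFUSE if any intended write leaves the lane; otherwise ALLOW (hypothetical label)."""
    return "REFUSE" if out_of_lane(lane_roots, footprint) else "ALLOW"


print(scope_gate(["src/auth"], ["src/auth/login.py"]))               # ALLOW
print(scope_gate(["src/auth"], ["src/auth/login.py", "README.md"]))  # REFUSE
```

Because the comparison is component-wise, src/authz/x.py is not treated as inside src/auth. The check runs on the intended footprint, which is what lets a refusal happen before the write lands.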

Prefer to watch it move? The two loops are also a self-contained animation you step through one frame at a time — clone the repo and open docs/assets/loop_visual.html in a browser. (It's an HTML file, so GitHub shows its source rather than running it — open it locally.)

Lease scope — single filesystem today. The verification half (verify, commit-audit, liveness) travels across machines freely because it reads git history. The admission half (arbitrate, lane leases) is local-filesystem only: the WAL lives on one disk, and workers on separate machines share no serialization point. A fleet that runs all its workers on one machine or in one shared filesystem is fully covered; a fleet spanning multiple hosts should treat dos arbitrate as advisory (not a hard mutex) until a remote-lease driver ships. See docs/366 for the design.

How far you take it

It works on a plain git init with zero config, and gets smarter the more you tell it. You don't adopt a framework and pick a tier; you start at the shallow end and it keeps paying off as you wade deeper — the same kernel the whole way:

  • Zero config. Point dos verify PLAN PHASE at a plain git repo — no plan, no registry, no dos.toml. It answers from commit history alone (via grep-subject / via none). This is the whole of QUICKSTART and the day-one CI win above.
  • Tell it your structure. dos init writes a dos.toml (lanes, paths, ship grammar as data); add a plan doc and dos plan lays each phase's claim beside the oracle's verdict. Here's exactly what a plan file looks like (copyable, round-trips with the built-in reader), and four worked example workspaces.
  • Teach it your own types. Declare your own block reasons, gate verdicts, output renderers, admission predicates, a model-backed judge, a custom plan dialect, or a whole host driver — all as workspace policy, never a fork. The map is docs/HACKING.md (seven extension axes) + the copy-me examples/dos_ext/.

How you plug it in

That slope is how deep your config goes. The other axis is how you call the referee at all — and you adopt through whichever surface matches how you already work, not by restructuring your stack. The same kernel verdicts are reachable through every row here, lowest-friction first:

| Surface | Adopt it when… | The move |
| --- | --- | --- |
| MCP server | you drive an agent through an MCP host (Claude Desktop, Cursor, Cline, an Agent-SDK app) | add one line to the host config ({ "command": "dos-mcp" }) and ask the agent to dos_verify its own last claim, with zero code. The advisory path (the agent asks). See Give your agent a lie detector. |
| Runtime hooks | you run an agent loop (Claude Code, Cursor, Codex CLI, Gemini CLI) and want the verdict to act, not just be available | `dos init --hooks <runtime>` wires the verdict into that host's own hook config: a refused call is denied before it runs, a false "done" is refused. The enforcement path (the host denies). One command, no hand-edited YAML. See QUICKSTART + docs/221. |
| CLI exit-code | you have any command-running environment: a CI step, a pre-push hook, or an agentic CLI like aider whose lint/test-cmd trusts a "done" | branch on a dos verb's exit code (dos verify: 0 shipped / 1 not; dos commit-audit: 0 clean / 1 over-claim). The verdict is the exit code, with no hook adapter and no MCP client. The honest tier for hook-less hosts (Windsurf, Warp, Zed). The exit-code tier cookbook. |
| Python API | your dispatcher/orchestrator is already Python | import dos and call the pure syscalls (dos.oracle.is_shipped, dos.arbiter.arbitrate, …): state-in / verdict-out, no subprocess. The Python cookbook. |
| Fleet framework | your fleet already runs on LangGraph, CrewAI, AutoGen, or the OpenAI/Claude Agents SDK | bolt the referee onto the framework's own seam: a referee node, a termination condition only git can satisfy, an output guardrail with a git tripwire. One function, no rewrite; every seam executed against the real framework. The fleet-framework cookbook. |
| Swarm runtime | your agents run on Hermes, OpenClaw, or a SwarmClaw-style autonomous swarm, with privileged tools, shared memory docs / task boards, and no lock manager for either | drop a two-function adapter into the tool-execution loop: guard_action refuses an arbitrary-exec command before it runs, and acquire_lease / release_lease bracket each shared-state write so the lost update never lands. No import dos: it shells the CLI; Hermes' pre_tool_call hook also speaks DOS natively (dos hook pretool --dialect hermes). The runnable, A/B-measured Hermes / OpenClaw worked example + docs/278. |
| Skill pack | you run agents in Claude Code and want the workflow, not just the verdict | dos init --skills drops editable SKILL.md screenplays that wire the syscalls into a snapshot → audit → gate → take-a-lane loop. See QUICKSTART §2. |
| Driver | your lanes must be computed, or you add a provider-backed judge | write one `dos/drivers/<host>.py` (a LaneTaxonomy + a config factory), loaded by name, never imported by the kernel. The map is HACKING.md. |

The two axes are independent: a zero-config repo can adopt through any surface, and a deeply-configured one still answers over the same CLI and MCP tools. Start at the top row — it's the one that costs nothing to try. The first two rows also compose: MCP advises (the agent checks its own work), hooks enforce (the host stops a bad action) — wire both for the full loop.

Those surfaces are the upstream half of the value chain — who calls the referee. The same verdicts also flow downstream, to the systems that act on them: every adjudication lands in a verdict journal that dos export drains to your observability stack (Datadog / Honeycomb / Grafana — docs/266), dos notify pushes what-needs-a-human to Slack, dos reward gates what a fine-tune may train on, and dos attest mints a signed receipt a skeptic can check without loop access (docs/246). One kernel, one verdict vocabulary, from the agent's tool call to your dashboard.

Next level up — run it every day: Operating a fleet.

Citation

The ideas here are written up in a paper — "Verification Is All You Need — But Not Where You Think" — on the out-of-loop referee for agent fleets. A built PDF lives at paper/releases/; the arXiv preprint is in preparation. Until the arXiv ID lands, cite the repository:

@misc{dos_kernel,
  title        = {Verification Is All You Need --- But Not Where You Think},
  author       = {Chaudhary, Anthony},
  howpublished = {\url{https://github.com/anthony-chaudhary/dos-kernel}},
  note         = {DOS --- the Dispatch Operating System; arXiv preprint in preparation},
  year         = {2026}
}

License

MIT — see LICENSE.