GitHub - LeoStehlik/proof-loop: Repo-local verification protocol for AI coding agents: acceptance criteria, separate verifier roles, proof artifacts, and evidence-backed done claims.
LeoStehlik · 2026-05-22 · via Hacker News: Show HN

Make AI coding agents prove when work is done.

Proof Loop is a repo-local verification protocol for AI coding agents. It freezes acceptance criteria before the build, separates builder and verifier roles, records durable proof artifacts in the repo, and refuses to call work done until every acceptance criterion has a fresh PASS verdict.

Use it when an agent, team, or multi-agent sprint needs a clear boundary between “looks done” and verified work. Because the protocol is just files plus role discipline, it works with OpenClaw, Hermes, Codex, OpenCode, Claude Code, or any other harness that can read and write a repository.

Use Cases

  • keep AI coding agents honest when they claim a task is done
  • freeze acceptance criteria before implementation starts
  • separate builder and verifier roles in multi-agent coding work
  • leave proof artifacts in the repo for future review

[Figure: animated terminal demo of the Proof Loop doctor, check, and report commands]

Proof artifacts and role-brief examples are indexed in examples/README.md.

20-second demo

git clone https://github.com/LeoStehlik/proof-loop.git
cd proof-loop
make test

tmp=$(mktemp -d)
bin/proof-loop-init hn-demo --title "Prove this task before done" --root "$tmp"
bin/proof-loop-check "$tmp/.agent/tasks/hn-demo"

The last command fails on purpose because the generated task has not been verified yet. Proof Loop returns success only after a fresh verifier has recorded PASS for every acceptance criterion and problems.md is empty or absent.

A completed passing example is included:

bin/proof-loop-check examples/example-task/.agent/tasks/ui-language-fix
bin/proof-loop doctor
bin/proof-loop report examples/demo-repo/.agent/tasks/nav-labels-proof --format md

Why It Exists

AI coding agents often fail in predictable ways:

  • they claim completion without durable proof
  • the same session builds and judges its own work
  • acceptance criteria drift while implementation is underway
  • verification is a prose summary instead of a live check
  • future sessions cannot tell what was actually tested

Proof Loop makes completion auditable. A task is done only when a fresh verifier has checked each acceptance criterion (AC) and the repo contains the artifacts to prove it.

What You Get

  • a clear sprint protocol: spec freeze -> build -> evidence -> fresh verify -> fix loop
  • role boundaries for orchestrator, spec-freezer, builder, verifier, and fixer
  • helper scripts to initialize and check task proof folders
  • a complete example task with passing artifacts
  • copy-paste role briefs for OpenClaw, Hermes, Codex, OpenCode, Claude Code, or any agent setup
  • a documented boundary with Loopsmith for recurring behaviour improvement

CLI

bin/proof-loop init TASK_ID --title "Task title"
bin/proof-loop check TASK_ID
bin/proof-loop status TASK_ID
bin/proof-loop list
bin/proof-loop doctor
bin/proof-loop report TASK_ID --format md
bin/proof-loop install-guides --dry-run --harness codex --harness claude

Quick Start

Clone the repo or copy it into the project where you want to run the protocol.

Create a task proof folder from this repo or from another repository:

bin/proof-loop-init ui-language-fix --title "Fix German navigation labels" --root .

This creates:

.agent/tasks/ui-language-fix/
  spec.md
  verdict.json
  problems.md
  evidence.md
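
As a rough illustration of what the init step leaves on disk, the Python sketch below creates the same four files. It is not the repo's scripts/init_task.py: the function name init_task, the placeholder file contents, and the use of UNKNOWN as the initial overall value are all assumptions made for this example.

```python
import json
from pathlib import Path

# The four artifact files listed above.
ARTIFACTS = ("spec.md", "verdict.json", "problems.md", "evidence.md")


def init_task(root: Path, task_id: str, title: str) -> Path:
    """Create .agent/tasks/<task_id>/ with empty proof artifacts (hypothetical helper)."""
    task_dir = root / ".agent" / "tasks" / task_id
    # exist_ok=False: refuse to overwrite a task whose spec may already be frozen.
    task_dir.mkdir(parents=True, exist_ok=False)
    (task_dir / "spec.md").write_text(
        f"# {title}\n\nAcceptance criteria:\n", encoding="utf-8"
    )
    (task_dir / "evidence.md").write_text("", encoding="utf-8")
    (task_dir / "problems.md").write_text("", encoding="utf-8")
    # Assumed placeholder: nothing has been verified yet, so the gate must fail.
    (task_dir / "verdict.json").write_text(
        json.dumps({"overall": "UNKNOWN"}, indent=2), encoding="utf-8"
    )
    return task_dir
```

A fresh folder produced this way cannot pass the done gate, which matches the behaviour shown in the 20-second demo.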

Fill spec.md with explicit acceptance criteria before implementation starts.

After the build is complete and a fresh verifier has run, check whether the task is allowed to be called done:

bin/proof-loop-check .agent/tasks/ui-language-fix

The check exits non-zero unless:

  • verdict.json has overall: PASS
  • every AC has status: PASS
  • problems.md is empty or absent
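
The three conditions above can be sketched as one small function. This is a minimal sketch, not the repo's scripts/check_task.py: the overall and status fields come from the list above, but the criteria key that holds the per-AC entries is a hypothetical name, and rejecting an empty AC list is an extra assumption. See references/artifacts.md for the real schema.

```python
import json
from pathlib import Path


def is_done(task_dir: Path) -> bool:
    """Return True only when every done-gate condition holds."""
    verdict_path = task_dir / "verdict.json"
    if not verdict_path.is_file():
        return False
    try:
        verdict = json.loads(verdict_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return False
    if not isinstance(verdict, dict) or verdict.get("overall") != "PASS":
        return False

    # "criteria" is a hypothetical key name for the per-AC results.
    criteria = verdict.get("criteria")
    if not isinstance(criteria, list) or not criteria:
        return False  # assumption: a verdict with no ACs proves nothing
    for entry in criteria:
        if not isinstance(entry, dict) or entry.get("status") != "PASS":
            return False

    # problems.md must be absent or contain only whitespace.
    problems = task_dir / "problems.md"
    if problems.exists() and problems.read_text(encoding="utf-8").strip():
        return False
    return True
```

A command-line wrapper would map the result to an exit status, zero for True and non-zero otherwise, so that CI or an orchestrator can gate on it.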

What This Is Not

  • not an agent framework
  • not a benchmark suite
  • not a replacement for tests
  • not tied to one model, vendor, or harness

Proof Loop is deliberately small: a protocol, a few files, and a mechanical done gate.

The Protocol

spec freeze -> build -> evidence -> fresh verify -> fix -> fresh verify
                                         ^                    |
                                         |____________________|
                                      repeat until all ACs PASS
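
The control flow in the diagram can be sketched in Python as follows. Everything here is illustrative: run_proof_loop and its callable parameters are hypothetical names, and max_rounds is an added safety limit that the protocol itself does not define (the diagram simply repeats until all ACs pass).

```python
from typing import Callable, Dict, List, Tuple

# Maps an AC id to "PASS", "FAIL", or "UNKNOWN".
Verdict = Dict[str, str]


def run_proof_loop(
    freeze_spec: Callable[[], None],
    build: Callable[[], None],
    record_evidence: Callable[[], None],
    verify: Callable[[], Verdict],
    fix: Callable[[List[str]], None],
    max_rounds: int = 5,
) -> Tuple[bool, int]:
    """Run spec freeze -> build -> evidence, then verify/fix until all ACs pass."""
    freeze_spec()
    build()
    record_evidence()
    for round_no in range(1, max_rounds + 1):
        # Each call stands for a fresh verifier session, never the builder.
        verdict = verify()
        failing = [ac for ac, status in verdict.items() if status != "PASS"]
        if verdict and not failing:
            return True, round_no
        if round_no < max_rounds:
            fix(failing)  # minimal fixes only, then verify again
    return False, max_rounds
```

Treating UNKNOWN the same as FAIL keeps the gate strict: only an explicit PASS for every AC ends the loop.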

Roles

| Role | Does | Never |
| --- | --- | --- |
| Orchestrator | Keeps the loop intact and refuses weak completion | Accepts narrative-only proof |
| Spec-Freezer | Writes frozen spec.md with explicit ACs | Edits production code |
| Builder | Implements against the frozen spec | Verifies own work as final |
| Verifier | Fresh session that checks each AC | Edits production code |
| Fixer | Applies minimal fixes for verifier findings | Signs off on completion |

The verifier must be a fresh session. The agent that built the change does not judge whether the change is done.

Acceptance Criteria

Good ACs are specific and testable by a third party.

AC1: A user with locale=de sees all navigation labels in German after saving language preference.
     Verify: browser check against a German-locale test user.

AC2: The language preference survives page reload.
     Verify: reload the page and confirm the saved locale and labels remain German.

AC3: Existing English navigation remains unchanged for locale=en.
     Verify: switch back to English and confirm the original labels render.

Weak ACs are task descriptions, not proof conditions:

AC1: Translate the UI.
AC2: Make language switching work.
AC3: Fix the bugs.
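
One mechanical signal that separates the two lists is whether each AC carries a Verify: line. The sketch below flags ACs that lack one. It assumes the plain-text layout used in the examples above (an ACn: line followed by an indented Verify: line), acs_missing_verify is a hypothetical helper rather than part of the repo, and the presence of a Verify: line is only a heuristic, not proof that the criterion is testable.

```python
import re
from typing import List, Optional

AC_LINE = re.compile(r"^\s*(AC\d+):\s*\S")
VERIFY_LINE = re.compile(r"^\s*Verify:\s*\S")


def acs_missing_verify(spec_text: str) -> List[str]:
    """Return the ids of ACs that have no Verify: line before the next AC."""
    missing: List[str] = []
    current: Optional[str] = None
    has_verify = False
    for line in spec_text.splitlines():
        match = AC_LINE.match(line)
        if match:
            # A new AC starts; close out the previous one.
            if current is not None and not has_verify:
                missing.append(current)
            current, has_verify = match.group(1), False
        elif current is not None and VERIFY_LINE.match(line):
            has_verify = True
    if current is not None and not has_verify:
        missing.append(current)
    return missing
```

Run against the good example, the function returns an empty list; run against the weak example, it returns all three AC ids.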

Artifacts

Every task stores proof under .agent/tasks/<TASK_ID>/.

.agent/tasks/<TASK_ID>/
  spec.md       frozen ACs, constraints, non-goals, verification approach
  evidence.md   build summary and checks run
  verdict.json  structured verifier result: PASS / FAIL / UNKNOWN per AC
  problems.md   specific open failures, empty when no problems remain

See references/artifacts.md for schemas.

Real Demo

Run a small failing-to-passing demo:

make demo

The demo intentionally breaks a tiny navigation-label fixture, shows the check failing, applies the fix, reruns the check, and renders a proof report.

Examples

A complete passing example lives at:

examples/example-task/.agent/tasks/ui-language-fix/

Role prompts live at:

examples/role-briefs/
  orchestrator.md
  spec-freezer.md
  builder.md
  verifier.md
  fixer.md

Proof Loop vs Loopsmith

Proof Loop governs a single task.

Loopsmith improves repeated agent behaviour over time.

Use Proof Loop when you need a specific task to finish with evidence. Use Loopsmith when the same failure pattern keeps coming back and you want to improve the agent, prompt, policy, or evaluator itself.

See references/loopsmith-bridge.md.

When To Use Which Repo

Use this repo when a specific coding task needs evidence before anyone is allowed to call it done. Proof Loop freezes the spec, separates builder and verifier roles, requires proof artifacts, and records verdicts in the repo.

Use the neighbouring tools at different points in the workflow:

| Need | Use |
| --- | --- |
| Turn a fuzzy request into an executable agent brief | Brief Master |
| Prove one coding task is actually done | Proof Loop |
| Improve repeated agent behaviour with evals | Loopsmith |
| Keep source-backed memory for long-running agents | Sovereign Brain |
| Stop frontend agents producing generic UI sludge | no-slop-ui |

A practical chain looks like this: messy request -> Brief Master brief -> Proof Loop task -> Loopsmith eval if the same failure keeps recurring -> Sovereign Brain records the durable decision.

Related Tools

  • Loopsmith - use when Proof Loop exposes a repeated agent behaviour problem that should become an eval and promotion loop.
  • Sovereign Brain - source-backed memory for long-running agents; useful when proof artifacts, decisions, and synthesis need durable context.
  • Brief Master - helps write sharper task briefs and acceptance criteria before a Proof Loop starts.

Installation As A Skill

OpenClaw

Add your skills directory to openclaw.json:

{
  "skills": {
    "load": {
      "extraDirs": ["/path/to/your/skills"]
    }
  }
}

Clone this repo into that directory:

git clone https://github.com/LeoStehlik/proof-loop.git /path/to/your/skills/proof-loop

Codex / Claude Code

Copy the proof-loop folder into your agent skills directory, or reference SKILL.md directly in your task brief. For harnesses without a formal skill system, use the README, role briefs, and scripts directly from the repo.

Repository Map

proof-loop/
  SKILL.md                         skill trigger and core operating rules
  bin/
    proof-loop                     unified CLI
    proof-loop-init                compatibility wrapper
    proof-loop-check               compatibility wrapper
  scripts/
    init_task.py                   create .agent/tasks/<TASK_ID>/ skeletons
    check_task.py                  mechanical done gate
  schemas/                         JSON schemas for verdict and evidence bundles
  templates/                       opt-in harness guide templates
  tests/                           stdlib unittest coverage for CLI behavior
  .github/workflows/test.yml       CI running make test
  references/
    workflow.md                    full phase-by-phase protocol
    brief-template.md              reusable sprint and role prompts
    artifacts.md                   artifact schemas
    loopsmith-bridge.md            when to escalate repeated failures to Loopsmith
  examples/
    example-task/                  complete passing proof artifact example
    role-briefs/                   copy-paste role prompts

Status

Usable protocol skill and small toolkit. The scripts are intentionally stdlib-only so they can run inside almost any repository without packaging ceremony.

License

MIT - see LICENSE.

Attribution

Inspired by repo-task-proof-loop, adapted for practical multi-agent coding work and public agent-operation skills.