GitHub - lolu1032/pantheon-skills: Two Claude Code skills that run a coding task through a multi-agent harness — plan → N parallel implementations → adversarial verification → judge. pantheon (Claude-only), pantheon-x (GPT-5.5 cross-model verify). MIT.
lolu1032 · 2026-06-15 · via Show HN

Two Claude Code skills that run a hard coding task through a multi-agent harness instead of a single model pass: plan → N parallel implementations → adversarial verification → judge. The point isn't a smarter model — it's that a second (and third) implementation, plus an independent reviewer whose job is to break the result, catches bugs a single pass ships green.

It's a packaging of well-worn techniques — best-of-N sampling, tool-integrated self-correction, and LLM-as-judge / adversarial verification — wired into one /pantheon command so you don't reassemble them by hand each time. This is scaffolding around the model, not a change to it: it won't rescue a task the model fundamentally can't reason about, but it reliably tightens correctness on coding work whose answer you can express as tests.

The harness runs a deterministic pipeline:

Plan ──▶ Implement (×N parallel) ──▶ Verify (adversarial ×V) ──▶ Synthesize
 │            │ each self-corrects            │ try to BREAK each      │ judge picks winner
 1 planner    │ against its own tests (T1)    │ green build            │ + grafts best ideas
              N builders                       reviewers
  • Plan — derive a tight spec, a test plan that defines correctness, and N distinct strategies (before any code).
  • Implement — N builders implement different strategies in parallel; each runs its own tests and self-corrects on failure (tool-integrated self-verification, up to 5 iterations).
  • Verify — independent adversarial reviewers try to break each green build; a build refuted by a majority is dropped.
  • Synthesize — a judge picks the winner and lists superior ideas worth grafting from the runners-up.
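The control flow of the four stages can be sketched roughly as follows. This is a minimal illustration in Python, not the actual harness (which lives in pantheon-class.js): the names `Build`, `implement`, `survives`, and `run_pipeline` are hypothetical, the planner's strategies are passed in as plain strings, builders run sequentially rather than in parallel, and "majority" is assumed to mean strictly more than half of the reviewers, so a tie does not drop a build.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

MAX_SELF_CORRECT_ITERATIONS = 5  # "up to 5 iterations" in the Implement step


@dataclass
class Build:
    strategy: str
    code: str = ""
    green: bool = False  # True once the build passes its own tests


# Hypothetical callables standing in for the agents.
WriteCode = Callable[[str, Optional[str]], str]          # (strategy, feedback) -> code
RunTests = Callable[[str], Tuple[bool, Optional[str]]]   # code -> (passed, feedback)
Reviewer = Callable[[Build], bool]                       # True means "I broke this build"


def implement(strategy: str, write_code: WriteCode, run_tests: RunTests) -> Build:
    """One builder: write code, run its tests, retry with the failure as feedback."""
    build = Build(strategy=strategy)
    feedback: Optional[str] = None
    for _ in range(MAX_SELF_CORRECT_ITERATIONS):
        build.code = write_code(strategy, feedback)
        passed, feedback = run_tests(build.code)
        if passed:
            build.green = True
            break
    return build


def survives(build: Build, reviewers: List[Reviewer]) -> bool:
    """Keep a build unless strictly more than half of the reviewers refute it (assumed rule)."""
    refutations = sum(1 for review in reviewers if review(build))
    return refutations * 2 <= len(reviewers)


def run_pipeline(strategies: List[str], write_code: WriteCode, run_tests: RunTests,
                 reviewers: List[Reviewer],
                 judge: Callable[[List[Build]], Build]) -> Optional[Build]:
    """Implement each strategy, keep green builds that survive review, let the judge pick."""
    builds = [implement(s, write_code, run_tests) for s in strategies]
    green = [b for b in builds if b.green]
    survivors = [b for b in green if survives(b, reviewers)]
    return judge(survivors) if survivors else None
```

The point the sketch makes is that `green` and `survivors` are different sets: a build enters `green` by passing tests its own builder ran, and only stays in `survivors` if independent reviewers fail to break it.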

The value: a build can pass its own tests yet still be wrong. The adversarial layer catches defects the self-written tests miss, instead of rubber-stamping a green build.

The two skills

| Skill | Adversarial verifier | Requirements |
| --- | --- | --- |
| pantheon | Claude itself (independent agents) | Paid Claude Code plan + Workflows (see below) |
| pantheon-x | GPT-5.5 via Codex plugin (cross-model) | Above + OpenAI Codex plugin (codex:codex-rescue) |

pantheon-x is the stronger setting: the implementation written by Claude is attacked by a different model, which shrinks single-model blind spots (the same mistake slipping past a same-model verifier). If you don't have Codex/GPT-5.5, use pantheon.

Both skills share the same harness (pantheon-class.js); they differ only in the crossModelVerify flag.

Requirements

These skills drive Claude Code's Workflow orchestration engine, so a stock/Free setup is not enough:

  • Claude Code ≥ v2.1.154 on a paid plan — Pro, Max, Team, or Enterprise (also Bedrock / Vertex / Foundry). Not available on the Free tier.
  • On Pro, enable it once: /config → turn on Dynamic workflows.
  • pantheon-x only: the cross-model verifier runs as the codex:codex-rescue subagent, which ships in OpenAI's Codex plugin, not stock Claude Code. A logged-in codex CLI alone does not register it. Install the plugin:
    /plugin marketplace add openai/codex-plugin-cc
    /plugin install codex@openai-codex
    
    plus a ChatGPT subscription (or OPENAI_API_KEY) and the codex CLI on PATH. If codex:codex-rescue isn't installed, use pantheon instead; pantheon-x would otherwise silently skip the adversarial pass and pass every build.
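Because a missing verifier fails silently, it can help to check the locally observable requirements before reaching for pantheon-x. The following is a rough sketch under stated assumptions: `preflight_pantheon_x` and `parse_version` are hypothetical helpers that are not part of this repo, the version string is assumed to be a plain dotted number that you supply yourself, and the check cannot tell whether the codex:codex-rescue subagent is registered, whether the plan is paid, or whether Dynamic workflows is enabled.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional, Tuple

MIN_CLAUDE_CODE = (2, 1, 154)  # minimum version stated in Requirements


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.1.154' or 'v2.1.154' into (2, 1, 154)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def preflight_pantheon_x(
    claude_code_version: str,
    env: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of problems found; an empty list means nothing obvious is missing."""
    env = os.environ if env is None else env
    problems: List[str] = []
    if parse_version(claude_code_version) < MIN_CLAUDE_CODE:
        problems.append("Claude Code is older than v2.1.154")
    if which("codex") is None:
        problems.append("codex CLI not found on PATH")
    if not env.get("OPENAI_API_KEY"):
        # Not fatal on its own: a ChatGPT subscription is the documented alternative.
        problems.append("OPENAI_API_KEY not set (fine if you use a ChatGPT subscription)")
    return problems
```

An empty result only rules out the failures this script can see; installing the plugin with the two /plugin commands above is still required.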

Skills and subagents themselves are stock Claude Code features; no extra setup beyond the above.

Install

Clone into your Claude Code skills directory (personal install):

git clone https://github.com/lolu1032/pantheon-skills.git
cp -R pantheon-skills/pantheon       ~/.claude/skills/pantheon
cp -R pantheon-skills/pantheon-x     ~/.claude/skills/pantheon-x

Or for a single project, copy into <project>/.claude/skills/.

Usage

In Claude Code:

/pantheon    <a hard implementation task whose correctness is testable>
/pantheon-x  <same, but GPT-5.5 does the adversarial verification>

Example:

/pantheon Add idempotency-key handling to the payments module so concurrent requests can't double-charge. Tests: pnpm test (vitest)

Claude collects the parameters (task, workdir, lang + test command, variants, verifiers) and launches the harness as a background Workflow, then reports: per-variant test results, which builds the adversarial pass broke, and the final winner with its rationale and grafting suggestions.

Parameters

| arg | default | notes |
| --- | --- | --- |
| task | (none) | one-paragraph requirement + acceptance criteria (expressible as tests) |
| workdir | /tmp/pantheon-<name> | absolute path; a real repo or a scratch dir |
| lang | Python/unittest | language + the exact test command for your stack |
| variants | 3 | bump to 5 for harder problems |
| verifiers | 2 | bump to 3 to be stricter (majority refutation drops a build) |
| crossModelVerify | false (pantheon) / true (pantheon-x) | route adversarial verify to GPT-5.5/Codex |
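To make the defaults concrete, here is one way the parameter set could be modeled. This is an illustrative sketch, not the harness's real schema (the harness is JavaScript and its internal shape is not shown here): `PantheonParams`, `pantheon`, and `pantheon_x` are hypothetical names, `cross_model_verify` mirrors the crossModelVerify flag, and `workdir` is left required because the default path contains a `<name>` placeholder.

```python
import os
from dataclasses import dataclass


@dataclass
class PantheonParams:
    task: str                          # requirement + acceptance criteria, expressible as tests
    workdir: str                       # absolute path; a real repo or a scratch dir
    lang: str = "Python/unittest"      # language + the exact test command for your stack
    variants: int = 3                  # number of parallel implementations
    verifiers: int = 2                 # adversarial reviewers per build
    cross_model_verify: bool = False   # mirrors crossModelVerify

    def __post_init__(self) -> None:
        if not os.path.isabs(self.workdir):
            raise ValueError("workdir must be an absolute path")
        if self.variants < 1 or self.verifiers < 1:
            raise ValueError("variants and verifiers must be at least 1")


def pantheon(task: str, workdir: str, **overrides) -> PantheonParams:
    """Claude-on-Claude verification."""
    return PantheonParams(task=task, workdir=workdir, cross_model_verify=False, **overrides)


def pantheon_x(task: str, workdir: str, **overrides) -> PantheonParams:
    """Same parameters; only the verification flag differs."""
    return PantheonParams(task=task, workdir=workdir, cross_model_verify=True, **overrides)
```

The two constructors differ in a single field, which matches how the two skills share one harness and differ only in the flag.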

Cost & scope

  • Not a daemon. Each invocation runs once to completion and exits — zero cost when idle.
  • A run spends real tokens. A representative run is ~11 subagents and a few hundred K to ~1M tokens end-to-end, ~6–10 min wall-clock; heavier settings (variants=5, verifiers=3, cross-model) cost more. On Pro/Max it draws from your usage quota; on metered API access, budget a few dollars per run and up. Route only the hardest 10–20% of tasks here — use plain Opus for the rest.
  • This buys correctness on testable work, not raw model intelligence. If a task isn't expressible as tests, the adversarial layer has little to grip and the overhead isn't worth it.
  • Coding/agentic productivity only. Not a tool for bypassing safety gates (cybersecurity/biology capability restrictions).
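The "~11 subagents" figure is consistent with a simple count of pipeline roles at the default settings. The sketch below is back-of-the-envelope arithmetic, not a measurement: `estimated_subagents` is a hypothetical helper, and it assumes one planner, one judge, and that every build goes green and is reviewed by every verifier.

```python
def estimated_subagents(variants: int = 3, verifiers: int = 2) -> int:
    """Rough upper bound on subagents in one run, under the assumptions stated above."""
    planner = 1
    builders = variants
    reviewers = variants * verifiers  # each green build is attacked by every verifier
    judge = 1
    return planner + builders + reviewers + judge
```

With the defaults this gives 1 + 3 + 6 + 1 = 11, and with variants=5, verifiers=3 it gives 1 + 5 + 15 + 1 = 22. The reviewer term grows with the product of the two settings, which is one reason heavier settings cost noticeably more.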

FAQ

Isn't this just a prompt wrapper? There's no model change — it's orchestration, yes. The non-trivial part is the adversarial step: an independent agent (a different model in pantheon-x) whose job is to break a build rather than confirm it. That's what catches defects the builder's own green tests rubber-stamp. The value is the harness shape, not a secret prompt.

Do you have benchmarks vs. plain Opus? No formal benchmark yet — treat the description as mechanism, not a measured delta. The value is in the adversarial step: a build can pass its own tests and still be wrong, and an independent reviewer catches what the self-written tests rubber-stamp. If you run a head-to-head, I'd genuinely like to see the numbers.

What does a run cost? A few hundred K to ~1M tokens and ~6–10 min at default settings; more for variants=5 / verifiers=3 / cross-model. It's meant for the hardest 10–20% of tasks, not everyday edits. See Cost & scope.

It says "Workflow tool not found" / nothing happens. You're likely on the Free tier, or haven't enabled workflows. See Requirements: you need a paid plan and, on Pro, /config → Dynamic workflows.

Why route verification to GPT-5.5 / another vendor's model? Same-model verifiers share blind spots — a mistake the builder makes, a same-model reviewer tends to miss too. A different model is a cheap way to break that correlation. It's optional: pantheon runs Claude-on-Claude and still helps.

Status

Solo project, as-is, best-effort. Issues and PRs are welcome, but maintenance comes with no guarantees or SLA — I may not get to everything. It's MIT-licensed, so forking is a first-class option if you want to take it further.

License

MIT