Audit AI-Generated PRs Before You Merge Them (Swarm Orchestrator 10.3.0)
Brad Kinnard · 2026-05-25 · via DEV Community

If you let Claude Code, Cursor, Devin, Aider, Copilot, or any other coding agent open PRs against your repo, you already know the problem. The diff looks fine on a fast read. CI is green. You merge it. A week later you find the test that "passed" got deleted, or the error handling is a silent catch {}, or the "fix" was a comment swap that never touched the bug.

Swarm Orchestrator looks at those PRs and flags the suspicious bits before you click merge.

What it is

A CLI and a GitHub Action. Open source. Node 20 or later. You point it at a PR (or a local diff) and it scores the patch against a set of cheat-pattern detectors. It posts a comment back to the PR with what it found and why.

swarm audit moonrunnerkc/swarm-orchestrator#42

That's the whole interface for most people.

What it does

The default detector set has four checks, all aimed at patterns AI agents actually produce on real PRs:

  • error-swallow: a new empty or comment-only catch block in non-test code.
  • mock-of-hallucination: a jest.mock or vi.mock against a module that doesn't exist anywhere in the repo.
  • no-op-fix: tests changed without source, or source changed without tests, when the diff claims to fix something.
  • fake-refactor: an exported symbol renamed in source, with no caller in the diff updated.
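
To make the first check concrete, here is a minimal sketch of an error-swallow style heuristic, written in Python for brevity. It is not the tool's implementation: the regexes, the function name swallowed_catches, and the choice to scan a whole source string instead of only the added lines of a diff are assumptions made for illustration. Test-file filtering is left out.

```python
import re

# `catch`, an optional binding such as `(e)`, then a brace-delimited body
# that contains no nested braces.
CATCH_BLOCK = re.compile(r"catch\s*(?:\([^)]*\))?\s*\{([^{}]*)\}")

# `// line` comments and `/* block */` comments inside a catch body.
COMMENTS = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def swallowed_catches(source: str) -> list[str]:
    """Return the text of each catch block whose body is empty or comment-only."""
    hits = []
    for match in CATCH_BLOCK.finditer(source):
        body_without_comments = COMMENTS.sub("", match.group(1))
        if body_without_comments.strip() == "":
            hits.append(match.group(0))
    return hits


sample = """
try { run(); } catch {}
try { run(); } catch (e) { // ignore
}
try { run(); } catch (e) { log(e); }
"""

for block in swallowed_catches(sample):
    print(repr(block))
```

The first two catch blocks in the sample are reported and the third is not. A regex like this is easy to fool (a brace inside a comment, for example), which is one reason a real detector would more likely work on a parsed syntax tree.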

Six more detectors live behind --detectors experimental for shadow runs. They don't score well enough on real PRs yet to be on by default, and the README says so.

Every finding renders with its measured precision number inline, so a reviewer sees how often that detector's flags turn out to be false alarms every time the bot speaks.

If you need compliance artifacts, --emit-aibom cyclonedx-ml writes a CycloneDX 1.6 ML-BOM and an SPDX 3.0 AI-Profile per audit. That covers the EU AI Act Annex IV and CISA SBOM-for-AI minimums without bolting on a separate vendor.

Who it's for

Teams that let AI agents open PRs and want a second pair of eyes that runs in CI, costs nothing per call, and produces a deterministic comment instead of vibes. Also useful for procurement and security folks who need an AI-BOM next to their SBOM and don't want another tool in the chain.

If you have one developer eyeballing every line of every AI PR by hand, you probably don't need this yet. If you have ten agents pushing diffs to a queue at 2am, you do.

What's new in 10.3.0

Four things:

  1. no-op-fix got a v2.0 with a gated LLM judge. The judge is off by default and only fires when you set --enable-llm-judge (or SWARM_AUDIT_LLM_JUDGE=1) and have an Anthropic key. Verdicts are content-addressed and cached, so the same diff and title always get the same answer. The model id is pinned in the ledger so replay stays deterministic.
  2. --shadow-output <path>. One JSON file per audit with detector verdicts, judge call count, and the rendered comment. Drops into a directory you can jq later. The existing --shadow <repo> per-repo rollup still works.
  3. Public leaderboard on GitHub Pages. Fetches the real-corpus score snapshot and renders precision, recall, F1, and a sortable per-detector table. No build step, no CDN, just an HTML page and one JS file: moonrunnerkc.github.io/swarm-orchestrator/docs/leaderboard/.
  4. Real-corpus headline rescored against the v2.0 detectors. F1 moved from 0.109 (P 0.067, R 0.300) to 0.167 (P 0.100, R 0.500). mock-of-hallucination picked up two true positives the v1 shape missed.
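
One way to get "same diff and title, same answer" from item 1 is to key the cache on a hash of the judge's inputs. The sketch below is in Python and is an illustration only: verdict_key, VerdictCache, the SHA-256 choice, and folding the model id into the key are all assumptions, not the project's actual ledger format.

```python
import hashlib
import json
from typing import Callable


def verdict_key(diff: str, title: str, model_id: str) -> str:
    """Hash the judge's inputs into a stable, content-derived cache key."""
    payload = json.dumps(
        {"diff": diff, "title": title, "model": model_id},
        sort_keys=True,  # fixed key order so equal inputs serialize identically
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class VerdictCache:
    """Calls the judge at most once per distinct (diff, title, model id)."""

    def __init__(self, judge: Callable[[str, str, str], str]) -> None:
        self._judge = judge
        self._store: dict[str, str] = {}
        self.judge_calls = 0

    def verdict(self, diff: str, title: str, model_id: str) -> str:
        key = verdict_key(diff, title, model_id)
        if key not in self._store:
            self.judge_calls += 1
            self._store[key] = self._judge(diff, title, model_id)
        return self._store[key]


def fake_judge(diff: str, title: str, model_id: str) -> str:
    # Stand-in for a real model call; a hypothetical rule for the demo.
    return "no-op" if "fix" in title.lower() and diff.strip() == "" else "ok"


cache = VerdictCache(fake_judge)
first = cache.verdict("", "Fix login bug", "pinned-model-id")
second = cache.verdict("", "Fix login bug", "pinned-model-id")
print(first, second, cache.judge_calls)
```

The second lookup is served from the cache, so the judge runs once. Including the model id in the key means a model change produces a new entry instead of silently reusing an old verdict.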

The honest part

The real-corpus F1 is 0.167 across 205 AI-labeled PRs (10 broken, 195 clean, eight agent vendors). Precision is 0.100. Recall is 0.500.
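
Those three numbers pin down the underlying counts. The Python sketch below works backward from them. The counts themselves are inferred from the rounded figures, not published in this post, and they assume precision and recall are computed over the whole 205-PR corpus.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Inferred counts (an assumption, see above):
#   10 broken PRs at recall 0.500    -> 5 caught, 5 missed
#   5 true positives at precision 0.100 -> 50 flags in total, 45 of them false
p, r, f1 = precision_recall_f1(tp=5, fp=45, fn=5)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
# precision=0.100 recall=0.500 f1=0.167
```

Under that reading, roughly 45 of the 195 clean PRs get flagged, which is what a precision of 0.100 looks like in a review queue.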

That precision number is exactly why the default mode is advise and not gate. Most flags will be false positives. The tool is calibrated to be useful as a reviewer-assist signal, not a merge blocker. If you want it to block, opt in: --mode gate.

The 205-PR corpus is currently labeled by an AI judge with "pending human review" stamped on every entry. That's the largest credibility hole in the project and the next milestone closes it. The labeling rubric, the kappa script, and the labels-v2 scaffold already live in the repo.

Don't read this as "ship this into your release gate today." Read it as "here's a tool you can run in shadow mode, look at what it flags, and decide for yourself if those flags are useful."

Try it

git clone https://github.com/moonrunnerkc/swarm-orchestrator.git
cd swarm-orchestrator
npm install
npm run build
npm link

# audit a PR (advisory, never blocks)
GITHUB_TOKEN=... swarm audit owner/repo#PR

Or wire it into a workflow with uses: moonrunnerkc/swarm-orchestrator@main and audit-mode: true.
