GitHub - frane/vibesurfer
frb · 2026-06-19 · via Show HN

A real browser for your local AI agent.


[Demo GIF: Claude Code using vibesurfer]

Why

I wanted agents to test web apps via the browser. Everything I tried (Playwright, Puppeteer, anything else that wraps CDP) was too heavy and too unstable. CDP drops sessions. Playwright crashes on long runs. Chrome gets fatter every release. None of that is the actual problem though. CDP and Chrome were designed for humans staring at DevTools. They were never designed for an agent stuck in a while loop.

An agent pays per token. It blocks per response. It can't deal with the event firehose, and a 4 KB DOM dump on every read burns the context budget fast. The Hacker News front page through Playwright is about 2,000 input tokens before the agent has done anything. Through vibesurfer it's around 50.

vibesurfer is a native browser daemon in Rust. Reads return state tokens and tree deltas instead of the full DOM. Writes check the token. If anything moved between the read and the write, the call fails and the agent re-reads instead of clicking on a stale page. There are three real engines underneath: WKWebView on macOS, WebKitGTK on Linux, WebView2 on Windows. The protocol on top is text and line-oriented.
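
The token check on writes is a form of optimistic concurrency control. Below is a minimal Python sketch of that idea against a toy in-memory page. The names `ToyPage`, `read`, `act`, and `StaleToken`, and the hash-based token, are assumptions made for illustration; they are not vibesurfer's API or its token algorithm.

```python
import hashlib


class StaleToken(Exception):
    """Raised when a write carries a token that no longer matches the page."""


class ToyPage:
    """Hypothetical in-memory stand-in for a page; not vibesurfer's API."""

    def __init__(self, nodes):
        self.nodes = dict(nodes)  # ref -> label

    def _token(self):
        # Derive a 16-hex-char token from the current state (illustrative choice).
        payload = repr(sorted(self.nodes.items())).encode()
        return hashlib.sha256(payload).hexdigest()[:16]

    def read(self):
        """Return the current token together with a copy of the state."""
        return self._token(), dict(self.nodes)

    def mutate(self, ref, label):
        """Simulate the page changing on its own (timer, websocket push)."""
        self.nodes[ref] = label

    def act(self, token, ref, label):
        """Apply a write only if the caller's token is still current."""
        if token != self._token():
            raise StaleToken("page changed since last read")
        self.nodes[ref] = label
        return self._token()


page = ToyPage({1: "Example Domain", 4: "Learn more"})
token, _ = page.read()
page.mutate(4, "Read more")          # the page moves between read and write
try:
    page.act(token, 4, "clicked")
    outcome = "applied"
except StaleToken:
    token, _ = page.read()           # re-read, then retry with the fresh token
    page.act(token, 4, "clicked")
    outcome = "retried"
```

The first write is rejected because the page changed after the read; the agent only succeeds once it has re-read and holds a current token.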

When you actually need pixels there's vs capture for screenshots, vs viewport to switch between mobile and desktop layouts, and vs layout to get bounding boxes. But text comes first.

Install

Homebrew (macOS, Linux):

brew tap frane/tap && brew install vibesurfer

curl:

curl -sSL https://raw.githubusercontent.com/frane/vibesurfer/main/install.sh | sh

Cargo:

From source:

git clone https://github.com/frane/vibesurfer && cd vibesurfer
cargo install --path crates/vs-cli

Linux needs WebKitGTK 6. Windows needs the WebView2 runtime (already on Windows 11, available for Windows 10 from Microsoft).

Wire it into your agent

Two integration paths, and they're independent. You can install either or both:

  • Skill: drop SKILL.md into the agent's skills directory. The agent reads it as context and calls the vs binary directly through whatever shell it has. Use this for any agent that runs Bash but doesn't speak MCP.
  • MCP: register vs mcp as an MCP server. The agent calls vibesurfer primitives as MCP tools over JSON-RPC, no shell required. Use this for agents with native MCP support.

The auto-installer does both where supported. After vs is on your PATH:

It detects Claude Desktop, Claude Code, Cursor, Codex CLI, Gemini CLI, and OpenClaw, then writes the SKILL.md plus the MCP entry into each one. Agents that only support one of the two get only the relevant piece. Re-run after upgrading.

Doing it by hand

For the skill path, copy skills/vibesurfer/SKILL.md from the repo into the agent's skills directory. For Claude-family agents that's typically ~/.claude/skills/vibesurfer/SKILL.md.

For the MCP path, add this block to the agent's MCP config (claude_desktop_config.json, .cursor/mcp.json, etc.):

{
  "mcpServers": {
    "vibesurfer": {
      "command": "vs",
      "args": ["mcp"]
    }
  }
}

Codex uses TOML with the same shape under [mcp_servers.vibesurfer]. The JSON form also sits at plugin/.mcp.json if you would rather copy it from the repo.
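
If the agent already has other MCP servers configured, the block needs to be merged rather than pasted over the file. A small Python sketch of that merge, assuming the config is plain JSON with a top-level `mcpServers` object as shown above; `merge_mcp_entry` and the `other` server entry are hypothetical names used only for this example.

```python
import json

# The entry from the README's MCP block.
VIBESURFER_ENTRY = {"command": "vs", "args": ["mcp"]}


def merge_mcp_entry(config_text):
    """Return config JSON text with the vibesurfer server added, other entries kept."""
    config = json.loads(config_text) if config_text.strip() else {}
    config.setdefault("mcpServers", {})["vibesurfer"] = dict(VIBESURFER_ENTRY)
    return json.dumps(config, indent=2)


# Hypothetical existing config with one unrelated server.
existing = '{"mcpServers": {"other": {"command": "other-server", "args": []}}}'
merged = json.loads(merge_mcp_entry(existing))
```

Read the agent's config file, pass its text through the function, and write the result back; an empty or missing file yields a config containing only the vibesurfer entry.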

Per-agent shortcuts

Claude Code marketplace installs both surfaces from one command:

/plugin install frane/vibesurfer

Resolves .claude-plugin/marketplace.json at the repo root and plugin/.claude-plugin/plugin.json.

Gemini extension wires the MCP server plus the GEMINI.md context file:

gemini extensions install https://github.com/frane/vibesurfer

Reads gemini-extension.json at the repo root.

Not detected as automated

Modern anti-bot systems (Cloudflare, hCaptcha, reCAPTCHA, DataDome) gate first on event.isTrusted, then on movement timing, then on TLS/HTTP fingerprinting. vibesurfer's input dispatch is built to pass all three.

On macOS, vs act click and the coordinate primitives (vs click-at, vs hover-at, vs move-to, vs drag) route through native NSEvent mouseDown / mouseUp / mouseMoved on WKWebView. Each event carries isTrusted = true in JS, same as a real cursor click. The Bezier-pathed lead-in dispatched before every click reproduces Fitts-law arrival timing (digraph-derived control points, optional overshoot) so the visible motion looks like a human reaching the target rather than a teleport.
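
To make the lead-in idea concrete, here is a rough Python sketch of a curved cursor path whose duration comes from Fitts's law (Shannon form, MT = a + b·log2(D/W + 1)). It is not vibesurfer's implementation: the constants `a` and `b`, the perpendicular control-point offset, and the smoothstep easing are all assumptions, and the digraph-derived control points and overshoot mentioned above are not modeled.

```python
import math


def fitts_duration(distance, target_width, a=0.1, b=0.15):
    """Movement time in seconds from Fitts's law (Shannon form); a and b are assumed."""
    return a + b * math.log2(distance / target_width + 1)


def cubic_bezier(p0, p1, p2, p3, t):
    """Point on a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)


def mouse_path(start, target, target_width, steps=20):
    """Return (time, x, y) samples along a curved path from start to target."""
    dx, dy = target[0] - start[0], target[1] - start[1]
    distance = math.hypot(dx, dy)
    duration = fitts_duration(distance, target_width)
    # Control points offset perpendicular to the straight line give a gentle arc.
    nx, ny = -dy * 0.2, dx * 0.2
    c1 = (start[0] + dx * 0.3 + nx, start[1] + dy * 0.3 + ny)
    c2 = (start[0] + dx * 0.7 + nx, start[1] + dy * 0.7 + ny)
    samples = []
    for i in range(steps + 1):
        t = i / steps
        eased = t * t * (3 - 2 * t)  # smoothstep: slow start, slow arrival
        x, y = cubic_bezier(start, c1, c2, target, eased)
        samples.append((t * duration, x, y))
    return samples


path = mouse_path((0, 0), (300, 400), target_width=40)
```

Each sample would correspond to one mouse-moved event; smaller or more distant targets get a longer total duration, which is the property Fitts's law describes.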

Since v0.1.11 the coordinate primitives are native on Linux and Windows too: XTest over x11rb on WebKitGTK (X11 / Xwayland; pure Wayland falls back to ENGINE_UNSUPPORTED), SendMouseInput on a WebView2 composition controller on Windows. All three engines emit isTrusted = true for cursor-primitive clicks. Ref-based vs act click is trusted on macOS only — on Linux and Windows it still dispatches through injected JS (isTrusted = false); use the coordinate primitives there for fingerprint-sensitive sites.
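
The platform rules in the paragraph above can be summarized as a small decision function. This Python sketch only restates the text; the function name, the platform labels, and the `display_server` parameter are assumptions for illustration and are not part of vibesurfer.

```python
def trusted_click_command(platform, display_server=None):
    """Pick the click primitive that yields isTrusted = true, per the text above.

    platform: "macos", "linux", or "windows" (labels chosen for this sketch).
    display_server: for Linux, "x11", "xwayland", or "wayland".
    Returns the primitive name, or None when no trusted path is described.
    """
    if platform == "macos":
        return "act click"          # ref-based click is native on WKWebView
    if platform == "windows":
        return "click-at"           # coordinate primitive via WebView2
    if platform == "linux":
        if display_server == "wayland":
            return None             # pure Wayland reports ENGINE_UNSUPPORTED
        return "click-at"           # XTest over X11 / Xwayland
    raise ValueError(f"unknown platform: {platform}")
```

An agent wrapper could consult a helper like this once per session and fall back to untrusted ref-based clicks only where the site does not care.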

The walker also honors ARIA role="..." (Radix UI, Headless UI, Reach UI, every custom-div-as-button pattern), plus a tabindex heuristic for focusable divs/spans without a role. Modern React UIs surface as actionable refs without coordinate workarounds.

Short forms

Every primitive has a one-to-three-letter alias. Long forms exist for documentation; agent invocations should use the short form to save tokens.

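| Long | Short | Long | Short |
|---|---|---|---|
| session-open | so | extract | x |
| session-close | sc | mark | m |
| open | o | annotate | an |
| close | c | status | st |
| view | v | log | l |
| read | r | skill | sk |
| act | a | capture | cap |
| find | f | viewport | vp |
| wait | w | layout | lay |
| auth | au | inspect | i |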

Frequent flags: --session= / -S, --full / -F, --since= / -s, --limit= / -n, --page= / -P, --json / -j. Inspect subcommands have one-to-three-letter aliases too (i co for inspect console, i n for network, i req for request, i e for eval, i s for storage, i scr for scripts, i src for script, i d for dom, i p for performance).

Both forms work everywhere. The integration tests assert that the wire request from a short form is byte-identical to the wire request from the long form.
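
The equivalence property can be pictured as alias expansion happening before a request is built. The Python sketch below uses the alias table from this section; `canonicalize` is a hypothetical helper, and the real CLI's expansion logic and wire encoding are not shown in this README.

```python
# Short -> long forms, taken from the table in this section.
ALIASES = {
    "so": "session-open", "sc": "session-close", "o": "open", "c": "close",
    "v": "view", "r": "read", "a": "act", "f": "find", "w": "wait",
    "au": "auth", "x": "extract", "m": "mark", "an": "annotate",
    "st": "status", "l": "log", "sk": "skill", "cap": "capture",
    "vp": "viewport", "lay": "layout", "i": "inspect",
}


def canonicalize(argv):
    """Expand a leading short alias to its long form; leave other args alone."""
    if not argv:
        return []
    head, rest = argv[0], list(argv[1:])
    return [ALIASES.get(head, head)] + rest


short = canonicalize(["a", "4", "click"])
long = canonicalize(["act", "4", "click"])
```

Because both invocations reduce to the same canonical argument list, anything derived from it downstream is identical as well.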

Quickstart

$ vs so                                        # session-open
@0                                             # state token (16 hex chars; 0 means none yet)
s_019e08a7…                                    # session id

$ vs o https://example.com                     # open the URL
@0                                             # the open call doesn't carry a snapshot
p_019e08a7…                                    # page id

$ vs v p_019e08a7…                             # view (snapshot the a11y tree)
@44d01704049d6d31                              # state token
1 doc "Example Domain"                         # ref 1, document
  0 el ""                                      # nameless wrapper
    2 hd "Example Domain"                      # ref 2, heading
    3 p  "This domain is for use in…"          # ref 3, paragraph
    5 p  "Learn more"                          # ref 5, paragraph
      4 lnk "Learn more" click,focus           # ref 4, link, supported ops

A snapshot is a list of refs. Each ref is an integer that survives across snapshots, so the agent can act on ref 4 ten turns later without re-reading the whole page. The short role codes (hd, p, lnk, btn, tf, …), one to three letters each, compress the role into a few bytes instead of an ARIA string. Labels are in quotes; the trailing tokens after a label list which vs act operations the element supports. There are about twenty role codes in total, listed in docs/PROTOCOL.md.
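
For an agent harness that wants structured data, the snapshot text can be parsed line by line. The Python sketch below infers the line shape only from the example above (two-space indentation, ref, role code, quoted label, optional comma-separated operations); docs/PROTOCOL.md remains the authoritative format, and escaped quotes inside labels are not handled here. The `#` annotations in the Quickstart are README commentary and are left out of the sample input.

```python
import re

# Line shape inferred from the README example; an assumption, not the spec.
LINE = re.compile(
    r'^(?P<indent> *)(?P<ref>\d+)\s+(?P<role>\w+)\s+"(?P<label>[^"]*)"'
    r'(?:\s+(?P<ops>\S+))?\s*$'
)


def parse_snapshot(text):
    """Parse snapshot lines into dicts with ref, role, label, ops, and depth."""
    token, nodes = None, []
    for raw in text.splitlines():
        if raw.startswith("@"):
            token = raw[1:].strip()
            continue
        m = LINE.match(raw)
        if not m:
            continue  # skip blank or unrecognized lines
        nodes.append({
            "ref": int(m["ref"]),
            "role": m["role"],
            "label": m["label"],
            "ops": m["ops"].split(",") if m["ops"] else [],
            "depth": len(m["indent"]) // 2,
        })
    return token, nodes


SNAPSHOT = '''@44d01704049d6d31
1 doc "Example Domain"
  0 el ""
    2 hd "Example Domain"
    3 p  "This domain is for use in…"
    5 p  "Learn more"
      4 lnk "Learn more" click,focus
'''

token, nodes = parse_snapshot(SNAPSHOT)
clickable = [n["ref"] for n in nodes if "click" in n["ops"]]
```

Here `clickable` contains only ref 4, the link, which is the ref the next Quickstart step acts on.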

$ vs a 4 click                                 # act: click ref 4
@<new-token>                                   # new token, page mutated
?nav                                           # warning: navigation occurred
… new tree …                                   # the act response carries deltas;
                                               # on navigation it re-baselines to a full tree

vs act is the only mutating primitive. It takes a ref and an operation (click, fill, scroll, key, submit, hover, focus) and requires the most recent state token. If the page mutated between read and write (a JS timer fired, a websocket pushed an update, anything), the call returns ! STALE_TOKEN and the agent re-reads. No silent stale clicks. After a successful act on the same page (no navigation), the response carries only the deltas (+ref for adds, -ref for removes, ~ref for attribute changes), so a click that adds one button costs ~20 bytes on the wire instead of the whole DOM.
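
On the agent side, deltas are applied to the last known tree instead of replacing it. A minimal Python sketch, assuming the deltas have already been parsed into `(sigil, ref, fields)` tuples; that tuple shape, the field names, and the sample refs are hypothetical, since the exact wire syntax lives in docs/PROTOCOL.md.

```python
def apply_deltas(tree, deltas):
    """Apply (sigil, ref, fields) deltas to a {ref: fields} mapping, returning a new one."""
    updated = {ref: dict(fields) for ref, fields in tree.items()}
    for sigil, ref, fields in deltas:
        if sigil == "+":
            updated[ref] = dict(fields)                 # node added
        elif sigil == "-":
            updated.pop(ref, None)                      # node removed
        elif sigil == "~":
            updated.setdefault(ref, {}).update(fields)  # attributes changed
        else:
            raise ValueError(f"unknown delta sigil: {sigil!r}")
    return updated


tree = {
    2: {"role": "hd", "label": "Example Domain"},
    4: {"role": "lnk", "label": "Learn more"},
}
deltas = [
    ("+", 6, {"role": "btn", "label": "Dismiss"}),   # hypothetical new button
    ("~", 4, {"label": "Learn even more"}),          # hypothetical attribute change
    ("-", 2, {}),
]
tree_after = apply_deltas(tree, deltas)
```

Returning a new mapping keeps the previous tree intact, which is convenient if the agent wants to compare states or roll back after a STALE_TOKEN error.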

$ vs st                                        # status
session  s_019e08a7…  pages=1
page     p_019e08a7…  url=https://www.iana.org/help/example-domains  token=…

Every primitive call writes one row to a SQLite audit log before it returns. vs status reads that log. So does vs log. Replay, debugging, and governance all collapse to SQL queries against ~/.vibesurfer/state.db. There is no separate event stream to subscribe to.
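
The "one audit row per call, query it with SQL" pattern can be sketched with Python's built-in sqlite3 module. The `audit` table, its columns, and the `audited` wrapper below are a hypothetical schema invented for this example; the real layout of state.db is not documented in this README.

```python
import sqlite3
import time

# Hypothetical schema for illustration only; not the real state.db layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit (id INTEGER PRIMARY KEY, ts REAL, session TEXT, "
    "primitive TEXT, args TEXT, result TEXT)"
)


def audited(session, primitive, args, handler):
    """Run handler, then record one audit row before returning its result."""
    result = handler(args)
    with conn:  # commits the insert before we return
        conn.execute(
            "INSERT INTO audit (ts, session, primitive, args, result) "
            "VALUES (?, ?, ?, ?, ?)",
            (time.time(), session, primitive, args, result),
        )
    return result


audited("s_demo", "open", "https://example.com", lambda a: "ok")
audited("s_demo", "act", "4 click", lambda a: "ok")
audited("s_demo", "act", "9 click", lambda a: "STALE_TOKEN")

# "What went wrong in this session?" becomes a plain query.
failures = conn.execute(
    "SELECT primitive, args FROM audit WHERE session = ? AND result != 'ok' ORDER BY id",
    ("s_demo",),
).fetchall()
total = conn.execute("SELECT COUNT(*) FROM audit").fetchone()[0]
```

With the log as the single source of truth, replay and debugging reduce to reading rows in id order rather than subscribing to a live event stream.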

The daemon auto-spawns on first call. State, captures, and downloads live under ~/.vibesurfer/. The transport is an AF_UNIX socket on Unix (~/.vibesurfer/daemon.sock) and a named pipe on Windows; either way, the CLI handles the difference.

29 wire primitives total — the 19 core primitives are specified in docs/PRIMITIVES.md; the later additions (vs_inspect, the four cursor primitives, prompt-input, and the pending queue) are documented in the bundled SKILL.md and the CHANGELOG. The full wire format with every sigil and edge case is in docs/PROTOCOL.md. The per-platform per-primitive verification matrix is in docs/REALITY_CHECK.md.

Configuration

| Path / variable | Purpose |
|---|---|
| ~/.vibesurfer/state.db | SQLite, holds sessions, audit, marks, auth blobs |
| ~/.vibesurfer/daemon.sock (Unix) | AF_UNIX socket the CLI talks to |
| Windows named pipe | Same role on Windows; resolved automatically |
| ~/.vibesurfer/captures/ | Screenshots from vs capture |
| VS_CAPTURES_DIR | Override the capture directory |
| VS_HOME | Override the vibesurfer home directory |
| VS_DISABLE_INSPECTOR=1 | Skip inspector hooks (testing only) |
| VS_DAEMON_BIN | Override the binary used for daemon auto-spawn (tests) |
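
A wrapper script that needs to locate these files could resolve them as in the Python sketch below. It assumes that state.db, daemon.sock, and captures/ all follow VS_HOME, and that VS_CAPTURES_DIR wins over the default captures path; the README does not spell out that precedence, and `resolve_paths` is a hypothetical helper.

```python
import os
from pathlib import Path


def resolve_paths(env=None):
    """Resolve vibesurfer paths from an environment mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # Assumption: VS_HOME relocates everything that normally lives in ~/.vibesurfer.
    home = Path(env.get("VS_HOME") or Path.home() / ".vibesurfer")
    # Assumption: VS_CAPTURES_DIR overrides only the captures directory.
    captures = Path(env.get("VS_CAPTURES_DIR") or home / "captures")
    return {
        "home": home,
        "state_db": home / "state.db",
        "socket": home / "daemon.sock",   # Unix only; Windows uses a named pipe
        "captures": captures,
    }


paths = resolve_paths({"VS_HOME": "/tmp/vs-demo"})
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the helper easy to test with isolated homes, similar in spirit to how the demo script isolates its own home directory.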

Build from source

Requires Rust 1.85+. Platform-specific dependencies:

  • macOS (15+): nothing extra, links against system WebKit.
  • Linux: libwebkitgtk-6.0-dev, libgtk-4-dev, libsoup-3.0-dev.
  • Windows: WebView2 SDK pulled by webview2-com at build time; the WebView2 Runtime is required at run time.

git clone https://github.com/frane/vibesurfer && cd vibesurfer
cargo build --release

Run the test suite:

cargo test --workspace --lib --bins        # fast unit tests
cargo test --workspace                     # adds integration tests (real engine)

For Linux engine tests on a non-Linux host, use the Docker container. WebKitGTK 6's sandbox needs unprivileged user namespaces; the CI Linux job relaxes the AppArmor restriction with one sysctl on the bare runner, while the Docker fallback needs --privileged to do the same:

docker build -f Dockerfile.linux-test -t vs-test-linux .
docker run --rm --privileged -v "$PWD":/work vs-test-linux

See docs/DEVELOPMENT.md for the longer walkthrough.

The demo gif at the top of this README is a real interactive Claude Code session driving vibesurfer. To capture a fresh one, run docs/demo/record-claude.sh:

brew install asciinema agg
docs/demo/record-claude.sh         # writes docs/demo-claude.gif

The script enforces a TTY guard, isolates the demo home, and locks Claude to Bash so the agent must use the real vs binary (no MCP fallback, no built-in file tools). Each render is non-deterministic, since model output varies. The cached gif is committed so cloners and CI don't re-render.

Contributing

Issues and pull requests welcome. Open an issue first for anything beyond a small fix so we can discuss the approach. The codebase uses agented for transactional file edits during development; agented's workspace state is local-only (.agented/state.db) and is not committed.

Acknowledgments

Built on:

Protocol borrows from agented, an editor for AI agents.

License

Apache-2.0. See LICENSE.