GitHub - edgestorage/web-cap: Web-Capability: Script-first web capabilities for AI agents. Run in-page scripts, save workflows as reusable capabilities, and generate AI-native userscripts.
huadream5827 · 2026-06-16 · via Hacker News - Newest: "AI"

Chinese documentation

Script-first web capabilities for AI agents. Run in-page scripts, save workflows as reusable capabilities, and generate AI-native userscripts.

Web-Capability is a local-first browser automation toolkit for agents. It lets agents inspect real browser tabs, run reusable in-page scripts, save successful workflows for later command-line use, and turn natural-language browser requests into AI-native userscripts.

Agents interact with Web-Capability through the web-cap CLI. The CLI manages the required local runtime automatically, so users do not need a separate startup command.

Quick Use

  1. Install the Web Cap skill with the skills CLI:

    npx skills add edgestorage/web-cap

    The skill includes the web-cap CLI installation and connection-check workflow for agents.

  2. Install the Web Cap browser extension:

    • Open the Web Cap Releases page.
    • Download the Chrome extension zip asset, named like *chrome*.zip.
    • Unzip the downloaded extension asset.
    • Open chrome://extensions in Chrome.
    • Enable Developer mode.
    • Drag the unzipped extension folder into the extensions page.
    • Open the Web Cap extension details and enable Allow User Scripts.

  3. Check that the CLI can see the browser runtime:

    web-cap session-status

Examples

Reuse a Web Cap Hub script on Hacker News

Run a reusable script from web-cap-hub to summarize the comments on the first five Hacker News posts from the current page with less page exploration, fewer tokens, and faster execution.

[Image: Web Cap Hacker News reusable script example]

Hide a YouTube section with one sentence

Hide the Top live games block on YouTube Gaming with one sentence, and keep it hidden on future visits.

[Image: Web Cap YouTube userscript example]

Install CLI Manually

For agent workflows, the Web Cap skill provides the recommended CLI setup path. To install the CLI directly, use npm:

npm install -g web-capability

The installed command is web-cap:

web-cap --help
web-cap session-status

Features

  • Browser extension runtime for real Chromium-based browser tabs.
  • Command-line interface for script execution, registration, tab creation, and user handoff observation.
  • Playwright-style page helpers for common operations such as inspect, wait, click, fill, query, and text reading.
  • Local script registry for reusable browser workflows.
  • AI-native userscript generation for persistent, page-specific browser changes.
  • Browser tab creation and event watching commands for agent workflows.
  • Local-first state storage by default.

Reusable Script Hub

Web Cap can run reusable capability scripts from a local .web-cap/ directory. The shared Web Cap Hub repository collects ready-to-use scripts for common websites and provides examples for writing new site-specific workflows.

To reuse scripts from the hub:

git clone https://github.com/edgestorage/web-cap-hub.git
cd web-cap-hub

web-cap session-status
web-cap script-execute \
  --tab-id <tab-id> \
  --script-file .web-cap/github.com/read-repository-summary.js \
  --input '{"owner":"edgestorage","repo":"web-cap"}'

See the Web Cap Hub README for the current script collection and contribution guidelines.

Why Script-First

Many browser automation tools expose a fixed set of direct actions: click this selector, fill that input, read this text, take a screenshot. Web Cap takes a script-first approach instead.

Agents can run JavaScript inside the page with Playwright-style helpers and register useful scripts as reusable browser skills. This makes Web Cap better suited for workflows where an agent needs to inspect page structure, adapt to product-specific UI, and turn a successful operation into something it can run again later.

Web Cap is designed so that agents do not have to rediscover the same browser workflow every time. Its core value is turning verified browser operations into reusable scripts and reusable workflows.

For recurring pages and tasks, agents can reuse stable workflows instead of repeatedly reading the page, planning each step, finding the right controls, and recovering from mistakes. This can improve accuracy and execution speed while reducing token usage and time spent on repeated browser exploration.

In this sense, Web Cap works well as a browser capability layer for Codex, Claude Code, or other local agent tools: the model can focus on understanding goals and making decisions, while stable browser operations are handled by local reusable automation.

Compared with action-first browser tools, Web Cap focuses on:

  • In-page execution, so scripts can work directly with the DOM and page state.
  • Reusable capabilities, so successful scripts can be saved and run again.
  • Playwright-style page helpers for page inspection and interaction.
  • Optional post-execution observation, so script runs can return evidence about what changed on the page when evidence collection is enabled.
  • Local persistence, so agent-learned workflows can survive beyond a single run.
  • CLI access, so agents can use the same browser capabilities from normal command-line workflows.

Web Cap can observe the page around script execution when evidence collection is enabled. It snapshots visible elements before a script runs, tracks DOM mutations while it runs, then snapshots changed areas afterward and returns a visible-elements diff with added, removed, and updated items. Execution evidence can also include browser-side events such as opened tabs, URL changes, reloads, scroll changes, managed clicks, keyboard input, and script calls.

That means an agent does not only get a script's declared JSON result. It can also inspect what the browser visibly did after the script, which is useful for verification, recovery, and deciding whether a newly successful script should be registered as a reusable capability.
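
The before/after comparison described above can be pictured with a small sketch. This is not Web Cap's implementation: the snapshot shape (objects with id and text) and the function name diffVisibleElements are assumptions made purely for illustration.

```javascript
// Hypothetical snapshot shape: [{ id: "title", text: "Cart (1)" }, ...].
// Web Cap's real evidence format may differ; this only illustrates the idea
// of an added / removed / updated diff between two snapshots.
function diffVisibleElements(before, after) {
  const beforeById = new Map(before.map((el) => [el.id, el]));
  const afterById = new Map(after.map((el) => [el.id, el]));

  const added = after.filter((el) => !beforeById.has(el.id));
  const removed = before.filter((el) => !afterById.has(el.id));
  // An element counts as updated when it exists in both snapshots
  // but its visible text changed.
  const updated = after.filter((el) => {
    const previous = beforeById.get(el.id);
    return previous !== undefined && previous.text !== el.text;
  });

  return { added, removed, updated };
}

const snapshotBefore = [
  { id: "title", text: "Cart (1)" },
  { id: "promo", text: "Free shipping" },
];
const snapshotAfter = [
  { id: "title", text: "Cart (2)" },
  { id: "toast", text: "Item added" },
];

console.log(JSON.stringify(diffVisibleElements(snapshotBefore, snapshotAfter)));
```

An agent could use a diff like this to confirm that an act script changed what it was supposed to change before deciding to register it.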

Agent-Oriented Details

  • Page targeting: script definitions include target sites, URL patterns, page hints, tags, type, status, and version, so agents can select the right capability and avoid running a script on the wrong page.
  • Two script types: read scripts inspect or extract page state, while act scripts operate on the page or trigger browser-side changes.
  • User handoff observation: wait-events waits while a user completes a browser action, then streams the resulting interaction path as JSON Lines. Use it when an agent has reached a step that requires user action and needs the observed clicks, input/change/submit activity, URL changes, or loading state to infer what the user did next.
  • Local execution history: inline scripts are tracked locally with status and result metadata. Temporary script ids remain callable while they are in the latest local history entries.
  • Success-gated registration: --register only persists a script when its execution result includes ok: true, which helps keep the reusable script registry clean.
  • Tab-aware execution: commands can target a specific --tab-id, while default execution follows the active connected browser tab.
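
As a rough illustration of page targeting, the sketch below filters script definitions by status, type, and URL pattern. The field names (id, type, status, urlPatterns), the "*" wildcard syntax, and the helper names are hypothetical; the actual script schema lives in the repository's shared/ package and may look different.

```javascript
// Hypothetical script definitions. The field names and wildcard syntax are
// illustrative and not taken from Web Cap's schema.
const definitions = [
  { id: "hn-summarize", type: "read", status: "active", urlPatterns: ["https://news.ycombinator.com/*"] },
  { id: "gh-repo-summary", type: "read", status: "active", urlPatterns: ["https://github.com/*/*"] },
  { id: "gh-star", type: "act", status: "draft", urlPatterns: ["https://github.com/*/*"] },
];

// Turn a simple "*" wildcard pattern into an anchored regular expression.
function patternToRegExp(pattern) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Keep only active scripts of the requested type (if any) whose URL patterns
// match the current tab, so a script is never run on the wrong page.
function selectCapabilities(defs, url, type) {
  return defs.filter(
    (def) =>
      def.status === "active" &&
      (type === undefined || def.type === type) &&
      def.urlPatterns.some((p) => patternToRegExp(p).test(url)),
  );
}

const matches = selectCapabilities(definitions, "https://github.com/edgestorage/web-cap");
console.log(matches.map((def) => def.id));
```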

Roadmap

This roadmap outlines the planned development directions for Web Cap and Web Cap Hub.

Web Cap Hub CLI

Provide quick installation and download support for reusable scripts.

Firefox Extension

Provide Firefox browser extension support.

Client Build and Distribution Improvements

Reduce dependency on the Node.js and npm environment, and explore simpler installation, build, and distribution paths.

Browser-Side AI Chat and Local AI Tool Integration

Provide an in-browser AI chat entry point that connects to local tools such as Codex and Claude Code for actual execution.

Move Script Compilation to the Client

Move heavier TypeScript compilation-related responsibilities from the browser extension to the client to reduce extension size and complexity.

How It Works

Agent
   |
   | CLI command
   v
Web Cap CLI
   |
   v
Managed local runtime
   |
   | WebSocket
   v
Browser extension
   |
   v
Real browser tab

The browser extension connects to the local runtime and executes commands against normal browser tabs. Agents call the CLI, and the CLI handles runtime startup and connection details automatically.

Packages

  • extension/ - browser extension entrypoints and runtime code.
  • lib/ - CLI, local runtime, script registry, and orchestration logic.
  • shared/ - shared protocol, script schema, and validation helpers.
  • skills/ - Agent Skills installable with the skills CLI.
  • tests/ - Vitest coverage for CLI, runtime behavior, browser command contracts, and extension helpers.
  • scripts/ - project utilities and generated-runtime helpers.

Requirements

  • Node.js 20 or newer
  • pnpm 9.x
  • A Chromium-based browser for the current extension runtime

Development Quick Start

Install dependencies:

pnpm install

Start the extension development build:

Load the generated extension from WXT's output directory, then open a normal http or https page.

Run the source CLI during development by using pnpm cli in place of web-cap, for example:

pnpm cli session-status

A typical agent flow is:

  1. Use script-execute to run script code against the connected browser.
  2. Add --register to script-execute when a successful inline script should become reusable.

CLI Commands

script-execute

Execute script code in the selected browser tab. Scripts receive one object argument and return one JSON object.

script-execute accepts optional execution settings such as --timeout-ms, --script-file, --input-file, --no-evidence, and --register. During execution, scripts can use the injected Playwright-style page helper. --register saves the inline script only after execution succeeds with ok: true.
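
The success gate behind --register can be sketched as follows. The in-memory Map and the maybeRegister helper are hypothetical stand-ins; Web Cap persists scripts in local state, and its real storage code is not shown in this README.

```javascript
// Hypothetical in-memory registry standing in for Web Cap's local state.
function maybeRegister(registry, script, result, wantRegister) {
  // Mirror the documented rule: persist only when registration was requested
  // and the script result contains ok: true.
  const succeeded = result !== null && typeof result === "object" && result.ok === true;
  if (wantRegister && succeeded) {
    registry.set(script.id, script);
    return true;
  }
  return false;
}

const registry = new Map();
const script = { id: "read-title", source: "export default async function () { /* ... */ }" };

console.log(maybeRegister(registry, script, { ok: false, error: "not found" }, true)); // false
console.log(maybeRegister(registry, script, { ok: true, title: "Example" }, true)); // true
```

Gating on the result rather than on the exit status means a script that ran without throwing but reported a failure never ends up in the reusable registry.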

Browser commands

Web Cap also includes commands such as browser-new-tab, session-status, and wait-events for agent workflows that need tab control, or need to wait for a user to complete a browser step and inspect the resulting action path.
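
Because wait-events streams JSON Lines, an agent can parse the output one line at a time. The sketch below assumes a made-up event shape (type, selector, url); the real event fields are not documented in this README, so treat the sample data as hypothetical.

```javascript
// Parse JSON Lines output: one JSON object per non-empty line.
function parseJsonLines(text) {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));
}

// Hypothetical sample; the field names are assumptions for illustration,
// not the documented wait-events schema.
const sampleOutput = [
  '{"type":"click","selector":"#login"}',
  '{"type":"input","selector":"#otp"}',
  '{"type":"urlChange","url":"https://example.com/home"}',
].join("\n");

const events = parseJsonLines(sampleOutput);
const lastUrlChange = events.filter((e) => e.type === "urlChange").pop();
console.log(events.length, lastUrlChange.url); // 3 https://example.com/home
```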

Script Model

Scripts are JavaScript functions with JSON-compatible inputs and outputs:

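export default async function (input) {
  // `page` is the Playwright-style helper injected by the runtime.
  const title = await page.title();
  // input.selector is supplied by the caller, for example through --input.
  const text = await page.locator(input.selector).textContent();

  // Return one JSON-compatible object; ok: true marks the run as successful.
  return {
    ok: true,
    title,
    text,
  };
}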

The runtime injects a Playwright-style page helper while the script executes. Common APIs include page.locator(...), page.getByRole(...), locator.click(), locator.fill(), locator.textContent(), and locator.waitFor().

For controlled multi-page scripts, cap.goto(url, nextInput) navigates to url and reruns the same script with exactly nextInput as the next input. Page/script state is lost across the navigation, so pass every cross-page field you need, such as step, index, urls, and accumulated results, through nextInput explicitly.
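
The pattern of threading state through nextInput might look like the sketch below. The page and cap objects here are simulated stand-ins so the example can run outside a browser, and the simulate driver is hypothetical. It also assumes that a script simply returns after calling cap.goto, which this README does not specify.

```javascript
// Stand-ins for the helpers that the Web Cap runtime injects. They are
// simulated here; the real helpers run inside a browser tab.
const fakeTitles = {
  "https://example.com/a": "Page A",
  "https://example.com/b": "Page B",
};
let currentUrl = null;
let pendingNavigation = null;
const page = { title: async () => fakeTitles[currentUrl] };
const cap = {
  goto(url, nextInput) {
    pendingNavigation = { url, nextInput };
  },
};

// The script: visit each URL in input.urls and collect page titles.
// In Web Cap this would be the module's `export default` function.
async function collectTitles(input) {
  const index = input.index ?? 0;
  const results = input.results ?? [];

  if (index > 0) {
    // We are now on urls[index - 1]; record its title.
    results.push(await page.title());
  }
  if (index < input.urls.length) {
    // Page state is lost on navigation, so forward everything explicitly.
    cap.goto(input.urls[index], { urls: input.urls, index: index + 1, results });
    return { ok: true, navigating: true }; // assumed return value while navigating
  }
  return { ok: true, results };
}

// Minimal simulation of "navigate, then rerun the same script with nextInput".
async function simulate(script, input) {
  let nextInput = input;
  for (;;) {
    pendingNavigation = null;
    const result = await script(nextInput);
    if (pendingNavigation === null) return result;
    currentUrl = pendingNavigation.url;
    // Round-trip through JSON to mimic losing in-page object identity.
    nextInput = JSON.parse(JSON.stringify(pendingNavigation.nextInput));
  }
}

const demo = simulate(collectTitles, { urls: Object.keys(fakeTitles) });
demo.then((result) => console.log(JSON.stringify(result)));
```

Keeping step counters and accumulated results inside nextInput makes each rerun self-describing: the script can work out where it is from its input alone.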

CLI Usage

Run a one-off script:

web-cap script-execute \
  --tab-id 1 \
  --script "export default async function (input) { return { ok: true, input }; }" \
  --input '{"hello":"world"}' \
  --timeout-ms 30000

Use files for larger payloads:

web-cap script-execute \
  --tab-id 1 \
  --script-file ./script.js \
  --input-file ./input.json \
  --no-evidence

Common CLI commands:

web-cap session-status
web-cap script-execute --tab-id 1 --script-file ./script.js --input-file ./input.json --register
web-cap browser-new-tab --url https://example.com --active true
web-cap wait-events --duration-ms 10000

For local source development, replace web-cap with pnpm cli.

JSON-producing commands print compact single-line JSON by default. Use --pretty to print formatted JSON for visual inspection.
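
An agent that drives the CLI from Node.js might assemble arguments and parse output along these lines. The sketch uses only flags that appear in this README; the options object shape and both helper names are hypothetical conveniences, not part of Web Cap.

```javascript
// Build an argument list for `web-cap script-execute` using only flags that
// appear in this README. The options object shape is a hypothetical convenience.
function buildScriptExecuteArgs(options) {
  const args = ["script-execute"];
  if (options.tabId !== undefined) args.push("--tab-id", String(options.tabId));
  if (options.scriptFile !== undefined) args.push("--script-file", options.scriptFile);
  if (options.input !== undefined) args.push("--input", JSON.stringify(options.input));
  if (options.timeoutMs !== undefined) args.push("--timeout-ms", String(options.timeoutMs));
  if (options.evidence === false) args.push("--no-evidence");
  if (options.register === true) args.push("--register");
  return args;
}

// Parse the compact single-line JSON that JSON-producing commands print.
function parseCliJson(stdout) {
  return JSON.parse(stdout.trim());
}

const args = buildScriptExecuteArgs({
  tabId: 1,
  scriptFile: "./script.js",
  input: { hello: "world" },
  timeoutMs: 30000,
});
console.log(args.join(" "));
```

Passing the array to something like Node's child_process.execFile("web-cap", args, callback) avoids shell quoting problems with the JSON in --input.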

Configuration

Persistent CLI configuration is managed with web-cap config and stored in local state. Useful options include:

| Key | Default | Example | Effect |
| --- | --- | --- | --- |
| evidence | common | web-cap config set evidence events,visibleElements | Controls script execution evidence. Use common, all, or a comma-separated list of events and visibleElements. Pass --no-evidence to script-execute to disable evidence for one run. |
| mouseTrajectorySimulation | false | web-cap config set mouseTrajectorySimulation true | When enabled, browser-level managed mouse clicks send a multi-step movement path before press/release. |
| activateTabOnScriptExecute | false | web-cap config set activateTabOnScriptExecute true | Activates the target tab before script execution. |

evidence can also be passed per request through options.evidence when using script_execute.
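
The accepted evidence values described above could be validated with a sketch like this one. The function name and return shape are hypothetical. What the common and all presets expand to is not documented here, so the sketch returns presets by name instead of expanding them.

```javascript
const PRESETS = new Set(["common", "all"]);
const KINDS = new Set(["events", "visibleElements"]);

// Validate an evidence setting string against the documented values.
// Presets are returned by name because their expansion is not documented.
function parseEvidenceSetting(value) {
  const trimmed = value.trim();
  if (PRESETS.has(trimmed)) return { preset: trimmed };

  const kinds = trimmed
    .split(",")
    .map((part) => part.trim())
    .filter(Boolean);
  const unknown = kinds.filter((kind) => !KINDS.has(kind));
  if (kinds.length === 0 || unknown.length > 0) {
    throw new Error(`Unsupported evidence value: ${value}`);
  }
  // Remove duplicates while keeping the original order.
  return { kinds: [...new Set(kinds)] };
}

console.log(JSON.stringify(parseEvidenceSetting("events,visibleElements")));
```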

Local State

Web Cap stores local state under ~/.web-cap/ by default. Set WEB_CAP_STATE_DIR to use another directory.

Local state includes registered scripts, recent script execution metadata, and browser session information needed by CLI commands.
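
The directory lookup described above amounts to a one-line precedence rule. The helper below is a hypothetical sketch, not Web Cap's code; it assumes an empty WEB_CAP_STATE_DIR is treated as unset, which this README does not state.

```javascript
// Resolve the state directory: WEB_CAP_STATE_DIR wins, else ~/.web-cap.
// homeDir is passed in to keep the function pure; in Node.js it could come
// from os.homedir(). Treating an empty override as unset is an assumption.
function resolveStateDir(env, homeDir) {
  const override = env.WEB_CAP_STATE_DIR;
  if (typeof override === "string" && override.length > 0) return override;
  return `${homeDir}/.web-cap`;
}

console.log(resolveStateDir({}, "/home/alice")); // /home/alice/.web-cap
console.log(resolveStateDir({ WEB_CAP_STATE_DIR: "/tmp/web-cap-ci" }, "/home/alice")); // /tmp/web-cap-ci
```

Pointing WEB_CAP_STATE_DIR at a throwaway directory is one way to keep test runs from touching the scripts registered in a developer's normal state directory.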

Build

Build the browser extension:

Build the npm CLI package:

After building the npm package, the web-cap executable is available at dist/cli.js:

Create extension zip packages:

Quality Checks

pnpm lint
pnpm typecheck
pnpm test
pnpm build

GitHub Actions

The repository includes a build workflow at .github/workflows/build.yml. It runs lint, typecheck, and tests, then uploads browser extension build artifacts and an npm package tarball.

When a version tag matching v* is pushed, the workflow also creates a GitHub Release and uploads the browser extension zip files as release assets.

Known Limitations

  • The extension targets normal http and https pages.
  • The current runtime is primarily validated on Chromium-based browsers; Firefox compatibility is still planned work.
  • Restricted browser pages such as chrome:// are intentionally out of scope.
  • Scripts execute in-page and rely on the injected Playwright-style page helper.
  • Manual validation with a loaded browser extension is recommended before release.

Contributing

Issues and pull requests are welcome. For larger changes, please open an issue first so the implementation direction can be discussed.

Before sending a pull request, run:

pnpm lint
pnpm typecheck
pnpm test
pnpm build

License

Apache License 2.0. See LICENSE.