惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

Cyber Security Advisories - MS-ISAC
Cyber Security Advisories - MS-ISAC
V
Vulnerabilities – Threatpost
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
V
Visual Studio Blog
月光博客
月光博客
IT之家
IT之家
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
T
Tailwind CSS Blog
罗磊的独立博客
S
SegmentFault 最新的问题
博客园 - 三生石上(FineUI控件)
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
量子位
V
V2EX
Jina AI
Jina AI
The GitHub Blog
The GitHub Blog
小众软件
小众软件
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
阮一峰的网络日志
阮一峰的网络日志
Recent Announcements
Recent Announcements
MongoDB | Blog
MongoDB | Blog
Y
Y Combinator Blog
H
Help Net Security
博客园_首页
Cyberwarzone
Cyberwarzone
T
Tenable Blog
A
Arctic Wolf
C
CERT Recently Published Vulnerability Notes
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
T
Threat Research - Cisco Blogs
aimingoo的专栏
aimingoo的专栏
Google DeepMind News
Google DeepMind News
博客园 - 叶小钗
C
Cyber Attacks, Cyber Crime and Cyber Security
美团技术团队
Attack and Defense Labs
Attack and Defense Labs
GbyAI
GbyAI
博客园 - 【当耐特】
Cloudbric
Cloudbric
NISL@THU
NISL@THU
B
Blog RSS Feed
K
Kaspersky official blog
Hugging Face - Blog
Hugging Face - Blog
P
Privacy International News Feed
博客园 - Franky
博客园 - 司徒正美
Microsoft Azure Blog
Microsoft Azure Blog
Apple Machine Learning Research
Apple Machine Learning Research
Webroot Blog
Webroot Blog
Microsoft Security Blog
Microsoft Security Blog

Hacker News - Newest: "OpenClaw"

I Spent 4 Hours So You Don’t Have To: Hetzner Metal + NixOS in ~15 Minutes − Irakli's blog GitHub - snuri00/osint-mcp: Self-hosted OSINT toolkit — MCP server, AI REPL, CLI, web app & chat apps (WhatsApp/Telegram/Discord via OpenClaw). Entity, event/news & social/community intelligence. Keyless-first. What a Regex Can't Do GitHub - ai-sns/openclaw-hermes-agent-network: OpenClaw Hermes AI Agent Social Network🦞💬🦞Built on Google 3D Maps and A2A protocol, connects OpenClaw and Hermes agents worldwide in a 3D environment. Phishing for Lobsters: How We Tricked OpenClaw into Spilling Secrets GitHub - CODEANDTRUST/clawcall: Give your OpenClaw / self-hosted AI agent inbound phone calls - a Twilio-to-gateway voice bridge with working agent tools mid-call (MIT). Build a ZeroCost Web Automation Pipeline with OpenRouter, OpenClaw, and MediaUse GitHub - gpdir16/tabyAgent: A lighter, easier alternative to OpenClaw/Hermes. Runs autonomously inside Docker and chats with you through Telegram. Ask HN: What are the biggest problems you find in OpenClaw/Hermes? Microsoft launches Scout, an OpenClaw-inspired personal assistant GitHub - openclaw/openclaw-windows-node: Windows companion suite for OpenClaw - System Tray app, Shared library, Node, and PowerToys Command Palette extension Microsoft unveils Scout, an autonomous AI agent built on OpenClaw Gavriel Cohen found his own code inside OpenClaw, so he walked away GitHub - hunvreus/heypi: Chat agents for your team, with approvals and sandboxed tools. Slack, Discord, Telegram, webhooks. HolaClaw: run OpenClaw securely in Mac Multi-Agent Orchestration System: Hermes (Windows) ↔ OpenClaw (WSL) We were building infra for OpenClaw, and today I just tried Hermes and holy shit GitHub - openclaw/openclaw: Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞 OpenClaw as the Universal Operating System for Agents ARC Prize - Community Leaderboard Setup OpenClaw with Slack: from install to first message twitter.com I Gave My OpenClaw Agent a Physical Body Use Grok in OpenClaw The creator of OpenClaw used $1,300,000+ of OpenAI tokens in 30 days, which is a hell of a perk GitHub - oswarld/openshears: 🔪 THE OPENCLAW TERMINATOR 🦞 Are we human? Show HN: OpenClaw is just not dangerous enough. I needed something else OpenClaw creator burned through $1.3 million in OpenAI API tokens in a single month — bill covered 603 billion tokens across 7.6 million requests and 100 coding agents Reducing OpenClaw token usage OpenClaw/Hermes Hosting Comparison GitHub - ExTV/rikkahub-agent: RikkaHub Agent -- is RikkaHub fork that have Full agent mode . For $1.3 million a month, OpenClaw founder Peter Steinberger runs 100 AI agents that code, review PRs, and find bugs Where OpenClaw Security Is Heading OpenAI Models in OpenClaw, Done Right GitHub - thesysdev/openclaw-os: The default workspace for OpenClaw Token, Harness, OpenClaw, RAG, MCP, Agent – What's the Difference? We need a safe alternative to Telegram for agents like OpenClaw or Hermes Two OpenClaw agents negotiate a YC SAFE with Agentic Power of Attorney OpenClaw Had a Rough Week GitHub - LobsterTrap/tank-os GitHub - haishmg/Clawback How OpenClaw Got Safer in Public openclaw ggsql — ClawHub Show HN: iClaw is part OpenClaw, part Siri, powered by Apple Intelligence GitHub - lotsoftick/openclaw_client: OpenClaw web client Show HN: OpenClaw but Efficient and with an SDK GitHub - TheGuyWithoutH/mac-computer-use GitHub - microsoft/openclaw: Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞 The OpenClaw turkey problem OpenClaw: opioids for Chinese AI companies GitHub - supersuit-tech/permission-slip [AINews] The Two Sides of OpenClaw OpenClaw stats don't add up GitHub - brexhq/CrabTrap: An LLM-as-a-judge HTTP proxy to secure agents in production Anthropic - OpenClaw Hustlers are cashing in on China’s OpenClaw AI craze Engineering Managers are going to hate OpenClaw GitHub - opentalon/opentalon: OpenTalon is an open-source platform built from the ground up in Go as a robust alternative to OpenClaw Ask HN: Who is using OpenClaw? Why Meta’s AI Alignment Director Couldn't Stop Her Own Agent—and How to Fix It GitHub - epsilla-cloud/clawtrace: Make your OpenClaw agents better, cheaper, and faster. Ask HN: What are you using OpenClaw or agents for? GitHub - epsilla-cloud/clawtrace: Make your OpenClaw agents better, cheaper, and faster. GitHub - theprint/nfh-self-improvement-loop: Minimal adversarial framework for AI agent self-modification. Inspired by karpathy/autoresearch. GitHub - ibrahimmukherjee-boop/ClearFrame: OpenClaw Alternative with better governance, security Show HN: Agent-Notifications – Real-Time Alerts for OpenClaw and Hermes Agents OpenClaw + Claude are better than therapy GitHub - zeulewan/glueclaw: Use Claude Max subscription with OpenClaw again Anthropic temporarily banned OpenClaw’s creator from accessing Claude OpenClaw’s memory is unreliable, and you don’t know when it will break Give Your OpenClaw Agent a Real Memory You need a Windows Remote Desktop, not an OpenClaw GitHub - cruxdigital-llc/CongaLine: Deploy and manage a fleet of OpenClaw AI assistants anywhere. Supporting hobbyist, team, and enterprise use cases. GitHub - cezarpena/vsm-cell: VSM-Cell is an OpenClaw agent P2P mesh orchestration standalone app. GitHub - joshchoi4881/dropspace-agents GitHub - askalf/dario: Universal LLM router. One local endpoint, every provider — OpenAI, Groq, OpenRouter, Ollama, Claude Max/Pro subscriptions, the Claude Agent SDK, any OpenAI-compat URL. Your tools stop caring which vendor is upstream. Tutorial: Secure OpenClaw with CloudConnexa OpenClaw and the Dream of Free Labour GitHub - RageDotNet/openclaw-webdav GitHub - kevinslin/openai-apps: Support openai apps in openclaw GitHub - aelaguiz/doctrine: Code-like DSL and compiler for agent workflows that compile to portable AGENTS.md instructions. Unlocking cloud inference compute for OpenClaw OpenClaw for Sales: How AI Agents are Revolutionizing Revenue Teams | Kickscale OpenClaw Architecture - Part 1: Control Plane, Sessions, and the Event Loop
Let OpenClaw Run Wild in Simulation, Not on Your Customers | Veris AI
jrm-veris · 2026-06-08 · via Hacker News - Newest: "OpenClaw"
Veris × OpenClaw

Veris comes up with realistic scenarios you wouldn't, runs them in parallel, and surfaces the failures you don't know to look for.

We built an agent on OpenClaw that researches a company across the web and posts a one-page report to Slack. We ran it by hand against a few brands, read the digests, and they looked good: clean and sourced.

Veris is an agent simulation platform. You point it at an agent you have already built; it reads the agent's own prompt and tools, generates a population of realistic users, runs them all against your agent in parallel, each in its own isolated sandbox, and scores every run on where the agent broke its own contract. No scripts to write, no staging to stand up, and nothing reaching your real users.

So we handed this agent to Veris. It generated a population of realistic users and ran them against the agent in parallel. It passed just 4 of 15 test scenarios, based on success rate criteria. One of those users asked for a daily pulse on Block, the payments company, and the report quietly blended in H&R Block, the tax firm. Nobody would have thought to write that test, and that is what Veris is for: catching the failure modes you don't even know to look for.

Walkthrough: generating a population, running it in parallel, and reading the failures.

1. The tests you would never write

You could sit down and enumerate failure modes by hand. You would catch the obvious ones and miss this one, because "what if the brand name collides with a tax firm" is not a test a person thinks to write. The space of ways an agent that acts in the world can go wrong is too large to list.

You can't write a test for a failure you can't imagine. We built Veris to take out the guesswork.
Veris scenarios dashboard listing the generated user population, each scenario's goal, and its pass or fail verdict.
The scenarios page: every generated user, its goal, and how the agent scored against it.

2. A population of users you didn't script

Instead of a fixed script, Veris reads the agent's own prompt and tool allowlist and generates scenarios: a population of realistic users, each with its own set of goals. Because it works from the agent's own contract, the population stays realistic whether you ask for fifteen or fifteen hundred.

The prompt and tools are the floor, not the ceiling. Hand Veris the material you already have, a PRD, a set of user journeys, even raw production logs, and it grounds the population in how the agent is meant to be used and how people have actually tried to use it. The more context you give it, the closer the generated scenarios track the ones you will really see.

3. How we can run the whole population at once

Generating a thousand users is only worth it if you can actually run a thousand simulations. You cannot do that against a real Slack: one workspace is shared, mutable state, it rate-limits you, and every run posts for real. The thing that makes the population scale is that Slack is simulated per scenario. Each copy of your agent runs in its own sandbox, with its own user and its own Slack mock.

Veris runs the agent in a single-container sandbox. Any service you declare in veris.yaml is intercepted at the DNS layer and routed to an LLM-powered mock; anything you don't declare hits the real internet. We declare one service, Slack, so api.slack.com resolves to a Veris mock instead of the real API.

.veris/veris.yaml

openclaw-env:
  services:
    - name: slack           # the one dependency we simulate
      dns_aliases:
        - api.slack.com
  actor:
    config: { MAX_TURNS: "1" }
    channels:
      - type: cli           # runs the agent's own command, one turn per scenario
        message_arg: "--message"

A mock has no shared workspace to corrupt and no real post to take back, so the runs do not interfere. Each scenario gets its own isolated sandbox and its own deterministic Slack, and the whole generated set fans out in parallel instead of taking turns against one real workspace.

We changed nothing in the agent to get here. The whole integration is a .veris/ folder of pure config:

.veris/
├─ veris.yaml            # the slack mock + the CLI actor channel
├─ Dockerfile.sandbox    # gVisor base + npm install -g openclaw
└─ openclaw.json         # the stock agent config, unchanged

Delete it and the project is 100% OpenClaw. The Block failure was committed by your actual product agent, not a modified test version.

Why a mock, really. Simulating the side-effecting dependency is what lets a large generated population run in parallel: every simulation is isolated, with no shared workspace to serialize against.

4. What the run surfaced

Veris scored each and every scenario on its own merits, against its own success criteria: did the agent scope its search to the right brand, honor the trigger's modifiers, refuse a malformed request, ground the report in what it actually found. 11 of the fifteen failed in some way. The Block bleed was not alone; it had siblings, all the same shape, the agent quietly not honoring its own contract:

  • Brand bleed. "Block" pulled H&R Block and unrelated entities into the digest. (brand_scoping, 13/15)
  • Dropped modifiers. A focus: topic in the trigger was silently ignored instead of shaping the digest. (modifier_compliance, 5/7)
  • Loose validation. Malformed triggers that should have been refused produced a digest anyway. (input_validation, 5/7)

The failures clustered exactly where a generated population pushes hardest: on the inputs a human author would never have written down.

26.7%

Scenario pass rate

4 of 15 simulations

13/15

brand_scoping

"Block" pulled in H&R Block

5/7

modifier_compliance

focus modifier dropped

5/7

input_validation

malformed triggers slipped through

None of these surface in a unit test. They need the whole stack running at once, a real model, a real search, and a user who asks for the wrong thing, which is exactly what one Veris run gives you, with no message reaching a real workspace.

Closing

We thoroughly vetted this OpenClaw agent without changing a line of code. We let Veris generate the tests we would never have thought to write, and ran them all at once. This is a stock agent its own engineers would have shipped, and it failed eleven of the fifteen scenarios, with zero posts to a real Slack. The Block collision was one bug in fifteen scenarios, caught before a customer saw it, by a user we never would have imagined. Generate fifteen hundred and nothing about the setup changes.

Try the sandbox

Point Veris at an agent you have already built and watch the failures you would never have scripted surface in parallel. Self-serve, no code changes, nothing reaching your real users.