Built a 100k-Document RAG System by Hand. Hermes Read the Architecture in 47 Seconds.
Daniel Nwane · 2026-05-21 · via DEV Community

This is a submission for the Hermes Agent Challenge

For the past six months, I have been building what Hermes does by hand.

Not as a thought experiment. Actually building it: a hybrid BM25 + vector search engine on Cloudflare Workers, a Gemma 4 MoE reflection layer, 100,000+ documents indexed, an MCP server with Durable Objects, multimodal image ingestion with Llama 4 Scout. I won the OpenClaw Challenge writing about spec-writing agents. I built bookmark-cli — a personal knowledge engine with 45,053 tweets indexed from years of reading.

When the Hermes Agent Challenge dropped, I had one question: does the tool do what I taught myself to do?

This is that experiment.


The Setup

My machine: Windows 11, Python 3.14, Git Bash (MSYS2), no WSL2.

The repo: vectorize-mcp-worker — my production hybrid RAG system. Six months of architectural decisions baked into TypeScript. V4 routing. Six specialized query routes. Cost analytics down to the millisecond. The kind of codebase where you know exactly what a competent agent should say about it.

The goal: install Hermes, configure it to use Claude, have it summarise the architecture in 5 bullets, and document every friction point along the way.


Friction Point 1: The Installer Doesn't Know You Exist

The official install command:

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash


I ran it. The script loaded, detected my OS, and exited immediately:

Windows detected. Please use the PowerShell installer:
  irm https://raw.githubusercontent.com/.../install.ps1 | iex


That's not in the README. The PowerShell installer exists — it's right there in the repo — but nothing in the public docs points Windows users to it. You find out by reading the bash script's source.

The install UX assumes you're on Linux or macOS. Windows is supported, but you're expected to find your own way there.
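For readers wondering why a bash installer bails out under Git Bash: the post does not show the logic inside install.sh, so the following is only a sketch of how this kind of detection commonly works, keyed on the kernel name that `uname -s` prints. The function name and the labels it returns are hypothetical.

```python
def classify_uname(kernel_name: str) -> str:
    """Map the output of `uname -s` to a coarse platform label."""
    name = kernel_name.strip().upper()
    # Git Bash, MSYS2 and Cygwin all report a Windows-flavoured kernel name.
    if name.startswith(("MINGW", "MSYS", "CYGWIN")):
        return "windows"
    if name == "LINUX":
        return "linux"
    if name == "DARWIN":
        return "macos"
    return "unknown"
```

Git Bash typically reports a name beginning with MINGW64_NT, which is why a script running there can tell it is on Windows even though the shell looks like bash.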


Friction Point 2: WSL2 Would Have Saved Me This

I don't have WSL2 installed. That's on me. Every agent tooling project I've touched in the last year has quietly assumed it. Hermes is no different.

If you're on Windows and planning to work with CLI agents: install WSL2 first. Don't assume Git Bash is equivalent. It almost never is.


Friction Point 3: PyPI to the Rescue (Unofficially)

Rather than chase the PowerShell installer, I checked PyPI:

pip index versions hermes-agent
# Available versions: 0.13.0


hermes-agent 0.13.0 exists on PyPI. The official flow never mentions this. I installed it directly:

pip install hermes-agent


Four minutes, some progress bars, done.


Friction Point 4: Dependency Hell

The install succeeded but left me with this:

browser-use 0.12.6 requires openai==2.16.0, but you have openai 2.24.0
browser-use 0.12.6 requires requests==2.32.5, but you have requests 2.33.0
browser-use 0.12.6 requires rich==14.3.1, but you have rich 14.3.3


Hermes bumped openai from 2.16.0 to 2.24.0. Another tool I use (browser-use) pins the older version. Pip resolved it in Hermes's favor. Something will break later. I don't know what yet.

This is the tax you pay for a rich ecosystem. Every major agent framework is in a dependency arms race.
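One way to see what will break before it breaks is to compare the exact pins against what is installed. The sketch below is a minimal illustration using only the standard library. The function names are hypothetical, and the version numbers are copied from the pip warning above rather than read from a live environment.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str) -> str | None:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_pin_conflicts(pins: dict[str, str], installed: dict[str, str | None]) -> list[str]:
    """List every exact pin (name -> required version) that the environment violates."""
    conflicts = []
    for name, required in pins.items():
        actual = installed.get(name)
        if actual != required:
            conflicts.append(f"{name}: pinned {required}, installed {actual}")
    return conflicts


# Values copied from the pip warning in this post.
BROWSER_USE_PINS = {"openai": "2.16.0", "requests": "2.32.5", "rich": "14.3.1"}
AFTER_HERMES = {"openai": "2.24.0", "requests": "2.33.0", "rich": "14.3.3"}

for line in find_pin_conflicts(BROWSER_USE_PINS, AFTER_HERMES):
    print(line)
```

To check a real environment, build the second dictionary with `installed_version()` instead of hard-coding it. Installing each agent tool into its own virtual environment avoids the conflict entirely, which is also what `hermes doctor` recommends further down.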


Friction Point 5: The Binary Isn't on PATH

After install, hermes doesn't work:

$ hermes doctor
bash: hermes: command not found


The executables land in %APPDATA%\Python\Python314\Scripts, which isn't on PATH by default. Pip warns you — in small print, at the end of a long install log.

Fixed with the full path:

/c/Users/DELL/AppData/Roaming/Python/Python314/Scripts/hermes.exe doctor



hermes doctor — Where It Gets Good

Once you find the binary, the experience shifts. hermes doctor:

◆ Python Environment
  ✓ Python 3.14.4
  ⚠ Not in virtual environment (recommended)

◆ Required Packages
  ✓ OpenAI SDK
  ✓ Rich (terminal UI)
  ✓ python-dotenv

◆ Tool Availability
  ✓ browser
  ✓ terminal
  ✓ file
  ✓ memory
  ⚠ browser-cdp (system dependency not met)
  ⚠ web (missing EXA_API_KEY, TAVILY_API_KEY...)


Every warning includes the fix. ✓ or ⚠, then the exact command. You don't go looking for the next step. After an install that gave you nothing, this lands differently.


Friction Point 6: hermes model Is Interactive-Only

I wanted to configure Claude from a script:

hermes model --provider anthropic --model claude-sonnet-4-5


No such flags. hermes model opens an interactive TUI. There's no way to set the model non-interactively from the CLI.

I wrote the API key directly to ~/.hermes/.env:

ANTHROPIC_API_KEY=sk-ant-...


Then passed the model inline at runtime:

hermes chat -m "anthropic/claude-sonnet-4-5" -q "..."
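
For a scripted setup, the .env step can be automated without touching the TUI. The sketch below only edits the text of a dotenv-style file. The helper name is hypothetical, and it assumes the file holds plain KEY=value lines, which is all this post confirms about ~/.hermes/.env.

```python
def upsert_env_var(env_text: str, key: str, value: str) -> str:
    """Return env_text with `key=value` set, replacing any existing line for key."""
    new_line = f"{key}={value}"
    lines = env_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # Compare only the part before the first "=" so values may contain "=".
        if line.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"
```

Read the existing file (or start from an empty string), pass it through `upsert_env_var`, and write the result back. Running it twice leaves a single entry instead of stacking duplicates.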



Friction Point 7: --source Doesn't Set the Working Directory

I tried:

hermes chat --source "C:/path/to/vectorize-mcp-worker/src" -q "Summarise..."


Hermes ignored the path and started from my home directory, listing unrelated projects:

I can see:
- bookmark-cli
- dannwaneri
- siteclinic

Which project should I analyze?


--source doesn't mean "work in this directory." I had to cd into the repo first. Small thing, confusing when you're scripting.
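When the call comes from a script rather than a terminal, the equivalent of "cd first" is to set the child process's working directory explicitly. A minimal sketch with `subprocess.run`: the helper name is hypothetical, and the Hermes command uses only flags that appear in this post.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def run_in_dir(command: list[str], working_dir: str) -> str:
    """Run `command` with `working_dir` as its current directory and return stdout."""
    result = subprocess.run(
        command,
        cwd=working_dir,       # the child starts here, like `cd` before the call
        capture_output=True,
        text=True,
        check=True,            # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Built only from flags shown in this post; not executed here.
HERMES_COMMAND = [
    "hermes", "chat",
    "-m", "anthropic/claude-sonnet-4-5",
    "-q", "Summarise the architecture of this codebase in exactly 5 bullet points.",
    "-Q",
]
```

Calling `run_in_dir(HERMES_COMMAND, "vectorize-mcp-worker")` would then start the session inside the repo, assuming `hermes` is on PATH.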


The Actual Session

Running from the repo directory with Claude as the model:

cd vectorize-mcp-worker
hermes chat \
  -m "anthropic/claude-sonnet-4-5" \
  -q "Summarise the architecture of this codebase in exactly 5 bullet points." \
  -Q


Output (verbatim, 47 seconds):

Cloudflare Workers-based RAG Platform — Serverless hybrid RAG system deployed on Cloudflare Workers with Vectorize (384d/1024d vector store), D1 (SQLite for full-text and metadata), Workers AI (embeddings via bge-small/bge-m3/qwen3-0.6b), service bindings to multimodal-pro-worker for image processing, and MCP Durable Objects for stateful agent sessions.

Intelligent V4 Routing Architecture — Query router (RouteSelector) uses Llama 3.2 3B intent classifier to dispatch to 6 specialized routes: ENTITY_LOOKUP (SQL), SEMANTIC_SEARCH (vector), KEYWORD_EXACT (BM25), OCR_DOCUMENT (text extraction), VISUAL_ANALYSIS (Llama 4 Scout vision), and GRAPH_REASONING (relationship traversal), with automatic fallback to vector search on route failure.

Hybrid Search with RRF Fusion — Dual-path retrieval combining vector search (Vectorize query) and BM25 keyword search (D1 full-text index), merged via Reciprocal Rank Fusion (RRF) with k=60, followed by optional cross-encoder reranking (bge-reranker-base), semantic highlighting, and Cloudflare Cache API for cacheable queries.

Chunking and Multi-Tenancy Pipeline — Ingestion engine performs recursive chunking with 15% overlap, parallel batch embedding (up to 100 docs/batch), deduplication via parent_id, and tenant isolation enforced by middleware that injects tenant_id filters into all search/delete operations.

Model Context Protocol Server — Exposes tools via modern MCP Streamable HTTP transport backed by VectorizeMcpAgent Durable Object at /mcp endpoint, with legacy JSON-RPC endpoints, cost analytics tracking (totalMs/embeddingMs/vectorMs/rerankerMs), rate limiting on mutating endpoints, and reflection system for synthetic training data generation.

I read that three times.

It got every major component right. RRF with k=60 — correct. Six routing modes, named accurately. The Durable Objects backing the MCP server. I've had reviewers describe the search as "just Vectorize" — missing the BM25 path entirely. Hermes got all of it in 47 seconds.
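For readers who have not met Reciprocal Rank Fusion: each document scores 1 / (k + rank) in every result list it appears in, and the scores are summed across lists. The sketch below is the textbook formulation with k=60, not code from vectorize-mcp-worker (which is TypeScript). The function name and document IDs are made up for illustration.

```python
from __future__ import annotations

from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit in a list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["a", "b", "c"]   # hypothetical vector search results, best first
bm25_hits = ["b", "d", "a"]     # hypothetical BM25 results, best first

for doc_id, score in rrf_fuse([vector_hits, bm25_hits]):
    print(doc_id, round(score, 6))
```

Documents that appear in both lists rise to the top even when neither path ranked them first everywhere, which is why the BM25 path matters and "just Vectorize" undersells the design.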

It missed the Gemma 4 MoE reflection layer specifically, and didn't distinguish between embedding model options by dimension size the way I would. Details inside details. The architecture read is accurate.


What I Actually Think

Hermes got the architecture right. Not close enough — right. And that's the thing that mattered most to me, because it's the hardest thing to fake. You can summarise README bullet points. You can't summarise a codebase you didn't actually read.

The install is rough if you're on Windows. Not broken — rough. The PyPI path works, the dependency conflicts are manageable, but "check the bash script source to find the PowerShell installer" is not an onboarding flow.

The gap I expected — between six months of building this by hand and a first session with a tool someone else built — was smaller than I thought it would be. That means either Hermes is further along than the install suggests, or I built something closer to commodity than I wanted to admit.

Probably both.

Multi-step sessions are what I actually want to test next — whether Hermes holds architectural context across a conversation the way I hold it across a codebase. That's the harder question. This was one query.


The Honest Take

If you're a Windows developer coming to Hermes cold: budget 30 minutes to fight the install, not 5. Once you're past it, the tool is real.

If you're a builder who's been assembling RAG pipelines by hand: Hermes won't replace what you know. It uses what you know. The architectural understanding you built making those systems is exactly what makes you dangerous with a tool like this.

The difference is now I can ask the codebase questions instead of just reading it.

That's not nothing.


All commands run on Windows 11, Python 3.14, Git Bash. Hermes Agent v0.13.0. Claude Sonnet 4.5 via Anthropic API.

vectorize-mcp-worker: hybrid RAG on Cloudflare Workers — github.com/dannwaneri/vectorize-mcp-worker