Kanban in Hermes Agent for Self Hosted LLM Workflows
Rost · 2026-05-08 · via DEV Community


Hermes Agent ships with a Kanban-style board and the Hermes Gateway that can saturate your self-hosted LLM if too many tasks are dispatched at once.

In practice, it is easy to DDoS your own LLM this way.

Hermes Kanban is a durable multi-profile board backed by ~/.hermes/kanban.db.

Each lane represents a phase of work, and each card is a task that can be claimed by a specific Hermes profile.

Out of the box, the dispatcher can promote many ready tasks in one pass. That is fine for elastic cloud APIs, but it can overload a small self-hosted GPU cluster.

If you are new to this stack, start with the broader Hermes setup and operations guide and the AI Systems pillar for surrounding architecture.

This post shows how to:

  • Understand how Hermes Kanban dispatch interacts with your LLM gateway.
  • Control parallelism safely for heavy tasks.
  • Batch promotions with cron so background jobs do not collide with interactive use.
  • Monitor and tune the system so GPUs stay busy without overload.

How Hermes Kanban and the dispatcher work

At a high level, the system has three layers:

  1. Board - durable SQLite state for tasks, columns, relations, and history.
  2. Workers - Hermes profiles started in isolated workspaces to process a task.
  3. Dispatcher - a long-lived process that scans for dispatchable cards and starts runs.

Tasks created from CLI or dashboard usually start in backlog or ready.

The dispatcher scans for eligible cards, claims one atomically, and starts the assigned profile with its tools and memory.

Each worker then calls your LLM gateway or local runtime (for example, OpenAI-compatible endpoints backed by Ollama, vLLM, or llama.cpp). For deployment choices across these runtimes, see "LLM Hosting in 2026 Local Self-Hosted and Cloud Infrastructure Compared". If you are tuning request fan-out on Ollama itself, this pairs well with "How Ollama Handles Parallel Requests".

If you add many heavy tasks and do not cap promotions, your gateway can get flooded with concurrent requests.

On a single-GPU or CPU-bound host, that often means queueing, thrashing, and timeouts instead of better throughput.

The practical limitation today

In the Hermes builds many teams currently run, the dispatcher config exposes only two Kanban dispatch keys and does not apply a global active-task cap:

kanban:
  dispatch_in_gateway: false
  dispatch_interval_seconds: 10


For active-task control, rely on explicit dispatch cadence (hermes kanban dispatch --max ...) plus dependency modeling.

Known gotchas:

  • Do not run gateway-embedded dispatch and hermes kanban daemon --force against the same board, or you can get claim races.
  • If the gateway is down, ready tasks do not dispatch and can burst later when service returns.
  • Longer dispatch intervals feel uneven because claiming happens in ticks.
  • Behavior can vary across versions because run-state and reclaim edge cases were patched over time.

Quick verification when behavior looks wrong:

# 1) confirm exactly one dispatcher path is active
pgrep -af "hermes gateway start|hermes kanban daemon"

# 2) check the wired Kanban dispatcher keys
rg "dispatch_in_gateway|dispatch_interval_seconds" ~/.hermes/config.yaml

# 3) inspect queue shape
hermes kanban list --status ready
hermes kanban list --status running


Key ideas:

  • Dispatcher config wires dispatch_in_gateway and dispatch_interval_seconds.
  • dispatch --max limits new spawns in that pass, not total running tasks.
  • For small self-hosted clusters, start conservative and increase only after latency stays stable.

When first deploying Hermes near your LLM gateway:

  • Keep only supported Kanban dispatcher keys in config.
  • Observe GPU and CPU utilization under real queue pressure.
  • Use Strategy 1 or Strategy 2 for deterministic pacing.

Investigation findings and root cause

hermes kanban dispatch does not read config.yaml for max_active_tasks.

In hermes_cli/kanban.py, the dispatch command exposes --max as a CLI cap (default None) and passes only args.max into kb.dispatch_once(...). There is no max_active_tasks config lookup in this path. See hermes_cli/kanban.py raw.

Then in kanban_db.dispatch_once, the only cap is max_spawn, with logic equivalent to:

if max_spawn is not None and spawned >= max_spawn:
    break


There is no check of already running tasks and no max_active_tasks reference in that dispatch path. See hermes_cli/kanban_db.py raw.

Effective behavior:

hermes kanban dispatch


This is unbounded for that pass (limited only by the size of the ready queue).

hermes kanban dispatch --max 2


This caps only new spawns in that pass, not total running tasks.

The wired config knobs around gateway dispatch are kanban.dispatch_in_gateway and kanban.dispatch_interval_seconds.

So max_active_tasks is ignored in this dispatch path because it is not implemented there.
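To make the difference between a per-pass cap and a total cap concrete, here is a minimal toy model in Python. It is not Hermes code: `ToyBoard` and its fields are hypothetical names invented for this sketch, and only the shape of the `max_spawn` check follows the logic quoted above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyBoard:
    """Hypothetical in-memory stand-in for the Kanban board (not Hermes code)."""
    ready: List[str] = field(default_factory=list)
    running: List[str] = field(default_factory=list)

    def dispatch_once(self, max_spawn: Optional[int] = None) -> int:
        """Promote ready tasks to running, capping only the spawns in this pass."""
        spawned = 0
        while self.ready:
            # Same shape as the check quoted above: it looks at `spawned`,
            # never at how many tasks are already running.
            if max_spawn is not None and spawned >= max_spawn:
                break
            self.running.append(self.ready.pop(0))
            spawned += 1
        return spawned


board = ToyBoard(ready=[f"task-{i}" for i in range(6)])
first = board.dispatch_once(max_spawn=2)   # pass 1 spawns two tasks
second = board.dispatch_once(max_spawn=2)  # pass 2 spawns two more
print(first, second, len(board.running))   # 2 2 4
```

Two passes with a cap of 2 leave four tasks running in this model, because nothing finished in between and nothing counted the tasks already in flight.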

Strategy 1 - Encode dependencies for strictly sequential flows

Some workflows should run strictly one after another, for example:

  • multi-step data pipelines with shared intermediate artefacts
  • migrations or infrastructure changes
  • batch jobs that write to the same object store or database

Hermes Kanban supports parent-child dependencies between tasks, so a child card becomes dispatchable only when its parent is done.

You can model this with a small helper script around the Hermes CLI:

#!/usr/bin/env bash

set -euo pipefail

parent_id="$(hermes kanban add \
  --title 'Ingest customer logs for April' \
  --profile 'etl-worker' \
  --column backlog)"

hermes kanban add \
  --title 'Generate April anomaly report' \
  --profile 'analytics-worker' \
  --column backlog \
  --parent "${parent_id}"

hermes kanban add \
  --title 'Publish April summary to dashboard' \
  --profile 'reporting-worker' \
  --column backlog \
  --parent "${parent_id}"


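With an appropriate board policy and low dispatcher limits, only the parent task runs first.

Once it finishes, its child tasks become ready, and the dispatcher picks them up within whatever --max you pass per dispatch. Note that both children in this script share the same parent, so they can become ready at the same time; for a strictly linear chain, point each task's --parent at the task immediately before it.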
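The gating rule itself is small enough to sketch in Python. This is a hypothetical in-memory model, not the Hermes schema: the task IDs, the `parent` and `status` fields, and the `dispatchable` helper are all assumptions made for illustration. It chains each task to the previous one to show a strictly linear flow.

```python
from typing import Dict, List, Optional

# Hypothetical task records: id -> parent id (or None) and status.
tasks: Dict[str, Dict[str, Optional[str]]] = {
    "ingest": {"parent": None, "status": "todo"},
    "report": {"parent": "ingest", "status": "todo"},
    "publish": {"parent": "report", "status": "todo"},
}


def dispatchable(tasks: Dict[str, Dict[str, Optional[str]]]) -> List[str]:
    """IDs of not-yet-started tasks whose parent (if any) is done."""
    out = []
    for task_id, task in tasks.items():
        if task["status"] != "todo":
            continue
        parent = task["parent"]
        if parent is None or tasks[parent]["status"] == "done":
            out.append(task_id)
    return out


order = []
while (ready := dispatchable(tasks)):
    current = ready[0]
    order.append(current)
    tasks[current]["status"] = "done"  # pretend the worker finished

print(order)  # ['ingest', 'report', 'publish']
```

Because each task names the previous one as its parent, at most one task is dispatchable at any moment, whatever the dispatcher's per-pass cap is.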

Strategy 2 - Use Linux cron with a running-aware dispatch cap

If you want deterministic pacing, use host cron plus a small wrapper script.

Instead of always calling dispatch --max 2, first count currently running tasks, then dispatch only the remaining slots.

Create hermes-kanban-dispatch-capped.sh:

#!/usr/bin/env bash
set -euo pipefail

MAX_PARALLEL="${MAX_PARALLEL:-2}"
BOARD="${BOARD:-}"

board_args=()
if [[ -n "$BOARD" ]]; then
  board_args=(--board "$BOARD")
fi

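# Cron has a minimal PATH; adjust this to wherever hermes is installed
export PATH="/home/abc/.local/bin:$PATH"

# ${arr[@]+...} expands to nothing when the array is empty, which avoids an
# "unbound variable" error under set -u on bash older than 4.4
running_out="$(hermes kanban ${board_args[@]+"${board_args[@]}"} list --status running)"

if [[ "$running_out" == *"(no matching tasks)"* ]]; then
  running_count=0
else
  # Assumes one output line per running task; subtract header lines if your version prints any
  running_count="$(printf '%s\n' "$running_out" | wc -l)"
fi

# Only dispatch into the slots that are still free
slots=$(( MAX_PARALLEL - running_count ))

if (( slots <= 0 )); then
  echo "Already at limit running=$running_count max=$MAX_PARALLEL dispatch skipped"
  exit 0
fi

echo "running=$running_count max=$MAX_PARALLEL slots=$slots dispatching up to $slots"

hermes kanban ${board_args[@]+"${board_args[@]}"} dispatch --max "$slots"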


Make it executable:

chmod +x ./hermes-kanban-dispatch-capped.sh


Run it with:

MAX_PARALLEL=2 ./hermes-kanban-dispatch-capped.sh


For a specific board:

BOARD=my-board MAX_PARALLEL=2 ./hermes-kanban-dispatch-capped.sh


Schedule it once per minute with cron:

* * * * * /opt/hermes/scripts/hermes-kanban-dispatch-capped.sh >> /var/log/hermes/kanban-cron.log 2>&1


Operational notes:

  • Cron often has a minimal PATH, so if hermes is not found, use its full path inside the script (for example /usr/local/bin/hermes).
  • If you log to /var/log/hermes/..., create that directory first and ensure the cron user has write access.

Example:

sudo mkdir -p /var/log/hermes
sudo chown "$USER":"$USER" /var/log/hermes


Create or edit cron entries with:

crontab -e


Then verify with:

crontab -l


Sub-minute cadence with one cron entry

Cron ticks once per minute, but you can still dispatch more frequently by running a short loop inside the script.

Example hermes-kanban-dispatch-subminute.sh:

#!/usr/bin/env bash
set -euo pipefail

LOCK_FILE="/tmp/hermes-kanban-dispatch.lock"
RUNS_PER_MINUTE="${RUNS_PER_MINUTE:-4}"    # 4 runs => every 15 seconds
CAP_SCRIPT="${CAP_SCRIPT:-/opt/hermes/scripts/hermes-kanban-dispatch-capped.sh}"

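# Hold an exclusive lock for the life of the script; if the previous
# minute's run is still going, exit quietly instead of overlapping it
exec 9>"$LOCK_FILE"
flock -n 9 || exit 0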

sleep_seconds=$(( 60 / RUNS_PER_MINUTE ))

for ((i=1; i<=RUNS_PER_MINUTE; i++)); do
  "$CAP_SCRIPT"

  if (( i < RUNS_PER_MINUTE )); then
    sleep "$sleep_seconds"
  fi
done


Make it executable:

chmod +x ./hermes-kanban-dispatch-subminute.sh


Schedule it once per minute:

* * * * * /opt/hermes/scripts/hermes-kanban-dispatch-subminute.sh >> /var/log/hermes/kanban-subminute.log 2>&1


This gives an effective sub-minute cadence while flock prevents overlapping runs.
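The cadence arithmetic in the loop can be checked with a short Python sketch. `pass_offsets` is a hypothetical helper written for this post, not part of Hermes; it mirrors the script's integer division and ignores the time each dispatch pass itself takes.

```python
from typing import List


def pass_offsets(runs_per_minute: int) -> List[int]:
    """Second offsets within the minute at which each pass would start."""
    if runs_per_minute < 1:
        raise ValueError("runs_per_minute must be at least 1")
    sleep_seconds = 60 // runs_per_minute  # same integer division as the script
    return [i * sleep_seconds for i in range(runs_per_minute)]


print(pass_offsets(4))  # [0, 15, 30, 45]
print(pass_offsets(7))  # [0, 8, 16, 24, 32, 40, 48]
```

Values that do not divide 60 evenly leave a gap at the end of the minute, which is harmless. Real passes take time, so the offsets drift later; if the loop overruns the minute, the next cron invocation finds the lock held and exits.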

Why this works:

  • list --status running gives current running load.
  • dispatch --max N caps only new spawns for that pass.
  • Computing N as remaining slots keeps total running tasks near your target limit.
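The three points above can be sketched as a small simulation in Python. Everything here is a toy model under stated assumptions: `remaining_slots` and `simulate` are hypothetical helpers, and the completion counts per tick are made-up inputs, not measurements.

```python
from typing import List


def remaining_slots(max_parallel: int, running_count: int) -> int:
    """Slots left under the target limit; never negative."""
    return max(0, max_parallel - running_count)


def simulate(ready: int, finish_per_tick: List[int], max_parallel: int) -> List[int]:
    """Return the number of running tasks after each dispatch tick."""
    running = 0
    history = []
    for finished in finish_per_tick:
        running -= min(finished, running)  # some workers complete
        spawn = min(ready, remaining_slots(max_parallel, running))
        ready -= spawn
        running += spawn
        history.append(running)
    return history


print(simulate(ready=5, finish_per_tick=[0, 0, 1, 2, 0], max_parallel=2))
# [2, 2, 2, 2, 2]
```

In the model the running count never exceeds the limit. On a real host, tasks can start or finish between the list call and the dispatch call, which is why the total stays near the target rather than exactly at it.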

Important caveat: this cap works only for dispatches made through this script.

Disable gateway-embedded dispatch; otherwise it can still promote tasks independently:

kanban:
  dispatch_in_gateway: false


The official docs describe both command capabilities and note gateway dispatch defaults in the Kanban feature guide: Hermes Kanban docs.

Internal Hermes Cron

Do not use it for dispatching.
Do you really want your LLM to process recurring prompts such as "Execute in terminal the command /path/hermes-kanban-dispatch-capped.sh", especially when it is busy doing useful work?

Hermes Kanban Monitoring and Tuning

Whichever strategy you choose, you should monitor:

  • LLM gateway metrics: request rate, latency, error rate, token throughput.
  • Node health: GPU utilisation, VRAM usage, CPU load and RAM.
  • Hermes metrics: how many tasks are in backlog, ready, active and done.

For production metric baselines and dashboards, see Monitor LLM Inference in Production with Prometheus and Grafana and the broader LLM Performance hub.

Start with low concurrency, then gradually raise limits while watching for:

  • rising latency at constant throughput
  • increasing timeout or rate limit errors
  • long tails where some tasks stay active for a very long time

As soon as you see these symptoms, roll back to the previous stable configuration and keep that as your default.
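One way to make the rollback rule mechanical is sketched below in Python. The helper name `last_stable_level`, the 1.5x tolerance, and the latency figures are all assumptions chosen for illustration; pick a threshold and metric that match your own gateway.

```python
from typing import Dict


def last_stable_level(p95_latency_by_level: Dict[int, float], tolerance: float = 1.5) -> int:
    """Highest concurrency whose p95 latency stays within `tolerance` times the baseline."""
    levels = sorted(p95_latency_by_level)
    baseline = p95_latency_by_level[levels[0]]
    stable = levels[0]
    for level in levels[1:]:
        if p95_latency_by_level[level] > tolerance * baseline:
            break  # first degraded level: stop and keep the previous one
        stable = level
    return stable


# Hypothetical measurements in seconds, not real benchmark data.
measurements = {1: 4.0, 2: 4.4, 3: 5.1, 4: 9.8}
print(last_stable_level(measurements))  # 3
```

The returned value is the MAX_PARALLEL to keep as the default; the first level that breaches the tolerance is treated as the point to roll back from.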

When Kanban is the right tool

Hermes Kanban shines when you have:

  • long-lived research or engineering backlogs
  • multi-agent collaboration with named profiles
  • workflows that must survive restarts and host reboots
  • humans who want a dashboard to triage work

If you only need a single run to create a few temporary helpers, the built-in delegate task tools are usually simpler.

Once you need history, dashboards, and strict control over how your agents hit self-hosted LLMs, the Kanban board plus dispatcher is the right foundation.

With a few configuration tweaks and optional cron-based batching, you can keep Hermes Kanban responsive while protecting your gateway and hardware.