One Soul, Any Model: Portable Memory for Open-Source Agents with .klickd
Davincc77 · 2026-05-23 · via DEV Community

A diagram showing Hermes Agent as the workflow runner and .klickd as the portable state layer. It illustrates how Hermes runs tasks, tools, reports, and artifacts, while .klickd carries project memory, verification gates, human veto rules, claim sources, and benchmark context across models and agent sessions.

This is a submission for the Hermes Agent Challenge: Build With Hermes Agent

What I Built

I built a prototype integration between Hermes Agent and .klickd, an open portable memory format for AI agents.

The problem I wanted to explore is simple:

A new agent session often pays again to rediscover context that already exists.

That repeated context cost shows up as:

  • re-explaining project state;
  • reloading constraints;
  • rediscovering previous decisions;
  • rebuilding handoff notes;
  • rerunning tests just to find the same failure;
  • losing track of which actions require human approval.

.klickd is designed to turn that repeated context into a portable, encrypted, versioned file that an agent can load before work starts.

Hermes Agent is a good fit for testing this because it is an open-source, self-hosted agent runtime with skills, plugins, hooks, approvals, local execution, and agentic workflow orchestration.

In this project:

Hermes runs the workflow. .klickd carries the state.

The prototype focuses on a benchmark called Context Cost Benchmark, which compares two modes:

  1. Baseline cold start

    The full context is pasted into the prompt every time.

  2. .klickd-loaded mode

    Structured context is loaded from a .klickd fixture and injected into the agent workflow.

The benchmark is designed to measure:

  • repeated input tokens;
  • output tokens;
  • estimated cost;
  • latency;
  • continuity errors;
  • violations of locked decisions;
  • violations of tool permissions;
  • handoff quality;
  • unnecessary reruns of expensive commands.

The goal is not to claim a magic percentage improvement. The goal is to measure, reproducibly:

How many tokens and errors are we paying for simply because the agent has to rediscover state we already produced?

Demo

For the Hermes Agent Challenge, I created an experimental Hermes integration inside the klickdskill repository.

The demo uses Hermes Agent to drive the local .klickd Context Cost Benchmark.

If the embedded agent session does not render correctly, here is the relevant Hermes output:

session_id: 20260523_004058_85115c

Existing artifacts from 2026-05-23 were used. No rerun was needed.

Token-proxy totals:
- Cold: 310
- Paste: 6570
- Klickd: 5270

Verified artifacts:
- report.md
- summary.csv
- raw_runs.jsonl
- artifacts/sample_test.log

No publishes, git pushes, or external tool calls were performed.


The live Hermes run used:

  • Hermes Agent v0.14.0
  • OpenRouter free model route
  • capped API key with no paid budget
  • local dry-run benchmark
  • no production deployment
  • no package publishing
  • no external posting

Hermes session:

20260523_004058_85115c


Hermes was asked to use the klickd-context-cost skill, inspect the benchmark outputs, and avoid rerunning work if durable artifacts already existed.

The key result:

Existing artifacts from 2026-05-23 were used. No rerun was needed.


That matters because one of the core ideas in .klickd v4 is that agents should not spend tokens or compute rediscovering output that already exists.

The dry-run produced these local artifacts:

benchmarks/context_cost/results/2026-05-23/
├── report.md
├── summary.csv
├── raw_runs.jsonl
└── artifacts/
    └── sample_test.log


The benchmark output was explicitly marked as a whitespace token proxy, not a provider-token measurement. This is important: these are not OpenAI, Anthropic, or OpenRouter tokenizer counts. They are deterministic local proxy values for early validation.
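
To make the idea of a whitespace proxy concrete, here is a minimal sketch. The function names `whitespace_token_proxy` and `proxy_totals` and the sample prompts are assumptions for illustration; the repository's runner may count differently.

```python
def whitespace_token_proxy(text: str) -> int:
    """Count whitespace-separated chunks as a deterministic stand-in for tokens."""
    # str.split() with no arguments splits on any run of whitespace
    # and ignores leading and trailing whitespace.
    return len(text.split())


def proxy_totals(prompts_by_condition: dict[str, list[str]]) -> dict[str, int]:
    """Sum the proxy over every prompt sent under each condition."""
    return {
        condition: sum(whitespace_token_proxy(p) for p in prompts)
        for condition, prompts in prompts_by_condition.items()
    }


# Hypothetical prompts, not the benchmark fixtures.
example = {
    "cold": ["Fix the failing test."],
    "paste": ["Project state: payments service. Fix the failing test."],
}
print(proxy_totals(example))  # {'cold': 4, 'paste': 8}
```

A counter like this is reproducible on any machine, which is the point of a proxy, but it will not match a provider tokenizer, so it is only suitable for comparing conditions against each other.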

Current dry-run totals:

| Condition | Token-proxy total |
| --- | --- |
| Cold start | 310 |
| Full context pasted | 6570 |
| .klickd structured context | 5270 |

The useful result is not “.klickd reduces cost by X%.” That would be premature.

The useful result is:

The benchmark harness can now compare repeated context strategies, produce raw evidence, persist artifacts, and let Hermes inspect those artifacts instead of rerunning the same work.

Verification artifacts

One lesson from real agent workflows is that agents often rerun expensive commands just to recover output they already produced.

The benchmark therefore includes a verification_artifacts[] pattern inspired by this idea:

command 2>&1 | tee .test-output/<scope>.log


Instead of rerunning the test suite to find a failure, the agent can inspect the persisted artifact:

grep -n FAIL .test-output/full.log
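
The same persist-then-inspect pattern can be sketched in Python. This is an illustration only: `run_and_persist` and `find_lines` are hypothetical helpers, not part of Hermes or .klickd, and unlike `tee` the sketch does not echo output to the terminal.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_persist(command: list[str], log_path: Path) -> int:
    """Run a command once and write its combined stdout/stderr to log_path."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # same effect as the shell's 2>&1
        text=True,
    )
    log_path.write_text(result.stdout)
    return result.returncode


def find_lines(log_path: Path, needle: str) -> list[tuple[int, str]]:
    """Like `grep -n`: return (line_number, line) pairs that contain needle."""
    lines = log_path.read_text().splitlines()
    return [(i, line) for i, line in enumerate(lines, start=1) if needle in line]


if __name__ == "__main__":
    # Stand-in for a real test command; it just prints two lines.
    fake_tests = [sys.executable, "-c", "print('PASS a'); print('FAIL b')"]
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / ".test-output" / "full.log"
        run_and_persist(fake_tests, log)
        print(find_lines(log, "FAIL"))  # [(2, 'FAIL b')]
```

The command runs once; every later question about the output is answered from the log file.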


In .klickd v4, that becomes structured state:

{
  "command": "npm test",
  "artifact_path": ".test-output/vitest.log",
  "status": "failed",
  "query_hint": "grep -n FAIL .test-output/vitest.log",
  "checked_at": "2026-05-23T00:00:00Z",
  "retention": "latest",
  "scope": "project"
}
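
A consumer of such a record might check the artifact before deciding to rerun anything. The sketch below relies only on the `artifact_path` field shown above; `failures_from_artifact` is a hypothetical helper, not a .klickd SDK function.

```python
import tempfile
from pathlib import Path
from typing import Optional


def failures_from_artifact(record: dict, needle: str = "FAIL") -> Optional[list[str]]:
    """Return matching log lines, or None when the artifact is missing.

    None signals that there is no evidence on disk and a rerun is justified.
    """
    path = Path(record["artifact_path"])
    if not path.is_file():
        return None
    return [line for line in path.read_text().splitlines() if needle in line]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "vitest.log"
        log.write_text("PASS login\nFAIL checkout\n")  # made-up log content
        record = {"command": "npm test", "artifact_path": str(log), "status": "failed"}
        print(failures_from_artifact(record))  # ['FAIL checkout']
```

Returning `None` rather than an empty list keeps "no artifact" distinct from "artifact exists and contains no failures".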


This turns agent memory into something more operational:

  • what the agent knows;
  • what the agent must verify;
  • what the agent is not allowed to do without approval;
  • where the evidence lives;
  • what happened last time.

Code

Repository:

https://github.com/Davincc77/klickdskill

Hermes POC integration path:

integrations/hermes/
├── README.md
├── skill/
│   └── SKILL.md
├── plugin/
│   ├── plugin.yaml
│   └── __init__.py
├── scripts/
│   └── run_context_cost_benchmark.py
└── tests/


Context Cost Benchmark path:

benchmarks/context_cost/
├── RFC.md
├── runner.py
├── fixtures/
│   ├── baseline/
│   ├── klickd/
│   ├── prompts/
│   ├── validation/
│   ├── verification_artifacts/
│   └── edge_cases/
├── results/
└── tests/


Current benchmark pieces:

  • RFC-003: Context Cost Benchmark
  • local dry-run runner
  • fixture validation
  • deterministic token proxy
  • CSV / JSONL / Markdown reports
  • edge-case fixtures for:
    • migration/version break;
    • tool-call failure recovery;
    • multi-session handoff.
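
As a rough picture of the report step, the sketch below writes a `raw_runs.jsonl` file and a `summary.csv` file from a list of runs. The file names match the artifacts listed earlier, but the field names `condition` and `token_proxy`, the column name `token_proxy_total`, and the function `write_reports` are assumptions; the actual runner's schema may differ.

```python
import csv
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def write_reports(runs: list[dict], out_dir: Path) -> None:
    """Persist raw runs as JSONL and per-condition totals as CSV."""
    out_dir.mkdir(parents=True, exist_ok=True)

    # raw_runs.jsonl: one JSON object per line, so runs can be appended or grepped.
    with (out_dir / "raw_runs.jsonl").open("w", encoding="utf-8") as f:
        for run in runs:
            f.write(json.dumps(run) + "\n")

    # summary.csv: total proxy count per condition, in first-seen order.
    totals: dict[str, int] = defaultdict(int)
    for run in runs:
        totals[run["condition"]] += run["token_proxy"]
    with (out_dir / "summary.csv").open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["condition", "token_proxy_total"])
        for condition, total in totals.items():
            writer.writerow([condition, total])


if __name__ == "__main__":
    # Toy values for illustration, not benchmark results.
    toy_runs = [
        {"condition": "cold", "token_proxy": 3},
        {"condition": "cold", "token_proxy": 4},
        {"condition": "paste", "token_proxy": 10},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        write_reports(toy_runs, Path(tmp) / "results")
        print((Path(tmp) / "results" / "summary.csv").read_text())
```

Keeping the raw runs next to the summary means a later session can recompute or audit totals without rerunning the benchmark.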

The Hermes integration currently includes:

  • a Hermes-facing skill;
  • an experimental plugin scaffold;
  • a wrapper script that runs the local benchmark;
  • tests for the wrapper;
  • explicit safety constraints:
    • no provider calls from the wrapper;
    • no paid resources;
    • no publishing;
    • no production deployment;
    • no secrets.

My Tech Stack

  • Python SDK — local .klickd loading / saving

Current development install, until PyPI is updated:

pip install "git+https://github.com/Davincc77/klickdskill.git@main#subdirectory=packages/pypi/klickd"


Current Python import:

from klickd import load_klickd, save_klickd


  • GitHub Actions — test vectors and package integrity checks
  • CSV / JSONL / Markdown — benchmark reports
  • Local verification artifacts — persisted logs for agent inspection
  • OpenRouter free model route — used only to run the Hermes agent session for the demo

How I Used Hermes Agent

Hermes Agent is used as the workflow runner for the benchmark.

The .klickd file is not meant to replace Hermes memory or Hermes skills. Instead, it gives Hermes a portable external state artifact it can load before work starts.

Hermes is responsible for:

  • running the benchmark task;
  • reading fixture context;
  • executing local dry-run commands;
  • inspecting generated artifacts;
  • summarizing benchmark results;
  • respecting approval and verification boundaries.

.klickd is responsible for carrying:

  • project state;
  • locked decisions;
  • tool permissions;
  • handoff notes;
  • verification gates;
  • human veto rules;
  • claim sources;
  • verification artifacts.

This is useful because multi-agent systems need more than agent-to-agent communication.

If A2A defines how agents talk, .klickd explores what portable state they carry between tasks, tools, models, and sessions.

The Hermes integration is therefore not about making a chatbot remember more. It is about testing whether an open-source agent runtime can operate with structured, portable context instead of repeatedly reconstructing the same state.
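
One way to picture structured context with explicit boundaries is a small gate that a runner consults before acting. Everything in this sketch is hypothetical: `PortableState`, its fields, `gate`, and `violates_locked_decision` are illustrative names, not the .klickd schema or a Hermes API.

```python
from dataclasses import dataclass, field


@dataclass
class PortableState:
    """Hypothetical in-memory view of loaded state; not the .klickd schema."""
    locked_decisions: dict[str, str] = field(default_factory=dict)
    allowed_tools: set[str] = field(default_factory=set)
    needs_human_approval: set[str] = field(default_factory=set)


def gate(state: PortableState, tool: str, action: str) -> str:
    """Decide whether an action may run, needs a human, or is refused."""
    if tool not in state.allowed_tools:
        return "deny"
    if action in state.needs_human_approval:
        return "ask_human"
    return "allow"


def violates_locked_decision(state: PortableState, key: str, proposed: str) -> bool:
    """True when a proposal contradicts a decision that was already locked."""
    return key in state.locked_decisions and state.locked_decisions[key] != proposed


# Made-up example state.
state = PortableState(
    locked_decisions={"test_runner": "vitest"},
    allowed_tools={"shell", "git"},
    needs_human_approval={"git push", "publish"},
)
print(gate(state, "shell", "npm test"))                        # allow
print(gate(state, "git", "git push"))                          # ask_human
print(gate(state, "browser", "open url"))                      # deny
print(violates_locked_decision(state, "test_runner", "jest"))  # True
```

Because the rules live in data rather than in a prompt, the same checks can be applied no matter which model or session is running the workflow.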

The goal is to reduce:

  • repeated prompt context;
  • hallucinated continuations;
  • forgotten decisions;
  • unsafe actions;
  • unnecessary reruns;
  • handoff failures.

The larger idea is that agent memory should become infrastructure:

Portable state, explicit constraints, verification artifacts, and human approval boundaries.

In short:

Hermes runs the workflow. .klickd carries the state.

What I Learned

The first useful result was not a performance number. It was a workflow result.

Hermes correctly used the existing benchmark artifacts instead of rerunning the dry-run unnecessarily.

That matters because agent waste is not only token waste. It is also repeated execution.

Agents often:

  • rerun tests to rediscover failures;
  • reread long logs from context;
  • rebuild state from previous messages;
  • regenerate summaries that already exist;
  • ask the model to infer what a file could have told it deterministically.

The benchmark and Hermes POC make that waste visible.

This also clarified the role of .klickd:

.klickd should not only remember preferences. It should help agents know:

  • what state exists;
  • what evidence exists;
  • which claims are backed by execution, by inspection, or only by assumption;
  • what actions require human approval;
  • what artifacts should be read before rerunning work.

That is why .klickd v4 is moving beyond portable memory toward a more operational layer:

portable encrypted context
+ project memory
+ verification gates
+ human veto
+ claim sources
+ verification artifacts
+ migration safety


Sources

Hermes Agent Challenge:

https://dev.to/challenges/hermes-agent-2026-05-15

Hermes Agent repository:

https://github.com/NousResearch/hermes-agent

Hermes Agent documentation:

https://hermes-agent.app/en/docs

.klickd / klickdskill repository:

https://github.com/Davincc77/klickdskill

.klickd official page:

https://klickd.app/klickdskill

Related article on preserving command output for agents:

https://dev.to/tacoda/dont-make-the-agent-re-run-the-test-suite-to-find-the-failure-427

Final Note

This is still early.

The benchmark does not yet claim provider-token savings. The current numbers are a deterministic local proxy. The next step is to run the same structure against real provider usage and compare actual input/output tokens, latency, and continuity failures.

But the architecture is now testable:

  • Hermes can act as the workflow runner.
  • .klickd can act as the portable state layer.
  • The benchmark can produce raw evidence.
  • Verification artifacts can prevent unnecessary reruns.
  • The system can evolve without breaking older .klickd files.

That is the direction I want to keep exploring.

One soul. Any model. Any agent.
