Inside Hermes Agent's Session Memory: What X-Hermes-Session-Id Actually Does
pulkitgovran · 2026-05-25 · via DEV Community

This is a submission for the Hermes Agent Challenge: Write About Hermes Agent

The header looks trivial. One line. But it's doing something architecturally significant. Here's exactly what happens when you pass X-Hermes-Session-Id to Hermes — and why it matters more than it appears.


The Naive Mental Model (and Why It's Wrong)

Most developers assume persistent session = stored chat history. Request comes in → look up conversation log → prepend to messages → send to LLM. Like a database-backed chatbot.

That's not what Hermes does.

The naive model's per-request cost grows linearly with the turn count, so the cumulative cost grows quadratically:

Turn 1:   send 100 tokens
Turn 10:  send 1,000 tokens
Turn 100: send 10,000 tokens
Turn N:   send N × average_turn_length tokens


At 1,000 turns of about 100 tokens each, you're sending roughly 100,000 tokens, a short novel, on every request. This is why "just store the history" breaks for long-running agents.
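To make the arithmetic concrete, here is a minimal sketch of the naive replay cost. The function names are hypothetical, and the 100-tokens-per-turn average is simply the figure implied by the table above, not a measured value.

```python
def naive_request_tokens(turn: int, avg_turn_tokens: int = 100) -> int:
    """Tokens sent on request number `turn` when the full history is replayed."""
    return turn * avg_turn_tokens


def naive_cumulative_tokens(turns: int, avg_turn_tokens: int = 100) -> int:
    """Total tokens sent over the first `turns` requests: avg * (1 + 2 + ... + turns)."""
    return avg_turn_tokens * turns * (turns + 1) // 2


for n in (1, 10, 100, 1000):
    print(n, naive_request_tokens(n), naive_cumulative_tokens(n))
# 1 100 100
# 10 1000 5500
# 100 10000 505000
# 1000 100000 50050000
```

The per-request column is the linear growth shown above; the cumulative column is what the bill actually tracks, and it grows with the square of the turn count.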


What's Actually Happening: Compressed State, Not Transcript Replay

Hermes maintains a continuously updated compressed state per session ID — not a raw transcript that grows without bound.

Prior turns are distilled into the model's retained understanding. The context window stays bounded regardless of how many turns have occurred. New inputs are processed against accumulated understanding, not against a raw replay of every prior message.

The practical effect:

# Turn 1 — explicitly stated
chat("My name is Alex. I'm building a distributed cache in Rust.")

# Turn 200 — two months and 199 interactions later
# No history sent. No RAG lookup. Just the session ID.
chat("What tech stack are we using again?")
# "You're building a distributed cache in Rust."


The model doesn't "find" that fact. It retained it.


The Session ID as a Namespace

Each unique X-Hermes-Session-Id value is a completely isolated memory namespace. Sessions never bleed into each other. This makes session IDs a first-class design primitive.

import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:11434/v1", api_key="hermes")

async def chat(message: str, session_id: str) -> str:
    # Send only the new message; the header selects which session's memory is used
    response = await client.chat.completions.create(
        model="hermes",
        messages=[{"role": "user", "content": message}],
        extra_headers={"X-Hermes-Session-Id": session_id},
    )
    return response.choices[0].message.content

async def main() -> None:
    # These sessions are completely isolated brains
    await chat("Commit: removed Redis cache, caused 3 outages", "repo:acme/backend")
    await chat("Commit: added Redis cache layer for performance", "repo:widgets/frontend")

    # Each query draws only from its own session
    result = await chat("What cache decisions were made?", "repo:acme/backend")
    # Knows about the Redis removal; knows nothing about widgets/frontend
    print(result)

asyncio.run(main())


Map session IDs to your domain:

| Domain | Session ID Pattern |
| --- | --- |
| Per-user memory | user:{user_id} |
| Per-repository memory | repo:{owner}/{name} |
| Per-customer support | support:{customer_id} |
| Per-project context | project:{id}:v{version} |
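One way to keep these patterns consistent is to build the IDs in a single place. The helpers below are a hypothetical sketch, not part of Hermes. The character whitelist is my own assumption: it keeps separators such as `:` and `/` out of individual components so that two different inputs cannot collapse into the same namespace.

```python
import re

# Assumption: restrict each component to a conservative, header-safe character set.
_SAFE_COMPONENT = re.compile(r"[A-Za-z0-9._-]+")


def _component(value: str) -> str:
    """Validate one piece of a session ID so separators cannot be smuggled in."""
    if not _SAFE_COMPONENT.fullmatch(value):
        raise ValueError(f"unsafe session ID component: {value!r}")
    return value


def user_session(user_id: str) -> str:
    return f"user:{_component(user_id)}"


def repo_session(owner: str, name: str) -> str:
    return f"repo:{_component(owner)}/{_component(name)}"


def support_session(customer_id: str) -> str:
    return f"support:{_component(customer_id)}"


def project_session(project_id: str, version: int) -> str:
    return f"project:{_component(project_id)}:v{int(version)}"


print(repo_session("acme", "backend"))  # repo:acme/backend
print(project_session("billing", 2))    # project:billing:v2
```

Because a session ID is effectively the key to a memory namespace, deriving it from trusted server-side identifiers rather than raw user input seems like the safer default.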

What Gets Retained and How

Every message sent through a session is processed and distilled. Hermes prioritizes retention of:

Explicit facts — names, decisions, stated preferences, numbers

"We use PostgreSQL 15 on RDS with read replicas in us-east-1"
→ retained verbatim


Causal relationships — X was done because of Y

"Removed Redis because cache invalidation bugs caused stale product prices"
→ the causal link is retained, not just the removal


Temporal markers — when things happened relative to each other

"Tried GraphQL in Q1, reverted in Q2 due to N+1 issues"
→ the sequence and the reason are retained together


Contradictions — when new information conflicts with what's stored

Prior: "We're committed to microservices"
New: "Merged all services back into a monolith"
→ Hermes flags this as a reversal when asked about architecture decisions


This is the distinction from retrieval. RAG finds text. Hermes retains understanding of relationships between facts.


The Cron Integration: Memory Meets Autonomy

Hermes's /api/jobs endpoint connects the session memory system to time. A registered job is a prompt that fires on a schedule — and crucially, it runs through the same accumulated session context.

import httpx

# Register a job that runs against its own accumulated memory
httpx.post(
    "http://localhost:11434/api/jobs",
    headers={"Authorization": "Bearer hermes"},
    json={
        "name": "weekly-pattern-report",
        "schedule": "0 9 * * 1",
        "prompt": (
            "You are the Shadow CTO for acme/backend. "
            "Review the engineering decisions you have stored in memory "
            "from the past week. Identify any recurring failure patterns "
            "or decisions that were reversed. Prepare a concise report."
        ),
    },
)


The agent isn't querying an external database. It's asking itself what it remembers. This is the architecture that enables genuinely autonomous behavior — not polling, not retrieval, not RAG. Introspection over accumulated memory.
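The schedule string in the job above, "0 9 * * 1", reads as 09:00 every Monday if Hermes follows standard five-field cron semantics, which is an assumption on my part. The sketch below labels the fields and computes the next fire time for that kind of fixed weekly slot. Both function names are hypothetical, and time zones are ignored.

```python
from datetime import datetime, timedelta

CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expr: str) -> dict[str, str]:
    """Label the five whitespace-separated fields of a cron expression."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expr!r}")
    return dict(zip(CRON_FIELDS, parts))


def next_weekly_run(now: datetime, cron_weekday: int, hour: int, minute: int) -> datetime:
    """Next fire time strictly after `now` for a fixed weekday/hour/minute slot.

    cron_weekday uses cron numbering: 0 = Sunday, 1 = Monday, ..., 6 = Saturday.
    """
    python_weekday = (cron_weekday - 1) % 7  # datetime.weekday(): Monday = 0
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(python_weekday - now.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)  # this week's slot has already passed
    return candidate


fields = split_cron("0 9 * * 1")
run = next_weekly_run(
    datetime(2026, 5, 25, 10, 0),  # a Monday, one hour after the 09:00 slot
    cron_weekday=int(fields["day_of_week"]),
    hour=int(fields["hour"]),
    minute=int(fields["minute"]),
)
print(fields)
print(run)  # 2026-06-01 09:00:00
```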


Streaming: The Architecture Underneath

For user-facing features, always use the streaming endpoint. Hermes reasons before answering, so full responses to questions about accumulated history can take 10–20 seconds. Streaming hides much of that wait by showing text as soon as it is generated.

# Streaming via SSE in FastAPI
async def generate_sse(session_id: str, question: str):
    stream = await client.chat.completions.create(
        model="hermes",
        messages=[{"role": "user", "content": question}],
        stream=True,
        extra_headers={"X-Hermes-Session-Id": session_id},
    )
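    async for chunk in stream:
        # Some OpenAI-compatible servers emit chunks with no choices (e.g. a final usage chunk)
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta: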
            # Escape newlines for SSE wire format
            yield f"data: {delta.replace(chr(10), chr(92) + 'n')}\n\n"
    yield "data: [DONE]\n\n"


The frontend side is a standard EventSource. The user sees the answer build chunk by chunk, which feels fast even when total generation takes 15 seconds.
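Whatever consumes this stream has to undo the newline escaping and stop at the `[DONE]` sentinel. Here is a minimal sketch of that decoding step, written in Python to match the rest of the examples; a browser client would do the same thing in its EventSource message handler. The function name is hypothetical and the frame format is the one produced by `generate_sse` above.

```python
from typing import Iterable, Iterator


def decode_sse_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from `data: ...` lines until the [DONE] sentinel."""
    prefix = "data: "
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith(prefix):
            continue  # blank frame separators and anything that is not a data line
        payload = line[len(prefix):]
        if payload == "[DONE]":
            return
        # Reverse the server-side escaping: literal backslash-n becomes a newline
        yield payload.replace("\\n", "\n")


frames = [
    "data: Hello\n", "\n",
    "data:  world\\nbye\n", "\n",
    "data: [DONE]\n", "\n",
]
print(repr("".join(decode_sse_deltas(frames))))  # 'Hello world\nbye'
```

One caveat with this escaping scheme: a delta that legitimately contains a backslash followed by `n` (for example, inside generated code) is indistinguishable from an escaped newline. JSON-encoding each delta on the server and parsing it on the client would avoid that ambiguity.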


When This Architecture Wins vs. RAG

Scenario RAG Better Hermes Session Better
Search across 10k static documents
Remember context across 6 months of activity
Precise source citation with page numbers ⚠️
Understanding causality and sequence over time
"What changed and why" questions
Real-time document ingestion at scale ⚠️
Autonomous scheduled analysis
Detecting reversals and contradictions

The OpenAI Compatibility Layer

Because Hermes wraps an OpenAI-compatible API, migration from existing OpenAI code is nearly zero-cost:

# Before — OpenAI, stateless
import os

from openai import AsyncOpenAI
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = await client.chat.completions.create(
    model="gpt-4o",
    messages=conversation_history,  # you manage this
)

# After — Hermes, persistent
from openai import AsyncOpenAI
client = AsyncOpenAI(
    base_url="http://localhost:11434/v1",
    api_key="hermes",
)

response = await client.chat.completions.create(
    model="hermes",
    messages=[{"role": "user", "content": latest_message}],  # just the new message
    extra_headers={"X-Hermes-Session-Id": user_session_id},  # Hermes handles the rest
)


You drop the conversation history management. You add one header. Tool use, function calling, and streaming patterns all work unchanged.
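If many call sites need to migrate, it can help to isolate the change in one small function. The helper below is a hypothetical sketch that only builds the keyword arguments; `extra_headers` is the per-request option the OpenAI Python client already accepts, and the model name and header come from the examples above.

```python
def hermes_request_kwargs(latest_message: str, session_id: str, model: str = "hermes") -> dict:
    """Build kwargs for chat.completions.create: one new message plus the session header."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": latest_message}],
        "extra_headers": {"X-Hermes-Session-Id": session_id},
    }


kwargs = hermes_request_kwargs("What cache decisions were made?", "repo:acme/backend")
print(kwargs["extra_headers"])  # {'X-Hermes-Session-Id': 'repo:acme/backend'}
```

A call site then becomes `await client.chat.completions.create(**hermes_request_kwargs(message, session_id))`, and no conversation history has to be threaded through the application.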


Summary

X-Hermes-Session-Id isn't a database lookup key. It's a namespace for a persistent reasoning state that accumulates understanding rather than replaying transcripts. The cost is bounded. The knowledge compounds. The autonomy follows naturally from the scheduling integration.

That's the architectural bet Hermes is making: that the future of AI agents is stateful participants that get smarter over time, not stateless query engines that start from zero on every call.

Based on what you can build with a single header, it's a bet worth taking.