5 Things I Wish I Knew Before Building with Hermes Agent

This is a submission for the Hermes Agent Challenge: Write About Hermes Agent

I spent a week building a production system on Hermes Agent — a persistent AI memory layer for GitHub repositories called Shadow CTO. Here's what tripped me up and what I'd tell myself on day one.

1. Session IDs Are Your Most Important Design Decision

Make them meaningful from the start. A UUID is fine for experiments. For anything real, encode the domain semantics into the ID itself.

# Bad — opaque, can't debug, can't reason about isolation
session_id = str(uuid.uuid4())

# Good — self-documenting, domain-meaningful
session_id = f"repo:{owner}/{repo_name}"      # per-repository brain
session_id = f"user:{user_id}:support"         # per-user support history
session_id = f"project:{project_id}:planning"  # per-project context

Why this matters more than you think: You cannot merge sessions. If you realize six weeks in that you gave all users the same session ID instead of per-user IDs, you are starting over. There's no migration path. The accumulated memory is gone.

Spend 30 minutes mapping your domain to session IDs before writing a single API call. It's the highest-leverage design decision in a Hermes-backed system.

2. Feed Events, Not Documents

My first instinct was to dump entire files and documents into Hermes and then query them. This works but misses the point entirely.

Hermes's memory is most powerful when you feed it events — things that happened, decisions that were made, changes that occurred over time. Not static content.

# Weaker: dumping a document
await chat(
    "Here is our entire architecture documentation: [5,000 words of context]",
    session_id="repo:acme/backend",
)

# Stronger: feeding events as they happen
await chat(
    "Decision made on 2026-03-14:\n"
    "Switched from PostgreSQL to MongoDB.\n"
    "Reason: Schema flexibility needed for dynamic user preferences.\n"
    "Impact: Migration took 3 sprints, 2 rollback incidents.",
    session_id="repo:acme/backend",
)

The second approach lets Hermes build causal understanding. It knows what came before this decision and can reason about what changed as a result. By the 50th event, it has context that a document dump can never provide: sequence, causality, and pattern.

The rule of thumb: if the information has a timestamp and something happened, feed it as an event. If it's static reference material, RAG is the better tool.

3. Your System Prompt Is a Memory Architecture Decision

Hermes is smart, but it needs to know what to do with incoming information. A generic system prompt produces shallow retention. A structured extraction prompt produces deep, queryable memory.

# Shallow — Hermes stores raw text, answers surface questions
SYSTEM = "You are a helpful assistant."

# Deep — Hermes extracts and retains structured understanding
SYSTEM = """You are the institutional memory for {repo_name}.

When you receive new information:
1. Identify if a meaningful decision was made (skip noise like typo fixes)
2. Extract the rationale — the WHY, not just the what happened
3. Note any contradictions with previous decisions you remember
4. Remember the causal chain: what problem led to this decision

Store understanding, not transcripts. When asked later, cite specifics."""

The quality difference in answer depth between these two system prompts is significant enough that I'd call it the second most important design decision after session ID granularity.

One additional tip: use temperature 0.2–0.3 for ingest (you want consistent, structured extraction) and 0.6–0.7 for Q&A (you want thoughtful synthesis, not mechanical retrieval).

4. Cron Jobs Need Explicit Session Context in the Prompt

This one cost me two hours of debugging. I registered Hermes cron jobs on startup without explicitly anchoring them to a session context in the prompt. The jobs fired on schedule but gave generic, unhelpful responses because they weren't connecting to the right accumulated memory.

# This job fires but has no memory context
await hermes.create_job(
    name="daily-analysis",
    schedule="0 2 * * *",
    prompt="Identify recurring failure patterns from recent engineering decisions.",
    # Where? Whose memory? Hermes doesn't know.
)

# This job fires AND draws from the right accumulated context
await hermes.create_job(
    name="acme-backend-daily-analysis",
    schedule="0 2 * * *",
    prompt=(
        "You are the Shadow CTO for acme/backend. "         # identity
        "Using the engineering decisions stored in your memory "  # explicit memory reference
        "from this repository, identify recurring failure patterns: "
        "components that keep breaking, decisions that were reversed, "
        "or technical debt accumulating. Be specific — cite titles and dates."
    ),
)

The "You are the X for Y" clause in the prompt is what reconnects the cron job to the right session context. Without it, you're firing a prompt into a vacuum. With it, you're triggering an agent that knows who it is and what it's been watching.

5. Streaming Is Worth the Extra Code

The non-streaming endpoint is significantly simpler to implement. For any user-facing feature, use streaming anyway.

Hermes answers thoughtfully. On complex questions about accumulated history — "what are the three biggest risks before the next release?" — responses can take 10–20 seconds of generation time. Without streaming, users see a spinner and wonder if something broke. With streaming, they see the answer building in real time and the latency is invisible.

Backend (FastAPI SSE):

from fastapi.responses import StreamingResponse

@router.post("/query")
async def query(body: QueryRequest):
    async def generate():
        try:
            async for chunk in hermes.stream_chat(
                messages=build_messages(body.question),
                session_id=body.session_id,
            ):
                escaped = chunk.replace("\n", "\\n")
                yield f"data: {escaped}\n\n"
        except Exception as exc:
            yield f"data: [ERROR] {exc}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(generate(), media_type="text/event-stream",
                             headers={"Cache-Control": "no-cache",
                                      "X-Accel-Buffering": "no"})

Frontend (React):

const es = new EventSource("/api/query", { /* POST via fetch workaround */ });
es.onmessage = (e) => {
    if (e.data === "[DONE]") { es.close(); return; }
    setResponse(prev => prev + e.data.replace(/\\n/g, "\n"));
};

The UX difference between streaming and non-streaming is the gap between a prototype and something that feels production-ready. It's two hours of extra work and worth every minute.

The One-Line Summary

Hermes rewards investment in two things: session ID design and ingest prompt structure. Get those right on day one and the persistent memory largely takes care of itself — you build the domain logic, Hermes carries the institutional knowledge.

Everything else in this list is recoverable. Those two aren't.

推荐订阅源

DEV Community