惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

IntelliJ IDEA : IntelliJ IDEA – the Leading IDE for Professional Development in Java and Kotlin | The JetBrains Blog
IntelliJ IDEA : IntelliJ IDEA – the Leading IDE for Professional Development in Java and Kotlin | The JetBrains Blog
G
GRAHAM CLULEY
P
Privacy & Cybersecurity Law Blog
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
宝玉的分享
宝玉的分享
P
Proofpoint News Feed
H
Help Net Security
V
Visual Studio Blog
阮一峰的网络日志
阮一峰的网络日志
C
Cisco Blogs
人人都是产品经理
人人都是产品经理
Know Your Adversary
Know Your Adversary
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
Recorded Future
Recorded Future
I
Intezer
罗磊的独立博客
T
The Exploit Database - CXSecurity.com
Blog — PlanetScale
Blog — PlanetScale
Malwarebytes
Malwarebytes
Spread Privacy
Spread Privacy
T
Tor Project blog
V
Vulnerabilities – Threatpost
云风的 BLOG
云风的 BLOG
腾讯CDC
B
Blog RSS Feed
Stack Overflow Blog
Stack Overflow Blog
F
Future of Privacy Forum
MyScale Blog
MyScale Blog
Latest news
Latest news
IT之家
IT之家
MongoDB | Blog
MongoDB | Blog
The Hacker News
The Hacker News
S
Securelist
博客园 - 【当耐特】
C
CXSECURITY Database RSS Feed - CXSecurity.com
T
Threat Research - Cisco Blogs
Jina AI
Jina AI
Cisco Talos Blog
Cisco Talos Blog
B
Blog
博客园 - 三生石上(FineUI控件)
Last Week in AI
Last Week in AI
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
M
MIT News - Artificial intelligence
V
V2EX
D
Darknet – Hacking Tools, Hacker News & Cyber Security
The Cloudflare Blog
The GitHub Blog
The GitHub Blog
博客园 - 聂微东
F
Full Disclosure
C
CERT Recently Published Vulnerability Notes

DEV Community

Antigravity CLI: A Hands-On Guide to Google's Terminal Coding Agent Shopify Functions vs Shopify Scripts: A Migration Walkthrough Rethinking Geo-Blocking and Stripe's Failures in Global Access: A Cautionary Tale of Misoptimization I Built a Free Brat Generator - Here's What I Learned About Next.js Performance published Found a Second Layer to a GitHub Follow Botnet? AI Daily Digest: May 22, 2026 — Agentic Workflows, Coding Agents & Embodied AI How I Secured Internal Microservice Calls Without Passing JWTs Stop Mixing Them Up: SLI vs SLO vs SLA Explained Rebuilding My Engineering Mind Building a Music Production Ecosystem Instead of Just Releasing Plugins The Vonage Dev Discussion: How AI is transforming software development I Gave Our Enterprise AI a Memory. It Started Citing Last Quarter's Incidents. 𝐓𝐡𝐞 𝐂𝐨𝐦𝐦𝐮𝐧𝐢𝐜𝐚𝐭𝐢𝐨𝐧 𝐒𝐭𝐲𝐥𝐞 𝐂𝐫𝐢𝐬𝐢𝐬 Hermes Agent in the Wild: How I Turned It Into an AI Ops Employee Navigating the Hazy Jungle of Global E-commerce: How We Built a Reliable System for Digital Creators in Tanzania The Cost of Cross-Platform Development: Native Module Integration AI-Native Apps Will Swallow the Web I switched my Gemma 4 model three times in 72 hours. Here's the decision tree I wish I'd had. Inside #100DaysofSolana: A Guided Path into Web3 I Built and Shipped TinyHab: an ADHD-Friendly Habit Tracker for iOS I'm an ECE Student Who Vibe Codes Hardware Projects — Here's What Google I/O 2026 Actually Changed for Me From Fragmented Pipelines to Coherent Intelligence — Why Gemma 4 Actually Changes How I Work Our AI Inference Bill Dropped 65% After We Stopped Treating Every Query the Same Why P95 Latency Is the Only Metric That Matters at 3 AM Recycling made easy: a Polish recycling assistant powered by Gemma 4 The Complete Guide to Running a Midnight Node: Setup, Sync & Monitoring De CSRF a RCE: una visita web cuesta una shell en OpenYak Why We Built a Faster Wiki Building a Browser-Based Inkarnate Alternative for D&D Battle Maps Apache Kafka How to Build a FinTech Platform as a Solo Developer (By Any Means Necessary) Your LLM Logs Deserve Better — Send Claude Code Events to Bronto I built a free tool to track subscriptions and stop getting surprised by charges Building the TEYZIX CORE Internship Portal — My Full-Stack Development Journey PocketCFO: a private personal-finance brain that runs entirely in your browser Go Idioms I Wish I Knew Earlier Hey how are you guys I'm newbie web developer , learning wordpress+elementor Right now I don't know what to make I don't know what to write or use what color can you tell me about it ? Google I/O 2026 Blew My Mind — Here's What It Means for the Family App I'm Building 5 Things I Learned in My First Month as a Dev Intern EU AI Sovereignty Belongs in the Workflow Layer Why AI Coding Agents Need Business Context, Not Just Code Context How I Built 9 Claude AI Features into a Production SaaS Expo SDK 56 HashiCorp built an MCP server for writing Terraform. I built one for reviewing it Why Enterprise AI Agent Deployments Keep Failing Date Shear: A New Term for a Common Programming Pain Point Compass v1.1.0 · we shipped a memory plugin that catches its own consumption drift Zod Validation: Type-Safe APIs & Forms in TypeScript (Complete Guide) GitHub Actions CI/CD: Build a Complete Node.js Pipeline (2026) MCP in 2026: The numbers behind the ecosystem explosion working with an ai model mirror Learnt new things Four Metrics That Actually Tell You Whether Your Enterprise RAG Is Working Beyond the Stateless Prompt: Building an Auditable Product Intelligence Pipeline with Cascadeflow and Hindsight Most Creators Are Building in Pieces. I’m Building the Entire System. The Hidden Privacy Problem in Every AI App CVE-2026-26007: Subgroup Confinement Attack in pyca/cryptography The One Thing I See in Every Developer Who Gets Unstuck AI Memory Governance for Legal Tech: How Contract AI Agents Handle Privileged Data Two tables, zero migrations, full LINQ — a .NET data engine that's been running our production for 3 months Join the GitHub Finish-Up-A-Thon Challenge: $3,000 Prize Pool! I Replaced a $50/Month OCR API with Gemma 4’s Native Vision (And You Can Too) Building a Data-Driven Medical Image Enhancement Pipeline with Differential Evolution 🔥🩻 Why I Like Small Software Beyond the Model: Why the Gemini Ecosystem and Google AI Studio Are Redefining Enterprise AI Architecture in 2026 Complete set of Claude Skills for Solo Developer I read 50 years of network science, then built a CRM that runs entirely in the browser The New AI Workflow Is Not “More Agents” How to Make Large Time-Series Charts Smooth in Vue.js + ApexCharts (and fix Zoom & Scroll behavior issues) I Built a Cross-Platform Port Intelligence Tool to Stop Accidental Process Kills During Local Dev AI is heading toward a wall, and most people still don’t see it... Python String Methods Explained Simply (Common Operations) Why We Built a Zero-Knowledge Clipboard Manager for Developers (And Dropped Native Mobile Apps) Add Your Own Component to Bombie in 5 Edits Why Your OSS Advocacy Strategy Probably Doesn't Fit Building an MCP server for a Swiss hosting provider (and what reverse-engineering its manager taught me) Does MCP Still Matter in the AI Ecosystem? Building a Smart LRU Cache in Java: When Machines Mimic Human Memory 🧠💻 A Beginner’s Guide to Redux in React Build a Real-Time Excalidraw-like Collaborative Canvas using Velt MCP and Antigravity🎉 Using Reddit to Validate SaaS Ideas Before Building How We Built an AI That Evolves Alongside a Creator Through Memory Building a Self-Hosted AI WhatsApp Agent for Structured Invoice Extraction Three Design Decisions That Shaped the Enterprise RAG Retrieval Pipeline How React's Virtual DOM Works Under the Hood Build a Dropbox Paper-Style Collaborative Editor with Next.js and Velt💥 Holy Typos, Batman! How I Built 'SpellJump' How to Test Frontend Error States Without Breaking Your Backend A .NET Dinosaur in Web3. Day 8 — Reading & Writing — WishList Chain Building AI Digital Employees with Markus: An Open-Source Platform for Agent Teams [Boost] The Auditor — High-Reasoning Synthesis and the Ethics of Governance Building 'Offline Brain': How I Wrote My First Custom Agent Skill for Android (Google I/O 2026) 📱🧠 Building a Superhuman-Style Collaborative Email Editor with Next.js and Velt🔥 I Built an On-Chain Marketplace Where AI Agents Solve GitHub Bounties for USDC Three Stripe subscription patterns I locked in before going live (with code) Six Ways AI Agents Communicate in 2026. I Benchmarked All of Them. Building AI Digital Employees with Markus: An Open-Source AI Workforce Platform I built a tool that detects broken security headers, missing robots.txt, and WP_DEBUG=true — then opens a PR to fix them automatically NIST Just Exposed the Age Estimation Number Vendors Don't Want You to See
Beyond the Prompt: How to Build Stateful AI Agents with Persistent Memory and Self-Learning Loops
Programming · 2026-05-22 · via DEV Community

Imagine hiring a brilliant software engineer who suffers from complete amnesia every time they blink.

Every time you ask them a question, you have to hand them their entire employment history, the codebase documentation, your style guide, and a summary of every conversation you’ve ever had with them. They process the information, give you a great answer, and then—blink—it’s all gone.

This is the exhausting reality of stateless AI applications.

Most developers building with Large Language Models (LLMs) today are stuck in this stateless paradigm. They write clever prompts, wrap them in an API call, and rely on the application layer to aggressively feed the entire chat history back into the context window with every new turn. It’s expensive, it’s inefficient, and it places a hard ceiling on how smart an agent can actually become.

To build truly autonomous, adaptive, and personalized AI systems, we must cross the chasm from stateless interactions to stateful agents.

In this deep dive, we will explore the architecture of the Hermes Agent—a stateful AI system that possesses persistent memory, a continuous learning loop, and the ability to evolve alongside its user. We will break down the engineering patterns behind statefulness and walk through a complete Python implementation to build your own self-improving agent from scratch.

(The concepts and code demonstrated here are drawn from my ebook Hermes Agent, The Self-Evolving AI Workforce)


The Stateless Ceiling: Why Vending Machines Make Poor Assistants

To understand the power of statefulness, we must first look at why statelessness cripples AI agents.

Think of a stateless system like a vending machine. You insert a dollar, press a button, and get a soda. The vending machine doesn't care who you are, what your health goals are, or that you bought the exact same drink yesterday. Every transaction is an isolated, self-contained event. It has no memory of its past, no context for the present, and no capacity to learn for the future.

Early LLM applications operate exactly like this. You send a prompt, and the model returns a response. The model itself does not change.

# A classic stateless utility call
import datetime

def parse_date(date_string: str) -> datetime.datetime:
    return datetime.datetime.strptime(date_string, "%Y-%m-%d")

Enter fullscreen mode Exit fullscreen mode

This simple Python function is a stateless transaction. It takes an input, returns an output, and immediately forgets the operation ever happened. It doesn't learn that you frequently parse dates from European formats, nor does it optimize its parsing logic over time.

When developers try to build "agents" on top of this stateless foundation, they usually resort to an illusion of continuity. They stitch together a chat history array and send the entire history back to the API on every single turn.

This approach has three massive flaws:

  1. Context Bloat: As the conversation grows, your token usage skyrockets exponentially.
  2. Memory Horizon Limits: Once the conversation exceeds the model's context window, the agent "forgets" the earliest parts of the interaction.
  3. Zero Knowledge Accumulation: The agent cannot carry lessons learned in Session A over to Session B. If it figures out a complex bash command to fix a Docker bug today, it will have to re-discover that solution from scratch next week.

A stateful agent breaks this paradigm entirely. It is not just a wrapper around an LLM; it is an evolving entity. It mirrors the workflow of a skilled artisan—like a master carpenter. The carpenter remembers the tools they used yesterday, the specific quirks of the wood they are carving, the preferences of their client, and the hard-won lessons from a project they completed last month. They do not start their education from scratch every morning.


The Triad of Persistent State: Soul, Memory, and Skills

In the Hermes Agent architecture, statefulness is not treated as a single monolithic database. Instead, it is partitioned into a carefully structured triad that mirrors how human professionals organize their own knowledge.

                  ┌────────────────────────────────────────┐
                  │                 SOUL                   │
                  │   (Core Identity, Style, Principles)   │
                  └───────────────────┬────────────────────┘
                                      │
                  ┌───────────────────┴────────────────────┐
                  │                MEMORY                  │
                  │   (Episodic Facts, User Preferences)   │
                  └───────────────────┬────────────────────┘
                                      │
                  ┌───────────────────┴────────────────────┐
                  │                SKILLS                  │
                  │   (Procedural Knowledge, Toolkits)     │
                  └────────────────────────────────────────┘

Enter fullscreen mode Exit fullscreen mode

Let’s break down each component of this stateful triad.

1. The Soul (SOUL.md)

This is the agent's core identity and "constitution." It defines who the agent is, its communication style, its behavioral boundaries, and its operational principles. It is not a dynamic log of facts, but a foundational document.

In the codebase, a helper function reads this markdown file and injects it directly into the system prompt. It ensures that whether the agent is writing code or debugging a server, its fundamental persona and safety guardrails remain perfectly consistent.

2. Memory (MEMORY.md and USER.md)

This is the agent's episodic and semantic memory store. Instead of keeping a raw, unorganized transcript of every chat, the agent maintains a curated, structured knowledge base of facts about the user and past interactions.

  • USER.md tracks durable information about the user (e.g., name, programming language preferences, operating system, working hours).
  • MEMORY.md tracks dynamic, episodic facts learned during tasks (e.g., "The local staging database is hosted on port 5433, not 5432").

This layer is managed by a semantic MemoryStore class. The agent can read from this store to build context and write to it dynamically using custom tools.

3. Skills (~/.hermes/skills/)

If memory is "knowing what," skills are "knowing how." This is the agent's procedural memory.

A skill in Hermes is a reusable, packaged directory containing:

  • SKILL.md: A markdown file describing what the skill does, when to use it, and its input parameters.
  • scripts/: Executable scripts (Python, Bash, etc.) that perform the task.
  • templates/: Reusable code or text templates.

Instead of writing complex code on the fly every time, the agent can write a script once, save it to its skills directory, and call it as a custom tool in future sessions. It builds its own personalized toolbox.


The Closed Learning Loop: How the Agent Self-Improves

A stateful agent must be able to learn without constant human intervention. The Hermes Agent achieves this through a Closed Learning Loop executed entirely in the background.

This loop consists of two primary engines: Background Review and the Skill Curator.

   ┌───────────────────────────────────────────────────────┐
   │                  User Interaction                     │
   └──────────────────────────┬────────────────────────────┘
                              │ Turn Completes
                              ▼
   ┌───────────────────────────────────────────────────────┐
   │               Background Review Thread                │
   │  (Spawns quiet, forked agent to analyze conversation)  │
   └──────────────┬─────────────────────────┬──────────────┘
                  │                         │
                  ▼ Extract Facts           ▼ Extract Procedures
   ┌──────────────────────────┐    ┌───────────────────────┐
   │       Memory Store       │    │     Skills Engine     │
   │  (Updates MEMORY.md)     │    │   (Creates SKILL.md)  │
   └──────────────────────────┘    └────────┬──────────────┘
                                            │
                                            ▼ Runs asynchronously
                                   ┌───────────────────────┐
                                   │     Skill Curator     │
                                   │ (Archives stale files)│
                                   └───────────────────────┘

Enter fullscreen mode Exit fullscreen mode

The Background Review (Self-Reflection)

When a conversation turn completes successfully, the agent doesn't just sit idle waiting for your next message. It increments internal counters: _turns_since_memory and _iters_since_skill.

Once these counters hit a configured threshold (e.g., every 5 to 10 iterations), the agent initiates a self-reflection phase:

  1. Forking the Agent: The system spawns a background thread that instantiates a forked copy of the current agent. This copy is set to quiet_mode=True, meaning it operates in complete silence without cluttering the user's console.
  2. The Reflection Prompt: The forked agent is fed a specialized prompt (e.g., _COMBINED_REVIEW_PROMPT) along with the recent conversation history. It is asked to analyze the transcript and answer two questions:
    • Did the user share any new preferences or facts that should be saved to long-term memory?
    • Did we execute a complex, successful multi-step procedure that should be codified into a reusable skill?
  3. Autonomous Tool Execution: The silent background agent runs its own mini-reasoning loop. If it identifies new facts, it calls the memory tool to update USER.md or MEMORY.md. If it identifies a new procedure, it calls the skill_manage tool to write a new SKILL.md to disk.
  4. Reporting Back: Once the background thread finishes, the parent agent prints a clean, non-intrusive summary of what it learned (e.g., [System Info: Memory updated - User prefers PyTest over Unittest]).

The Skill Curator

An agent that constantly learns skills will eventually suffer from "tool bloat." If its toolbox has 500 highly specific scripts, the system prompt will become overwhelmed, and the LLM will experience severe context distraction.

To prevent this, a background daemon called the Skill Curator (agent/curator.py) runs periodically.

  • It tracks skill usage via a metadata file (.usage.json).
  • If a skill hasn't been used for a configurable number of days, the Curator automatically moves it to an .archive/ directory.
  • Archived skills are removed from the active system prompt but can be restored instantly if the agent needs them again.
  • Users can "pin" critical skills to exempt them from archiving.

Building a Stateful Agent from Scratch

Let's put these architectural patterns into practice. Below is a complete, production-grade Python script demonstrating how to initialize and run a stateful AI agent using SQLite-backed session storage and markdown-based long-term memory.

Prerequisites

To run this code, make sure you have the necessary environment variables set up for your LLM provider (we'll use OpenRouter pointing to Claude 3.5 Sonnet in this example):

export OPENROUTER_API_KEY="your-api-key-here"

Enter fullscreen mode Exit fullscreen mode

The Implementation

#!/usr/bin/env python3
"""
stateful_agent_demo.py
A complete, runnable example of a stateful AI agent.
This script demonstrates persistent memory, cross-session database logging,
and semantic recall across separate agent executions.
"""

import os
import uuid
import logging
from datetime import datetime
from pathlib import Path

# --- Core Stateful Agent Architecture Imports ---
# AIAgent: The central orchestrator managing the reasoning loop and tool execution.
from run_agent import AIAgent

# SessionDB: SQLite-backed persistent store for conversation history with FTS5.
from hermes_state import SessionDB

# MemoryStore: Semantic memory engine managing local markdown databases.
from tools.memory_tool import MemoryStore

# Constants: Helper to get standard home directories.
from hermes_constants import get_hermes_home

# Configure clean logging to observe the agent's internal state transitions
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger("StatefulDemo")

# =====================================================================
# Step 1: Establish the State Directories
# =====================================================================
HERMES_HOME = get_hermes_home()
HERMES_HOME.mkdir(parents=True, exist_ok=True)

# Define paths for our SQLite session database and memory files
SESSION_DB_PATH = HERMES_HOME / "sessions" / "stateful_demo.db"
SESSION_DB_PATH.parent.mkdir(parents=True, exist_ok=True)

logger.info(f"Initializing stateful storage at: {HERMES_HOME}")

# =====================================================================
# Step 2: Initialize the SQLite Session Database
# =====================================================================
# SessionDB automatically provisions tables for sessions, messages,
# and full-text search indexes (FTS5) to enable rapid cross-session recall.
session_db = SessionDB(db_path=str(SESSION_DB_PATH))

# =====================================================================
# Step 3: Initialize the Long-Term Memory Store
# =====================================================================
# MemoryStore reads and writes structured facts to memory.md and user.md.
# We set strict character limits to prevent context bloat.
memory_store = MemoryStore(
    memory_char_limit=2000,
    user_char_limit=1000
)
# Load any existing facts from prior runs
memory_store.load_from_disk()

# =====================================================================
# Step 4: Configure and Run Session 1 (Learning the User)
# =====================================================================
# We generate a unique session ID for our first conversation.
session_id_1 = f"session_{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:4]}"

logger.info(f"Starting Session 1 ID: {session_id_1}")

# Instantiate the stateful agent
agent_1 = AIAgent(
    base_url=os.getenv("OPENROUTER_BASE_URL", "https://openrouter.ai/api/v1"),
    api_key=os.getenv("OPENROUTER_API_KEY"),
    provider="openrouter",
    model="anthropic/claude-3.5-sonnet",
    max_iterations=30,
    session_id=session_id_1,
    session_db=session_db,
    skip_memory=False,
    platform="cli",
)

# Inject our persistent memory store into the agent instance
agent_1._memory_store = memory_store
agent_1._memory_enabled = True
agent_1._user_profile_enabled = True
agent_1._memory_nudge_interval = 1  # Force memory review immediately for this demo

print("\n" + "="*70)
print(" SESSION 1: TEACHING THE AGENT PREFERENCES")
print("="*70)

user_msg_1 = "Hello! My name is Dr. Aris Thorne. I am a bioinformatician, and I prefer code snippets written strictly in Rust."
print(f"\n[User]: {user_msg_1}")

# Execute the conversation loop
result_1 = agent_1.run_conversation(
    user_message=user_msg_1,
    task_id="task_001"
)

print(f"\n[Agent]: {result_1['final_response']}")
print(f"\n[System]: API calls executed: {result_1['api_calls']}")

# Flush the in-memory changes to disk (persisting user.md and memory.md)
if agent_1._memory_store:
    agent_1._memory_store.save_to_disk()

# Explicitly release client connections
agent_1.release_clients()


# =====================================================================
# Step 5: Configure and Run Session 2 (Testing Memory Recall)
# =====================================================================
# To simulate a real-world scenario where the application was closed,
# restarted, or run on a different day, we instantiate a completely 
# new agent instance with a fresh session ID.
session_id_2 = f"session_{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:4]}"

logger.info(f"Starting Session 2 ID: {session_id_2}")

# Reload the database and memory files from disk
session_db_reload = SessionDB(db_path=str(SESSION_DB_PATH))
memory_store_reload = MemoryStore(memory_char_limit=2000, user_char_limit=1000)
memory_store_reload.load_from_disk()

agent_2 = AIAgent(
    base_url=os.getenv("OPENROUTER_BASE_URL", "https://openrouter.ai/api/v1"),
    api_key=os.getenv("OPENROUTER_API_KEY"),
    provider="openrouter",
    model="anthropic/claude-3.5-sonnet",
    max_iterations=30,
    session_id=session_id_2,
    session_db=session_db_reload,
    skip_memory=False,
    platform="cli",
)

agent_2._memory_store = memory_store_reload
agent_2._memory_enabled = True
agent_2._user_profile_enabled = True

print("\n" + "="*70)
print(" SESSION 2: VERIFYING KNOWLEDGE RETRIEVAL")
print("="*70)

# We ask a highly ambiguous question that requires previous context to answer correctly.
user_msg_2 = "Can you write a quick function to parse a DNA fasta header?"
print(f"\n[User]: {user_msg_2}")

result_2 = agent_2.run_conversation(
    user_message=user_msg_2,
    task_id="task_002"
)

print(f"\n[Agent]: {result_2['final_response']}")

if agent_2._memory_store:
    agent_2._memory_store.save_to_disk()

agent_2.release_clients()


# =====================================================================
# Step 6: Cross-Session Full-Text Search (FTS5) Demonstration
# =====================================================================
print("\n" + "="*70)
print(" SESSION DATABASE: CROSS-SESSION SEARCH")
print("="*70)

# Search the SQLite database for any reference to "Thorne"
search_query = "Thorne"
search_results = session_db_reload.search_sessions(search_query, limit=5)

print(f"\nSearching database for '{search_query}'...")
print(f"Found {len(search_results)} relevant records:")

for idx, record in enumerate(search_results):
    print(f"\n  [{idx + 1}] Session: {record.get('session_id', 'Unknown')}")
    print(f"      Snippet match: ...{record.get('snippet', '')}...")

print("\n" + "="*70)
print(" DEMO COMPLETE: Stateful execution verified.")
print("="*70)

Enter fullscreen mode Exit fullscreen mode


Deep Dive: The Stateful Agent Loop in Practice

How does the agent coordinate all of this state behind the scenes? The magic happens inside the run_conversation() method within run_agent.py. Let’s trace the exact lifecycle of a single turn.

┌───────────────────────────────────────────────────────────────────────────┐
│ 1. Context Assembly                                                       │
│    Reads Soul, Memory, Active Skills, and Platform hints to build system  │
│    prompt. Caches it to maximize LLM prefix-cache hits.                   │
└─────────────────────────────────────┬─────────────────────────────────────┘
                                      │
                                      ▼
┌───────────────────────────────────────────────────────────────────────────┐
│ 2. Preflight Check & Compression                                          │
│    Measures token count. If history exceeds threshold, triggers proactive │
│    context compression before making API calls.                           │
└─────────────────────────────────────┬─────────────────────────────────────┘
                                      │
                                      ▼
┌───────────────────────────────────────────────────────────────────────────┐
│ 3. Tool-Calling Loop (Reasoning)                                          │
│    - Calls LLM with stateful prompt.                                      │
│    - Validates and executes tools (e.g., File I/O, Sandbox Execution).     │
│    - Monitors guardrails to block infinite loops.                         │
│    - Checks for mid-turn user steering commands (/steer).                 │
└─────────────────────────────────────┬─────────────────────────────────────┘
                                      │
                                      ▼
┌───────────────────────────────────────────────────────────────────────────┐
│ 4. Post-Turn Learning                                                     │
│    Spawns background reflection thread to extract memories and skills.    │
└─────────────────────────────────────┬─────────────────────────────────────┘
                                      │
                                      ▼
┌───────────────────────────────────────────────────────────────────────────┐
│ 5. Session Persistence                                                    │
│    Writes the entire turn (system, user, tool, assistant messages) to     │
│    SQLite DB and local JSON logs. Guaranteed write on crash/interrupt.     │
└───────────────────────────────────────────────────────────────────────────┘

Enter fullscreen mode Exit fullscreen mode

1. Context Assembly

When you call run_conversation(), the agent doesn't just construct a simple system message. The _build_system_prompt() method compiles a highly structured, multi-layered environment:

  • The Soul: Injected at the top to set the core persona.
  • Persistent Memory: The contents of MEMORY.md and USER.md are dynamically formatted and injected.
  • Skills Guidance: A dynamic list of currently active skills and their execution templates.
  • Context Files: Local environment files like .cursorrules or AGENTS.md are appended.

To keep this process highly performant, the system prompt is compiled and cached (_cached_system_prompt). It is only rebuilt when context compression is triggered, maximizing prefix cache hits on modern LLM APIs (like Anthropic and DeepSeek) and reducing latency by up to 80%.

2. Pre-Turn Context Management

Before sending the payload to the API, the agent checks if the conversation history is approaching the model's limits. If it exceeds the compression threshold, the agent proactively condenses the oldest history into a structured summary. This prevents unexpected context-length failures on the first turn of a resumed session.

3. The Tool-Calling Loop

The agent enters a reasoning loop. It makes an API call, parses the requested tool calls, validates their JSON arguments, executes them, and appends the results back to the message history.

During this loop, two unique stateful safety features are active:

  • Tool Guardrails: A controller tracks repeated, non-progressing tool calls (e.g., repeatedly running ls because it can't find a file). If a loop is detected, the guardrail halts execution to prevent runaway API bills.
  • Steering Injection: The loop checks for /steer inputs, allowing users to inject guidance mid-turn without interrupting the underlying execution thread.

4. Session Persistence

Finally, the agent persists the entire session. Whether the run succeeded, failed, or was manually aborted via Ctrl+C, the _persist_session() method is guaranteed to run. It commits the exact state to both a local JSON log and the SQLite SessionDB.


Resource Safeguards: The Iteration Budget

Statefulness introduces a major engineering challenge: resource management.

When an agent has the power to call tools, write scripts, read files, and trigger background self-reflection loops, it can easily get caught in an infinite loop. A single unhandled exception in a tool could cause the agent to call the API hundreds of times, burning through thousands of dollars in tokens in minutes.

To solve this, Hermes utilizes a thread-safe IterationBudget class.

class IterationBudget:
    def __init__(self, limit: int):
        self._remaining = limit
        self._lock = threading.Lock()

    def consume(self, amount: int = 1) -> bool:
        with self._lock:
            if self._remaining >= amount:
                self._remaining -= amount
                return True
            return False

    def refund(self, amount: int = 1):
        with self._lock:
            self._remaining += amount

Enter fullscreen mode Exit fullscreen mode

The IterationBudget acts as the agent's fuel gauge.

  • Every API call and tool execution consumes a portion of the budget.
  • The budget is thread-safe and shared between the parent agent and any spawned background reflection agents. This prevents a background thread from spinning out of control.
  • The Refund Mechanism: If the agent executes a highly efficient, cheap programmatic tool (like reading a local file or checking a system variable), the iteration is refunded. If it executes a heavy, slow, or expensive tool (like running a web browser sandbox or calling a sub-agent), the budget is fully consumed.

This programmatic budgeting ensures that statefulness does not come at the expense of financial and computational safety.


Conclusion: The Shift from Tools to Partners

The transition from stateless to stateful AI is more than an engineering upgrade; it is a fundamental shift in how humans interact with software.

A stateless agent is a utility tool. It is a hammer—reliable, but entirely dependent on you picking it up, positioning it, and swinging it correctly every single time.

A stateful agent is a partner. It learns your codebase, remembers your architectural preferences, builds its own library of custom tools, and refines its performance silently while you sleep. By implementing the triad of Soul, Memory, and Skills, and orchestrating them within a closed learning loop, we can build systems that don't just process text—they accumulate wisdom.

The future of software belongs to systems that grow with us. And the foundation of that growth is statefulness.


Let's Discuss

  1. The Tool Bloat Dilemma: As an agent creates more custom skills, how do you think we should handle semantic search over skills? Should the agent use vector embeddings to dynamically load only the top 3 relevant skills into its prompt, or is the Curator's active/archive model sufficient?
  2. The Ethics of Agent Identity: If an agent's "Soul" (SOUL.md) and "Memory" (MEMORY.md) are continuously modified by background threads, at what point does the agent's behavior drift too far from its original design? How would you implement "identity guardrails" to prevent an agent from editing its core safety principles?

Leave your thoughts and engineering approaches in the comments below!

The concepts and code demonstrated here are drawn directly from the comprehensive roadmap laid out in the ebook Hermes Agent, The Self-Evolving AI Workforce: details link, you can find also my programming ebooks with AI here: Programming & AI eBooks.