惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

N
News and Events Feed by Topic
Malwarebytes
Malwarebytes
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
C
Cybersecurity and Infrastructure Security Agency CISA
F
Future of Privacy Forum
C
Cisco Blogs
T
The Exploit Database - CXSecurity.com
A
Arctic Wolf
S
Securelist
K
Kaspersky official blog
S
Schneier on Security
T
ThreatConnect
T
Tenable Blog
Spread Privacy
Spread Privacy
T
True Tiger Recordings
AWS News Blog
AWS News Blog
F
Fox-IT International blog
量子位
T
Threatpost
V
Vulnerabilities – Threatpost
C
CERT Recently Published Vulnerability Notes
Cisco Talos Blog
Cisco Talos Blog
GbyAI
GbyAI
宝玉的分享
宝玉的分享
腾讯CDC
G
Google Developers Blog
aimingoo的专栏
aimingoo的专栏
Cyberwarzone
Cyberwarzone
有赞技术团队
有赞技术团队
S
SegmentFault 最新的问题
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
V
Visual Studio Blog
U
Unit 42
雷峰网
雷峰网
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
Simon Willison's Weblog
Simon Willison's Weblog
O
OpenAI News
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
The GitHub Blog
The GitHub Blog
The Register - Security
The Register - Security
MyScale Blog
MyScale Blog
小众软件
小众软件
A
About on SuperTechFans
Last Week in AI
Last Week in AI
Y
Y Combinator Blog
博客园 - 三生石上(FineUI控件)
美团技术团队
Google Online Security Blog
Google Online Security Blog
P
Proofpoint News Feed
MongoDB | Blog
MongoDB | Blog

DEV Community

I built an AI PR-triage agent in 30 lines of Markdown Core Web Vitals from 74 to 91: A Real Tax Practitioner Site Rebuild I Gave Gemma 4 150 Tools on Windows. Here's What Actually Happened. The Hidden Tax of AI-Assisted Development (And How I Fixed It) I Ditched Cloud LLMs for Gemma 4 4B: A DevOps Engineer's 48-Hour Reality Check Building a Schema.org @graph That Validates on the First Try The "Lift and Shift" Trap: Why Your Integration Layer Needs More Than Just a Cloud Address All 7 OSI Layers Explained with Real-World Analogies Antigravity 2.0 in one day: the four shells and what each is good for Self-Hosting Google Fonts with size-adjust: Zero CLS Web Font Swap The Multi-Provider LLM Problem: Why “One API” Is Not Enough How I indexed 69,000 Claude Code skills (and what I learned doing it) RememberMe CareGrid: Local Gemma 4 for dementia memory and safety Google Is Killing Gemini CLI on June 18. Here Is What to Do Before Then Do Domínio ao Deploy: Hospedando Arquivos de Deep Links no Cloudflare Pages (Parte 7.1) Running Gemma 4 26B on an Old GTX 1080 with llama.cpp Devlog 1: I tried building an SNES game with the super FX chip Why Gemma 4 Feels Like an Important Moment for AI Developers✨ From Zero and Confused, This Is How I Started Learning to Code I Built a Local AI Gateway That Talks to Claude, ChatGPT, DeepSeek and Gemini — Without a Single API Key Bootstrapping with AI: Why Gemma 4 is the Micro-SaaS Founder’s Best Friend MyErp Architecture Series - #02 Cellular Architecture: Mapping Biology to Software Systems NodeJS vs Bun vs Go 🌍 RTL Arabic Style UI How Does an AI Agent Actually Buy Something? Google Just Published the Spec. Google I/O 2026 Is One Uncanny F.R.I.E.N.D.S Group Upgrade I Replaced 70MB Node.js Log Viewer with a 172KB Zig Binary The "MTTR Is All You Need" Trap The Quiet Revolution: How Firebase Became the First Agent-Native Backend at Google I/O 2026 I Built ResuMate! A 100% Private, Local AI Resume Optimizer with Google Gemma 4 Learning DirectX 12 - Part 2 Initialization Theory NeuralHats: I Put Edward de Bono’s Six Thinking Hats on Local LLMs Using Gemma 4 📝 Instant Auto Save Notes Engineering the "App-Like" Experience: A Deep Dive into PWA Architecture I built a local first AI CCTV assistant using Gemma 4 + Frigate CrowdShield AI — Smart Stadium Operating System & Crowd Intelligence Platform I built a free AI observability tool, prove your AI is useful, not just running Beyond Autocomplete: Why Google Antigravity 2.0 Changes the Rules for Indie Builders 터미널 AI 에이전트 구축 (v12) Building Instagram-Powered Apps with HikerAPI (Without Fighting Scrapers) Checkpoints, Not Transcripts: Rethinking AI Coding Agent Memory From Side Project to Student Savior: My AI PPT & Resume Tool Crossed 1.5K+ Users Why Story Points Don’t Work in the AI Era, And What Should Take Their Place Instead. Self-Hosted Document AI: How to Run Document Intelligence On Your Own Infrastructure (2026) How to Extract Tables from PDFs with AI: 4 Methods That Actually Work (2026) IDP vs OCR: What's the Difference — and Which Does Your Business Actually Need? Automated PII Detection and Redaction in Business Documents: A Practical Guide Human-in-the-Loop Document Review: When to Use It and How to Set It Up (2026) Document Processing Without RPA: A Modern Approach for Small Teams Reducto Alternative: When You Need More Than a Document Parser (2026) Hermes Agent vs LangChain vs CrewAI: When to Reach for Each SparshAI: I Built an Offline AI Tutor for Students Using Gemma 4 — Here's What Happened Building NeuroSense AI: A Human-Centered Stress Insight Assistant Powered by Gemma Why I Built a Privacy-First Dev Toolkit GAS Input Tags: Ability Activation Without Hardcoded Bindings AI Legal Document Advisor Supported By Gemm 4 Model Building Convertify in Public Week 10: PDF Cluster + Blog Launch CureNet AI: Decentralized Health Intelligence for India, Powered by Gemma 4 and ABHA Standardization When Open-Weights AI Meets a Broken Healthcare System: Deploying Gemma 4 in Rural India V.A.L.I.D. Google I/O 2026: The Year Google Stopped Building AI Assistants and Started Shipping AI Engineers Bondmap: AI-Powered Relationship Network That Maps How You're Connected to Everyone Using Gemma 4 Gemma 4 challenge inspired me to build my first app! 96. LoRA: Fine-Tune a Billion-Parameter Model on a Laptop From a Student Who Used CircuitVerse to a GSoC Contributor — My Community Bonding Story How Bf-Tree Keeps Mini-Pages Small, Hot, and Cheap to Evict I asked Claude to explain the chip war and ended up understanding modern geopolitics differently Stop Manually Checking for Server Updates: Automate With Email Notifications Nostalgia Meets Cybersecurity: Spotting Modern Scams in a Retro OS Simulator - Forward or Fraud CRACKING CODING INTERVIEW From Python to Production Pipeline :A Practical guide to Apache Airflow Antigravity 2.0: Google Just Changed What It Means to Be an Engineer I Built a Free Sticker Maker Because Every Other One Hid the Export How I bypassed Blazor WebAssembly's Virtual DOM using raw WASM pointers Distributed Tracing for LLM Agents: When MCP Makes Tool Calls Observable The Zero-Budget Memory Setup Behind My AI Agent Workflow No database. No framework. Just files, startup order, correction logs, and discipline. I Built an AI Second Brain with Gemma 4 The Most Exciting Google I/O 2026 Announcement for Me: HTML-in-Canvas CrisisLens: Compressing Disaster Scenes into 200-Byte Emergency Payloads with Gemma 4 I'm 15 and I built a todo app with Telegram Stars payments — only legal way for me to monetize before turning 18 Crypto Branding After the Token Launch Building an on-chain alerts bot in Python without any blockchain library FinePrint — An AI Pocket Lawyer That Decodes Predatory Contracts Using Gemma 4 How to Connect OpenAI with Supabase in 10 Minutes for a Lightning-Fast AI MVP One AI Gateway for AWS Bedrock, Google Vertex AI, Gemini, and Anthropic Reading Log #9 — Aoashi The Tacit Dimension Thinking, Fast and Slow Web3 Onboarding Is Not a Wallet Problem. It Is a Trust Problem. FHE Prompt Privacy: The Metadata Leak Your Demo Still Has Software Might Be Becoming Agent-Aware: What if software starts coordinating itself? The Silent Killers of Go Concurrency: Mutexes, Semaphores, and Goroutine Leaks Lynx framework first look Building Aries AI: A Solo-Built AI Abacus Tutor on OpenAI + Supabase + Render + Razorpay I built a paid Telegram bot. Here's what Telegram Stars actually pay. Transfer Fees, Metadata, and Soulbound Tokens: A Tour of Solana Token Extensions Improving AI resume matching with prompt iteration — 7.37 to 8.37/10 7 things you can do with Rogue Studio that no other AI IDE will let you do Why I Think WordPress Still Matters Reading Log #7 — Aoashi Guns, Germs, and Steel Distinction Open Models and the Sub-Saharan Region What 12 Months of AI-Generated Pull Requests Taught My Engineering Team
Beyond the Loop: Why Monolithic AI Agents Fail and How to Build a Microkernel Architecture
Programming · 2026-05-25 · via DEV Community

If you have built an AI agent recently, chances are your codebase started with a simple, elegant loop. You sent a prompt to an LLM, parsed its tool calls, executed those tools, appended the results to a list of messages, and looped back. It felt magical.

But then reality set in.

You wanted to add a vector database for long-term memory. Then you added a context compression engine to keep API costs down. Next came a dynamic skills system, a background review step, and custom toolkits for specific user tasks.

Suddenly, your elegant loop became a terrifying, deeply nested state machine. A bug in your memory retrieval logic started crashing the entire agent. Your agent initialization function grew to hundreds of lines of fragile setup code. A single change in how you parsed tool arguments broke unrelated downstream features.

You didn't build an intelligent system; you built a monolithic house of cards.

This is the exact breaking point where software systems have faltered for decades. Fortunately, computer science already solved this problem fifty years ago. The answer lies in the transition from monolithic operating systems to microkernel architectures.

In this deep dive, we will explore how Hermes Agent v0.13 shifts from a monolithic agent loop to a modular, microkernel-inspired architecture. We will examine the design patterns, interface contracts, and concrete Python implementations that allow you to build an AI agent that is robust, testable, and infinitely extensible.

(The concepts and code demonstrated here are drawn from my ebook Hermes Agent, The Self-Evolving AI Workforce)


The Microkernel Analogy: From Monolithic Agents to Modular Architectures

Early operating systems were monolithic. Every device driver, file system, and network stack lived in the same address space, sharing the same memory and the same failure domain. If a printer driver had a bug, it could overwrite kernel memory and crash the entire machine. If a security vulnerability existed in a file system parser, the entire operating system was compromised.

The solution was the microkernel.

A microkernel strips the core operating system down to its absolute essentials: inter-process communication (IPC), basic memory management, and scheduling. Everything else—device drivers, file systems, network stacks—is moved out of the kernel into isolated, user-space processes. These processes communicate with the core and each other through narrow, well-defined interfaces.

Monolithic Agent Architecture:
+-------------------------------------------------------------+
|                         Agent Core                          |
|  (Loop + Memory + Tools + Context Compression + Skills)     |
+-------------------------------------------------------------+
  * High cyclomatic complexity
  * Single failure domain: One crash kills the agent
  * Multiplicative state space

Microkernel Agent Architecture:
          +-----------------------------------------+
          |               Agent Core                |
          |  (Loop, Tool Dispatch, State, Budget)   |
          +--------------------+--------------------+
                               | (Interface Contract)
          +--------------------+--------------------+
          |                                         |
+---------+---------+                     +---------+---------+
|  Memory Plugin    |                     | Context Plugin    |
|  (Vector DB, etc) |                     | (Compression, etc)|
+-------------------+                     +-------------------+
  * Isolated lifecycles & failure domains
  * Additive complexity

Enter fullscreen mode Exit fullscreen mode

An AI agent's core loop is its kernel. The v0.13 architecture of Hermes Agent treats the agentic core as a minimal substrate. The core handles only the conversation loop, tool dispatch, message persistence, and iteration budgets. Every other capability—from long-term memory to context compression—is treated as an external, isolated plugin.

This is not just a code organization strategy. It is a fundamental shift in how we manage system complexity.

In a monolithic agent, every new feature adds branches to a single state machine. The system's complexity grows multiplicatively. In a modular agent, the system is a federation of independent state machines. The core remains small and testable, while each plugin manages its own state space. The complexity of the system grows additively.


The Plugin Lifecycle: Birth, Initialization, Operation, and Shutdown

In a poorly designed agent, components initialize haphazardly during agent construction. The constructor becomes a dumping ground for database clients, API keys, and file paths. If a database connection fails on startup, the entire agent fails to instantiate.

To solve this, Hermes Agent enforces a formal, four-phase plugin lifecycle:

  1. Registration: The plugin is registered with the agent's manager. The plugin declares its capabilities, tool schemas, and dependencies. The core validates that no naming conflicts exist.
  2. Initialization: The plugin receives its configuration and establishes its external resources (e.g., database connections, network clients). If a plugin fails to initialize, the core catches the error, marks the plugin as unavailable, and routes around it.
  3. Operation: The plugin participates in the conversation loop, provides system prompt blocks, and handles routed tool calls.
  4. Shutdown: The plugin gracefully releases its resources (draining network queues, closing file handles, flushing caches). This phase is guaranteed to run even if the agent crashes.

Let’s look at how this is enforced in the Hermes Agent codebase. Below is the registration logic from agent/memory_manager.py:

# agent/memory_manager.py — Plugin registration via interface contract
class MemoryManager:
    """Orchestrates the built-in provider plus at most one external provider.

    The builtin provider is always first. Only one non-builtin (external)
    provider is allowed. Failures in one provider never block the other.
    """

    def __init__(self) -> None:
        self._providers: List[MemoryProvider] = []
        self._tool_to_provider: Dict[str, MemoryProvider] = {}
        self._has_external: bool = False  # True once a non-builtin provider is added

    def add_provider(self, provider: MemoryProvider) -> None:
        """Register a memory provider.

        Built-in provider (name "builtin") is always accepted.
        Only **one** external (non-builtin) provider is allowed — a second
        attempt is rejected with a warning.
        """
        is_builtin = provider.name == "builtin"

        if not is_builtin:
            if self._has_external:
                existing = next(
                    (p.name for p in self._providers if p.name != "builtin"), "unknown"
                )
                logger.warning(
                    "Rejected memory provider '%s' — external provider '%s' is "
                    "already registered. Only one external memory provider is "
                    "allowed at a time. Configure which one via memory.provider "
                    "in config.yaml.",
                    provider.name, existing,
                )
                return
            self._has_external = True

        self._providers.append(provider)

        # Index tool names → provider for routing
        for schema in provider.get_tool_schemas():
            tool_name = schema.get("name", "")
            if tool_name and tool_name not in self._tool_to_provider:
                self._tool_to_provider[tool_name] = provider
            elif tool_name in self._tool_to_provider:
                logger.warning(
                    "Memory tool name conflict: '%s' already registered by %s, "
                    "ignoring from %s",
                    tool_name,
                    self._tool_to_provider[tool_name].name,
                    provider.name,
                )

Enter fullscreen mode Exit fullscreen mode

Architectural Takeaways from Registration:

  • Strict Constraints: The system deliberately limits external memory providers to one. This constraint prevents tool schema bloat and conflicts at the architectural level, rather than relying on runtime heuristics.
  • Automatic Routing Maps: By iterating over provider.get_tool_schemas(), the core automatically builds a routing table (self._tool_to_provider). The core doesn't need to know what tools a memory provider has; it simply maps whatever the provider declares.

Now let’s look at the Initialization phase, which leverages Dependency Injection to keep plugins decoupled from global configuration singletons:

# agent/memory_manager.py — Isolated initialization with dependency injection
def initialize_all(self, session_id: str, **kwargs) -> None:
    """Initialize all providers.

    Automatically injects hermes_home into *kwargs* so that every
    provider can resolve profile-scoped storage paths without importing
    get_hermes_home() themselves.
    """
    if "hermes_home" not in kwargs:
        from hermes_constants import get_hermes_home
        kwargs["hermes_home"] = str(get_hermes_home())
    for provider in self._providers:
        try:
            provider.initialize(session_id=session_id, **kwargs)
        except Exception as e:
            logger.warning(
                "Memory provider '%s' initialize failed: %s",
                provider.name, e,
            )

Enter fullscreen mode Exit fullscreen mode

By wrapping each plugin's initialization in an isolated try/except block, the core guarantees that a failure in a single plugin (e.g., a localized database connection timeout) does not prevent other plugins from starting up. The agent can degrade gracefully, running with reduced capabilities rather than crashing completely.


Interface Contracts: The Contract That Binds Core to Plugin

In a microkernel operating system, processes communicate via message passing over a strict Inter-Process Communication (IPC) protocol. The kernel does not care how a user-space file system is implemented internally—it only cares that the file system responds correctly to standard read/write system calls.

In Hermes Agent, this boundary is enforced using Python Abstract Base Classes (ABCs). The core never interacts with concrete plugin classes; it interacts exclusively with interface contracts.

Here is the contract for a MemoryProvider:

# agent/memory_provider.py — The interface contract
class MemoryProvider(ABC):
    """Abstract base for all memory providers.

    Every memory provider must implement these methods. The core
    never reaches into provider internals—it only calls these methods.
    """

    @property
    @abstractmethod
    def name(self) -> str:
        """Unique provider name."""
        ...

    @abstractmethod
    def get_tool_schemas(self) -> List[Dict[str, Any]]:
        """Return tool schemas this provider contributes."""
        ...

    @abstractmethod
    def system_prompt_block(self) -> str:
        """Return system prompt content for this provider."""
        ...

    @abstractmethod
    def prefetch(self, query: str, *, session_id: str = "") -> str:
        """Return relevant context for the given query."""
        ...

    @abstractmethod
    def handle_tool_call(self, tool_name: str, args: Dict[str, Any]) -> str:
        """Handle a tool call routed to this provider."""
        ...

    @abstractmethod
    def initialize(self, session_id: str, **kwargs) -> None:
        """Initialize provider resources."""
        ...

    @abstractmethod
    def shutdown(self) -> None:
        """Release provider resources."""
        ...

Enter fullscreen mode Exit fullscreen mode

This interface is the absolute boundary of the system. The core knows nothing about whether a plugin uses PostgreSQL, SQLite, Pinecone, or a local JSON file. It only cares that prefetch returns a string, and handle_tool_call returns a JSON-serializable string.

This abstraction makes routing tool calls incredibly clean and robust:

# agent/memory_manager.py — Contract-based tool routing
def handle_tool_call(
    self, tool_name: str, args: Dict[str, Any], **kwargs
) -> str:
    """Route a tool call to the correct provider.

    Returns JSON string result. Raises ValueError if no provider
    handles the tool.
    """
    provider = self._tool_to_provider.get(tool_name)
    if provider is None:
        return tool_error(f"No memory provider handles tool '{tool_name}'")
    try:
        return provider.handle_tool_call(tool_name, args, **kwargs)
    except Exception as e:
        logger.error(
            "Memory provider '%s' handle_tool_call(%s) failed: %s",
            provider.name, tool_name, e,
        )
        return tool_error(f"Memory tool '{tool_name}' failed: {e}")

Enter fullscreen mode Exit fullscreen mode

Because the core trusts the interface contract, the entire dispatch system is reduced to a simple lookup and execution. No custom parser logic, no hardcoded conditions, and no special-casing for individual memory backends.


Core Isolation: The Agentic Core as a Minimal Substrate

With the plugins safely isolated behind interface contracts, let's examine the "kernel" of Hermes Agent: the core conversation loop.

In run_agent.py, the core loop is kept intentionally small. Its only job is to manage the loop lifecycle, monitor the token/iteration budget, call the LLM transport layer, and dispatch tool calls.

# run_agent.py (simplified) — The core conversation loop
while (api_call_count < self.max_iterations and self.iteration_budget.remaining > 0) or self._budget_grace_call:
    api_call_count += 1
    self._touch_activity(f"starting API call #{api_call_count}")

    # Build API kwargs through the transport layer
    api_kwargs = self._build_api_kwargs(api_messages)

    # Make the API call (streaming or non-streaming)
    response = self._interruptible_streaming_api_call(api_kwargs)

    # Normalize the response across different LLM providers (OpenAI, Anthropic, etc.)
    normalized = self._get_transport().normalize_response(response)
    assistant_message = normalized

    # Check for tool calls
    if assistant_message.tool_calls:
        # Execute tools and append results
        self._execute_tool_calls(assistant_message, messages, effective_task_id)
        continue

    # No tool calls — this is the final response
    final_response = assistant_message.content or ""
    break

Enter fullscreen mode Exit fullscreen mode

Notice what is not in this loop.

  • There is no database code.
  • There is no vector search code.
  • There is no prompt construction logic.
  • There is no context compression.

The core loop is purely a state coordinator. It manages the flow of data but does not generate or manipulate it directly.

For instance, memory prefetching—which pulls relevant past context based on the user's query—happens outside the core loop, before it even starts:

# run_agent.py — Memory prefetch happens outside the core loop
if self._memory_manager:
    try:
        _query = original_user_message if isinstance(original_user_message, str) else ""
        _ext_prefetch_cache = self._memory_manager.prefetch_all(_query) or ""
    except Exception:
        pass

Enter fullscreen mode Exit fullscreen mode

Once the context is prefetched, it is injected into the user message block as a structured, fenced markdown block:

# run_agent.py — Prefetched context injected at message construction time
if idx == current_turn_user_idx and msg.get("role") == "user":
    _injections = []
    if _ext_prefetch_cache:
        _fenced = build_memory_context_block(_ext_prefetch_cache)
        if _fenced:
            _injections.append(_fenced)
    if _plugin_user_context:
        _injections.append(_plugin_user_context)
    if _injections:
        _base = api_msg.get("content", "")
        if isinstance(_base, str):
            api_msg["content"] = _base + "\n\n" + "\n\n".join(_injections)

Enter fullscreen mode Exit fullscreen mode

The core loop remains completely oblivious to where this context came from. It simply sees a standard user message with some appended text, executes its turn, and returns. This clean separation of concerns means you can swap out your memory provider, upgrade your embedding model, or completely rewrite your storage schema without ever risking a bug in your core conversation logic.


The Context Engine Plugin: A Case Study in Modular Design

One of the most complex tasks an AI agent faces is context window management. When a conversation gets too long, the agent must compress past messages, summarize old turns, or prune systemic data to avoid exceeding the model's context limit.

In a monolithic architecture, context compression is deeply coupled with the core loop. The agent must constantly check its token count, run summarization prompts mid-loop, and manually edit its own message history.

In Hermes Agent v0.13, the context engine is treated as a first-class plugin. The agent loads it dynamically at startup:

# run_agent.py — Context engine dynamically loaded as a plugin
if _engine_name != "compressor":
    # Try loading from plugins/context_engine/<name>/
    try:
        from plugins.context_engine import load_context_engine
        _selected_engine = load_context_engine(_engine_name)
    except Exception as _ce_load_err:
        logger.debug("Context engine load from plugins/context_engine/: %s", _ce_load_err)

    # Try general plugin system as fallback
    if _selected_engine is None:
        try:
            from hermes_cli.plugins import get_plugin_context_engine
            _candidate = get_plugin_context_engine()
            if _candidate and _candidate.name == _engine_name:
                _selected_engine = _candidate
        except Exception:
            pass

    if _selected_engine is None:
        logger.warning(
            "Context engine '%s' not found — falling back to built-in compressor",
            _engine_name,
        )

Enter fullscreen mode Exit fullscreen mode

Once loaded, the context engine can inject its own tools (like lcm_grep, lcm_describe, or lcm_expand for exploring compressed history) directly into the agent's available toolset:

# run_agent.py — Context engine tools injected into the tool surface
if hasattr(self, "context_compressor") and self.context_compressor and self.tools is not None:
    _existing_tool_names = {
        t.get("function", {}).get("name")
        for t in self.tools
        if isinstance(t, dict)
    }
    for _schema in self.context_compressor.get_tool_schemas():
        _tname = _schema.get("name", "")
        if _tname and _tname in _existing_tool_names:
            continue  # already registered via plugin/cache path
        _wrapped = {"type": "function", "function": _schema}
        self.tools.append(_wrapped)
        if _tname:
            self.valid_tool_names.add(_tname)
            self._context_engine_tool_names.add(_tname)
            _existing_tool_names.add(_tname)

Enter fullscreen mode Exit fullscreen mode

And when the LLM decides to call one of these tools, the core loop doesn't need any special-case handlers. It routes the execution through the exact same standard interface path used by all other tools:

# run_agent.py — Context engine tools dispatched through the normal path
elif self._context_engine_tool_names and function_name in self._context_engine_tool_names:
    # Context engine tools (lcm_grep, lcm_describe, lcm_expand, etc.)
    spinner = None
    if self._should_emit_quiet_tool_messages():
        face = random.choice(KawaiiSpinner.get_waiting_faces())
        emoji = _get_tool_emoji(function_name)
        preview = _build_tool_preview(function_name, function_args) or function_name
        spinner = KawaiiSpinner(f"{face} {emoji} {preview}", spinner_type='dots', print_fn=self._print_fn)
        spinner.start()
    _ce_result = None
    try:
        function_result = self.context_compressor.handle_tool_call(function_name, function_args, messages=messages)
        _ce_result = function_result
    except Exception as tool_error:
        function_result = json.dumps({"error": f"Context engine tool '{function_name}' failed: {tool_error}"})
        logger.error("context_engine.handle_tool_call raised for %s: %s", function_name, tool_error, exc_info=True)

Enter fullscreen mode Exit fullscreen mode

This design is incredibly elegant. The context engine is a highly complex, stateful system, but to the core agent, it is just another black box that implements the standard tool-execution interface.


Runtime Capability Discovery: Adapting Dynamically

A critical feature of the microkernel pattern is runtime discovery. The core system shouldn't have hardcoded assumptions about what capabilities are available. Instead, it should query its environment at runtime and adapt its behavior dynamically.

For example, when building system prompts, Hermes Agent doesn't hardcode prompt templates. It dynamically scans its skills directory to build an up-to-date manifest of what the agent can do:

# agent/prompt_builder.py — Runtime skill discovery
def _build_skills_manifest(skills_dir: Path) -> dict[str, list[int]]:
    """Build an mtime/size manifest of all SKILL.md and DESCRIPTION.md files."""
    manifest: dict[str, list[int]] = {}
    for filename in ("SKILL.md", "DESCRIPTION.md"):
        for path in iter_skill_index_files(skills_dir, filename):
            try:
                st = path.stat()
            except OSError:
                continue
            manifest[str(path.relative_to(skills_dir))] = [st.st_mtime_ns, st.st_size]
    return manifest

Enter fullscreen mode Exit fullscreen mode

This manifest is used to validate a local disk cache. If you drop a new skill file (SKILL.md) into the directory while the agent is running, the system automatically detects the change, invalidates the cache, and updates the agent's system prompt on the very next turn. No restarts, no configuration updates, and no code changes required.


State Persistence: Thread-Safe and Decoupled

In a modular architecture, plugins must be able to persist state without directly accessing or mutating the core agent object. If a plugin writes directly to the agent's instance variables, it breaks encapsulation and reintroduces the tightly coupled spaghetti code we are trying to avoid.

Hermes Agent solves this by providing a core, thread-safe persistence service: the Session Database (SessionDB).

# hermes_state.py — The session database as a core service
class SessionDB:
    """
    SQLite-backed session storage with FTS5 search.

    Thread-safe for the common gateway pattern (multiple reader threads,
    single writer via WAL mode). Each method opens its own cursor.
    """

    def __init__(self, db_path: Path = None):
        self.db_path = db_path or DEFAULT_DB_PATH
        self.db_path.parent.mkdir(parents=True, exist_ok=True)

        self._lock = threading.Lock()
        self._write_count = 0
        self._conn = sqlite3.connect(
            str(self.db_path),
            check_same_thread=False,
            timeout=1.0,
            isolation_level=None,
        )
        self._conn.row_factory = sqlite3.Row
        self._conn.execute("PRAGMA journal_mode=WAL")
        self._conn.execute("PRAGMA foreign_keys=ON")

        self._init_schema()

Enter fullscreen mode Exit fullscreen mode

By leveraging SQLite in Write-Ahead Logging (WAL) mode with a centralized connection lock, the core provides a robust, thread-safe storage layer that any plugin can query or write to.

For example, when the context compressor splits a session to archive history, it doesn't manipulate memory arrays. It simply writes a new session record with a parent_session_id to the database. The database acts as the single source of truth, keeping the memory footprint of both the core and the plugins completely clean.


Conclusion: The Path to Production-Grade AI Agents

Building an AI agent that works in a local terminal demo is easy. Building an AI agent that can run in production for months, handle thousands of concurrent users, recover from network dropouts, and scale its capabilities over time is incredibly hard.

If you continue to build agents as monolithic loops, you will eventually hit a wall of accidental complexity that slows your development to a crawl.

By adopting a microkernel architecture—separating your core loop from your capabilities, enforcing strict interface contracts, managing clean plugin lifecycles, and relying on runtime discovery—you build a system that is:

  • Resilient: A bug in a memory provider or a vector database timeout will not crash your core agent loop.
  • Extensible: You can add entirely new capabilities, tools, and models by writing a single class that implements a standard interface.
  • Testable: You can easily mock out entire plugins to test your core loop in isolation, or mock the core loop to unit-test your plugins.

As you design your next AI agent, step back from the prompt engineering and the vector database setup. Look at your architecture. Ask yourself: Is this a monolith waiting to collapse, or is it a microkernel built to scale?


Let's Discuss

  1. How do you handle graceful degradation in your current agent designs? If an external service like your vector database or context summarizer fails mid-conversation, does your agent crash, or does it dynamically adjust its toolset and keep going?
  2. What are the performance trade-offs of runtime capability discovery? In highly latent environments, how do you balance the flexibility of dynamic runtime discovery with the raw speed of compiled, static configurations?

The concepts and code demonstrated here are drawn directly from the comprehensive roadmap laid out in the ebook Hermes Agent, The Self-Evolving AI Workforce: details link, you can find also my programming ebooks with AI here: Programming & AI eBooks.