Beyond Static Prompts: How to Build Self-Improving AI Agents with Closed-Loop Skill Playbooks
Programming Central · 2026-05-31 · via DEV Community

The current wave of AI development is undergoing a massive paradigm shift. We are rapidly moving past simple "prompt wrapper" applications and entering the era of fully autonomous, agentic systems.

Yet, if you’ve tried to build an AI agent for a production environment, you’ve likely run into a frustrating wall. You write a comprehensive system prompt, equip your agent with a few API tools, and set it loose. It works beautifully on your first three test runs. But on the fourth run, the real world throws a curveball—a changed website structure, an unexpected API response, or a minor user correction—and your agent completely derails.

The problem isn't the underlying Large Language Model (LLM). The problem is how we define agent capabilities.

In most architectures, an agent's "skills" are defined as static, hardcoded instructions or rigid tool definitions. They are passive. To build truly resilient AI systems, we need to treat skills not as static code, but as living, self-contained, closed-loop feedback systems.

In this post, we will deconstruct the anatomy of a self-improving agent "Skill" using the architectural patterns of the open-source Hermes Agent framework. We'll explore how to design skills that can execute complex workflows, evaluate their own performance, and dynamically rewrite their own execution playbooks to get smarter over time.

(The concepts and code demonstrated here are drawn from my ebook Hermes Agent, The Self-Evolving AI Workforce)


The Core Concept: The Skill as a Closed-Loop Feedback System

To understand how a self-improving agent works, let’s step away from code for a moment and look at a human analogy: a master craftsperson in a workshop.

A master carpenter doesn't approach a new project with a rigid, unchangeable checklist. Instead, they operate with an internal playbook built on experience. This playbook consists of three distinct phases:

  1. The Trigger: A specific signal that initiates a project. “A client walks in asking for a custom oak dining table.”
  2. The Execution Logic: The modular, physical steps of the craft. “Select the lumber, mill the wood, cut the mortise and tenon joints, sand the surfaces, apply the finish.”
  3. The Memory Integration: The reflective learning process after the job is done. “This specific batch of white oak was highly prone to tear-out during milling. Next time, I will adjust the planer feed rate and angle.”

This feedback loop updates the carpenter's internal playbook. The next time a client triggers a "custom table" request, the execution is smoother, faster, and higher quality.

In advanced agent architectures, this is not a vague metaphor—it is a precise, code-level implementation. A Skill is a formalized, stateful playbook that the agent can load, execute, and—crucially—self-modify based on the outcome of its execution.

Instead of a developer manually editing prompts in a codebase, the agent acts as its own developer, optimizing its own instructions through a continuous cycle of Invoke → Execute → Review → Update.


Deconstructing the Playbook: Trigger, Execution, and Memory

To build a system capable of this level of autonomy, we must formally decompose a skill into three interdependent components.

[ Trigger (Invocation Contract) ]
               │
               ▼
[ Execution Logic (Modular Workflow) ] ◄───┐ (Self-Correction / Updates)
               │                           │
               ▼                           │
[ Memory Integration (Feedback Loop) ] ────┘



1. The Trigger: The Invocation Contract

The Trigger is the input schema that defines exactly when and how a skill is activated. It acts as a strict contract between the agent’s core decision-making loop and the skill’s execution engine.

Without a deterministic trigger, agents suffer from unpredictable activation, running the wrong code at the wrong time. This violates the Principle of Least Astonishment (POLA): an agent’s behavior must remain highly predictable based on the inputs that activated it.

In practice, triggers generally manifest in two ways:

  • Explicit Slash Commands: Highly deterministic inputs. When a user types /web-search "latest AI trends", the system scans its available skills, identifies the match, and packages the user's query into a structured payload.
  • Contextual Invocation: Dynamic, reasoning-based triggers. During a multi-turn conversation, the agent’s LLM evaluates the user's intent. If the user says, "Deploy this code to our staging server," the agent's internal reasoning engine recognizes that the current context matches the entry criteria for the deployment skill and triggers it automatically.
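The explicit slash-command path can be sketched in a few lines. This is a minimal illustration rather than Hermes Agent's actual code: `parse_slash_command`, `TriggerPayload`, and the `KNOWN_SKILLS` set are hypothetical names introduced here, and the registry is hardcoded only to keep the example self-contained.

```python
import shlex
from dataclasses import dataclass
from typing import Optional

# Hypothetical registry of skill slugs. A real system would build this
# by scanning the skills directory instead of hardcoding it.
KNOWN_SKILLS = {"web-search", "gif-search"}


@dataclass
class TriggerPayload:
    """Structured payload handed from the decision loop to the skill."""
    skill: str
    query: str


def parse_slash_command(user_input: str) -> Optional[TriggerPayload]:
    """Return a payload if the input is a known slash command, else None."""
    text = user_input.strip()
    if not text.startswith("/"):
        return None  # Not an explicit trigger; contextual invocation would apply.
    try:
        tokens = shlex.split(text[1:])  # shlex keeps quoted queries together
    except ValueError:
        return None  # Unbalanced quotes: refuse rather than guess
    if not tokens or tokens[0].lower() not in KNOWN_SKILLS:
        return None
    return TriggerPayload(skill=tokens[0].lower(), query=" ".join(tokens[1:]))
```

Returning `None` for anything that is not an exact match keeps the explicit path deterministic: the same input always activates the same skill or none at all.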

2. The Execution Logic: The Modular Workflow

Once triggered, the skill executes its playbook. The golden rule of agentic execution is modularity. The execution logic must be composed of atomic, chainable steps rather than a single, monolithic "black box" prompt.

Consider a complex skill like "Set up a new React project." If you pass this entire request to a single LLM prompt, the model has to generate the directory structure, write the configuration files, install dependencies, and verify the build in one massive, error-prone leap.

Instead, a modular playbook breaks the skill down into atomic tool calls:

  1. terminal("mkdir my-app && cd my-app")
  2. terminal("npx create-react-app .")
  3. read_file("src/App.js")
  4. write_file("src/App.js", optimized_template)

Because each step is an atomic tool call, the system can inspect the inputs and outputs of every single transition. If step 2 fails because npm is out of date, the agent doesn't have to restart the entire process; it can isolate the failure to that specific step, run a corrective action, and resume execution.
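The isolate, correct, and resume behavior described above can be sketched as a small step runner. This is an illustrative sketch under stated assumptions: `Step`, `run_playbook`, and the `recover` hook are hypothetical names, and real tool calls are stood in for by plain Python callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One atomic unit of a playbook (hypothetical structure)."""
    name: str
    run: Callable[[], str]                        # performs the action, returns its output
    recover: Optional[Callable[[], None]] = None  # optional corrective action


def run_playbook(steps: List[Step], max_retries: int = 1) -> List[str]:
    """Run steps in order; on failure, recover and retry only the failing step."""
    outputs: List[str] = []
    for step in steps:
        attempts = 0
        while True:
            try:
                outputs.append(step.run())
                break  # step succeeded; move on to the next one
            except Exception:
                attempts += 1
                if step.recover is None or attempts > max_retries:
                    raise  # no fix available: surface the original failure
                step.recover()  # e.g. update npm, then retry this step only
    return outputs
```

Earlier steps are never re-run: their outputs stay in `outputs` while the failing step is retried, which is the property that a single monolithic prompt cannot offer.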


3. Memory Integration: The Closed-Loop Feedback

This is where true self-improvement happens. After the execution logic completes, the system must answer a critical question: What did we learn from this run?

To handle this, the architecture splits feedback into two distinct systems:

The Macro-Level Feedback Loop (The Curator)

Operating in the background, a curation system monitors the agent's entire skill library. It tracks high-level usage metrics:

  • How many times has this skill been invoked?
  • How often does it run successfully without throwing errors?
  • When was it last modified?

If a skill is rarely used, or if its error rate spikes after a system update, the Curator automatically flags it for deprecation, archiving, or manual developer review.
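A rough sketch of that flagging logic follows. The `SkillStats` record, the `flag_for_review` function, and both thresholds (a 30% error rate and 90 days of inactivity) are assumptions chosen for illustration; the article does not specify how the Curator stores its metrics or where it draws the line.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class SkillStats:
    """Usage metrics for one skill (hypothetical record layout)."""
    name: str
    invocations: int
    failures: int
    last_invoked: datetime


def flag_for_review(
    stats: List[SkillStats],
    now: datetime,
    max_error_rate: float = 0.3,                  # assumed threshold
    stale_after: timedelta = timedelta(days=90),  # assumed threshold
) -> List[str]:
    """Return human-readable flags for skills that look unhealthy or unused."""
    flagged: List[str] = []
    for s in stats:
        # Guard against division by zero for skills that have never run.
        error_rate = s.failures / s.invocations if s.invocations else 0.0
        if error_rate > max_error_rate:
            flagged.append(f"{s.name}: high error rate")
        elif now - s.last_invoked > stale_after:
            flagged.append(f"{s.name}: stale")
    return flagged
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside the function, keeps the check deterministic and easy to test.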

The Micro-Level Feedback Loop (The Memory Provider)

This loop operates on a per-invocation basis. When a skill finishes executing, the agent spawns a background review process. This is a separate, lightweight LLM instance that acts as an objective "critic."

The critic reviews the entire execution trace: the initial user request, the steps the agent took, the tool outputs, and the final result.

  • Did the agent follow the playbook instructions?
  • Did any tools return errors?
  • Did the user have to manually correct the agent's output?

If the critic detects a failure pattern—for example, a web scraper tool failed because a target website updated its CSS selectors—it doesn't just log an error. It uses a management tool to patch the skill's playbook file (SKILL.md), updating the instructions with the correct selectors for the next run.
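The patch step itself can be very small. The sketch below assumes the critic has already decided which snippet to replace and what to replace it with; `patch_playbook` is a hypothetical helper, not the management tool that Hermes Agent ships.

```python
from pathlib import Path


def patch_playbook(skill_md: Path, old: str, new: str) -> bool:
    """Replace one exact snippet in a playbook; return True if the file changed."""
    text = skill_md.read_text(encoding="utf-8")
    if text.count(old) != 1:
        # Missing or ambiguous target: leave the file untouched so a bad
        # patch cannot silently rewrite the wrong instruction.
        return False
    skill_md.write_text(text.replace(old, new), encoding="utf-8")
    return True
```

Requiring exactly one match is a deliberate design choice: a self-modifying system should fail closed when it cannot tell which instruction it is about to change.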


The Closed Learning Loop in Action

Let's look at how this theoretical model plays out step-by-step in a real-world scenario: searching for and extracting web data.

   User Input: "/gif-search cute cats"
               │
               ▼
┌─────────────────────────────────────────────────────────┐
│ 1. TRIGGER                                              │
│ - scan_skill_commands() matches "/gif-search"           │
│ - Loads "SKILL.md" and packages payload                 │
└──────────────┬──────────────────────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────────────────────┐
│ 2. EXECUTION                                            │
│ - Step A: Run web_search("cute cats gif")               │
│ - Step B: Extract direct image URLs                     │
│ - Step C: Return formatted markdown link to user        │
└──────────────┬──────────────────────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────────────────────┐
│ 3. MEMORY INTEGRATION (Background Review)               │
│ - Critic detects Step B failed (regex extraction error) │
│ - Generates a patch to fix the regex pattern            │
│ - Writes update back to "SKILL.md"                      │
└─────────────────────────────────────────────────────────┘


  1. Trigger: The user enters /gif-search cute cats. The system scans the local skills directory, matches the command, parses the YAML metadata in the skill's header, and loads the execution instructions.
  2. Execution: The agent reads the instructions in SKILL.md. It executes a web_search tool call, parses the HTML, and attempts to extract the image URLs. However, the target search engine has updated its markup, causing the agent's regex extraction step to fail. The agent tries an alternative fallback method, successfully retrieves a URL, and displays it to the user.
  3. Memory Integration: The user is satisfied, but the background review agent notes that the primary extraction step failed. It analyzes the HTML payload, generates a corrected extraction pattern, and uses a skill management tool to patch the SKILL.md file. The next time the user runs /gif-search, the agent starts from the corrected pattern, so the primary extraction step should succeed on the first attempt instead of relying on the fallback.

Building the Engine: A Deep Dive into the Code

To bring this concept to life, let's build a basic Skill Discovery Engine in Python. This implementation mirrors the patterns used in the Hermes Agent architecture. It scans a local directory for skill playbooks defined in Markdown, reads the frontmatter metadata at the top of each file with a simple key-value parser (not a full YAML parser), sanitizes their invocation commands, and indexes them for execution.

The Skill Playbook Template (SKILL.md)

Before writing the Python parser, here is how a typical self-improving skill playbook is structured. Notice the YAML frontmatter at the top, followed by modular, human-readable execution steps that the LLM can interpret and modify.

---
name: gif-search
description: Search the web for animated GIFs matching a query and return markdown image links.
version: 1.1.0
author: hermes-system
tags: media, search, web
category: utility
platforms: macos, linux, windows
---

# Playbook: GIF Search

## Trigger Contract
Activated explicitly via `/gif-search <query>` or contextually when the user requests an animated image or reaction GIF.

## Execution Steps
1. Call `web_search` tool with the query appended with "filetype:gif site:giphy.com OR site:tenor.com".
2. Parse the search results. Use the following regex pattern to extract raw media URLs: `https://media\.giphy\.com/media/[a-zA-Z0-9]+/giphy\.gif`.
3. If the primary regex fails, fall back to extracting any URL ending with `.gif` from the page source.
4. Format the output as a standard Markdown image link: `![Result](url)`.


The Python Implementation

Here is the complete, self-contained Python engine to discover, parse, and index these skill playbooks.

"""
Basic Skill Library Implementation

This module provides a clean, standalone implementation for discovering,
indexing, and invoking skills from a local directory. It demonstrates the
core patterns used by Hermes Agent's skill management system.

Key Features:
- Scans a directory for SKILL.md files
- Parses YAML frontmatter for metadata (name, description, tags)
- Creates a mapping of skill names to their file paths and metadata
- Provides a simple lookup (get_skill) that returns a skill's metadata and content
- Rebuilds the index on every scan() call, which picks up new or changed skills
"""

import json
import logging
import re
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List, Optional, Set, Tuple

# Configure logging for this module
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
logger = logging.getLogger(__name__)

# ─── Constants ───────────────────────────────────────────────────────────
# The name of the skill definition file within each skill directory
SKILL_MD_FILENAME = "SKILL.md"

# Regular expression to match YAML frontmatter blocks
# Frontmatter is delimited by '---' at the start and end
FRONTMATTER_PATTERN = re.compile(r"^---\s*\n(.*?)\n---\s*\n", re.DOTALL)

# Patterns for sanitizing skill names into clean, hyphen-separated slugs
# This ensures compatibility with slash command naming conventions
SKILL_INVALID_CHARS = re.compile(r"[^a-z0-9-]")
SKILL_MULTI_HYPHEN = re.compile(r"-{2,}")

# ─── Data Structures ─────────────────────────────────────────────────────


class SkillMetadata:
    """
    Represents the parsed metadata from a skill's SKILL.md frontmatter.

    This class encapsulates all the information needed to identify, describe,
    and invoke a skill. It follows the Principle of Least Astonishment (POLA)
    by providing clear, predictable attribute names and behavior.
    """

    def __init__(
        self,
        name: str,
        description: str = "",
        version: str = "1.0.0",
        author: str = "",
        tags: Optional[List[str]] = None,
        category: str = "general",
        platforms: Optional[List[str]] = None,
    ):
        """
        Initialize a SkillMetadata instance.

        Args:
            name: The canonical name of the skill (e.g., "gif-search")
            description: A human-readable description of what the skill does
            version: Semantic version string (default: "1.0.0")
            author: The creator of the skill
            tags: A list of searchable tags for the skill
            category: The functional category of the skill (default: "general")
            platforms: A list of OS platforms this skill supports (e.g., ["macos", "linux"])
        """
        self.name = name
        self.description = description
        self.version = version
        self.author = author
        self.tags = tags or []
        self.category = category
        self.platforms = platforms or []

    def to_dict(self) -> Dict[str, Any]:
        """Convert the metadata to a dictionary for serialization or display."""
        return {
            "name": self.name,
            "description": self.description,
            "version": self.version,
            "author": self.author,
            "tags": self.tags,
            "category": self.category,
            "platforms": self.platforms,
        }

    @classmethod
    def from_frontmatter(cls, frontmatter: Dict[str, Any]) -> "SkillMetadata":
        """
        Create a SkillMetadata instance from a parsed YAML frontmatter dict.

        This class method handles the extraction of known fields and provides
        sensible defaults for any missing values.

        Args:
            frontmatter: A dictionary of key-value pairs from the SKILL.md frontmatter

        Returns:
            A new SkillMetadata instance populated with the frontmatter data
        """
        # Extract the name, falling back to a placeholder if missing
        name = frontmatter.get("name", "").strip()
        if not name:
            logger.warning("Skill frontmatter missing 'name' field; using placeholder.")
            name = "unnamed-skill"

        # Extract description, version, author, tags, category, and platforms
        description = frontmatter.get("description", "").strip()
        version = str(frontmatter.get("version", "1.0.0")).strip()
        author = frontmatter.get("author", "").strip()
        tags = frontmatter.get("tags", [])
        if isinstance(tags, str):
            tags = [tag.strip() for tag in tags.split(",")]
        category = frontmatter.get("category", "general").strip().lower()
        platforms = frontmatter.get("platforms", [])
        if isinstance(platforms, str):
            platforms = [p.strip() for p in platforms.split(",")]

        return cls(
            name=name,
            description=description,
            version=version,
            author=author,
            tags=tags,
            category=category,
            platforms=platforms,
        )


class SkillInfo:
    """
    Represents a discovered skill with its metadata, file path, and content.

    This is the primary data structure returned by the skill scanner. It
    bundles all the information needed to load and use a skill.
    """

    def __init__(
        self,
        metadata: SkillMetadata,
        skill_md_path: Path,
        skill_dir: Path,
        content: str = "",
    ):
        """
        Initialize a SkillInfo instance.

        Args:
            metadata: The parsed metadata from the SKILL.md frontmatter
            skill_md_path: The absolute path to the SKILL.md file
            skill_dir: The absolute path to the skill's directory
            content: The full text content of the SKILL.md file (optional)
        """
        self.metadata = metadata
        self.skill_md_path = skill_md_path
        self.skill_dir = skill_dir
        self.content = content

    def to_dict(self) -> Dict[str, Any]:
        """Convert the skill info to a dictionary for serialization."""
        return {
            "name": self.metadata.name,
            "description": self.metadata.description,
            "version": self.metadata.version,
            "author": self.metadata.author,
            "tags": self.metadata.tags,
            "category": self.metadata.category,
            "platforms": self.metadata.platforms,
            "skill_md_path": str(self.skill_md_path),
            "skill_dir": str(self.skill_dir),
            "content_length": len(self.content),
        }


# ─── Frontmatter Parser ─────────────────────────────────────────────────


def parse_frontmatter(content: str) -> Tuple[Dict[str, Any], str]:
    """
    Parse YAML frontmatter from a skill file's content.

    This function extracts the YAML frontmatter block (delimited by '---')
    and returns it as a dictionary, along with the remaining body content.
    It uses a simple regex-based approach for parsing, which is sufficient
    for the basic metadata fields used in SKILL.md files.

    Args:
        content: The full text content of a SKILL.md file

    Returns:
        A tuple of (frontmatter_dict, body_content_string)
    """
    # Attempt to match the frontmatter pattern at the start of the content
    match = FRONTMATTER_PATTERN.match(content)
    if not match:
        # No frontmatter found; return an empty dict and the full content as body
        return {}, content.strip()

    # Extract the raw YAML string and the body content
    raw_yaml = match.group(1)
    body = content[match.end():].strip()

    # Parse the raw YAML string into a dictionary
    # We use a simple key-value parser for this example
    frontmatter = {}
    for line in raw_yaml.split("\n"):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            value = value.strip().strip('"').strip("'")
            frontmatter[key] = value

    return frontmatter, body


def sanitize_skill_name(name: str) -> str:
    """
    Sanitize a skill name into a clean, hyphen-separated slug.

    This ensures compatibility with slash command naming conventions
    (e.g., normalizing spaces and underscores to hyphens).

    Args:
        name: The raw skill name to sanitize

    Returns:
        A sanitized, hyphen-separated slug
    """
    # Convert to lowercase and replace spaces/underscores with hyphens
    slug = name.lower().replace(" ", "-").replace("_", "-")

    # Remove any characters that are not alphanumeric or hyphens
    slug = SKILL_INVALID_CHARS.sub("", slug)

    # Collapse multiple consecutive hyphens into a single hyphen
    slug = SKILL_MULTI_HYPHEN.sub("-", slug)

    # Strip leading and trailing hyphens
    slug = slug.strip("-")

    return slug


# ─── Skill Scanner ──────────────────────────────────────────────────────


class SkillScanner:
    """
    Scans a directory for SKILL.md files and indexes them as skills.

    This class is the core of the skill discovery system. It walks a given
    directory tree, finds all SKILL.md files, parses their frontmatter, and
    creates a mapping of skill names to their metadata and file paths.

    The scanner is designed to be efficient and robust, handling missing
    files, malformed frontmatter, and duplicate skill names gracefully.
    """

    def __init__(self, skills_dir: Path):
        """
        Initialize the SkillScanner with a root directory to scan.

        Args:
            skills_dir: The root directory to scan for skill files
        """
        self.skills_dir = skills_dir
        self._skills: Dict[str, SkillInfo] = {}
        self._last_scan_time: Optional[datetime] = None

    def scan(self) -> Dict[str, SkillInfo]:
        """
        Perform a full scan of the skills directory.

        This method walks the directory tree, finds all SKILL.md files,
        parses their frontmatter, and builds an in-memory index of skills.
        It returns a dictionary mapping sanitized skill names to SkillInfo objects.

        Returns:
            A dictionary of {sanitized_skill_name: SkillInfo}
        """
        # Reset the skills index before scanning
        self._skills = {}
        seen_names: Set[str] = set()

        # Ensure the skills directory exists before scanning
        if not self.skills_dir.exists():
            logger.warning(f"Skills directory does not exist: {self.skills_dir}")
            return self._skills

        # Walk the directory tree, looking for SKILL.md files
        for skill_md_path in self.skills_dir.rglob(SKILL_MD_FILENAME):
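            # Skip hidden directories (e.g., .git, .github). Only inspect the
            # path below skills_dir, so a root that itself lives inside a
            # dot-directory does not cause every skill to be skipped.
            relative_parts = skill_md_path.relative_to(self.skills_dir).parts
            if any(part.startswith(".") for part in relative_parts):
                continue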

            try:
                # Read the SKILL.md file content
                content = skill_md_path.read_text(encoding="utf-8")

                # Parse the frontmatter and body
                frontmatter, body = parse_frontmatter(content)

                # Create a SkillMetadata object from the frontmatter
                metadata = SkillMetadata.from_frontmatter(frontmatter)

                # Sanitize the skill name for use as a slash command
                sanitized_name = sanitize_skill_name(metadata.name)

                # Skip duplicate skill names (first one wins)
                if sanitized_name in seen_names:
                    logger.warning(
                        f"Duplicate skill name '{sanitized_name}' found at "
                        f"{skill_md_path}; skipping."
                    )
                    continue

                # Mark this name as seen
                seen_names.add(sanitized_name)

                # Create a SkillInfo object and add it to the index
                skill_info = SkillInfo(
                    metadata=metadata,
                    skill_md_path=skill_md_path,
                    skill_dir=skill_md_path.parent,
                    content=content,
                )
                self._skills[sanitized_name] = skill_info

                logger.info(
                    f"Discovered skill: {sanitized_name} "
                    f"({metadata.description})"
                )

            except Exception as e:
                # Log the error but continue scanning other skills
                logger.error(
                    f"Failed to parse skill at {skill_md_path}: {e}"
                )

        # Record the scan time
        self._last_scan_time = datetime.now()

        logger.info(
            f"Scan complete. Found {len(self._skills)} skill(s) "
            f"in {self.skills_dir}."
        )
        return self._skills

    def get_skill(self, name: str) -> Optional[SkillInfo]:
        """Retrieve a skill by its sanitized name."""
        return self._skills.get(sanitize_skill_name(name))


# ─── Demonstration ───────────────────────────────────────────────────────

if __name__ == "__main__":
    import tempfile

    # Create a temporary directory structure to simulate a skill library
    with tempfile.TemporaryDirectory() as temp_dir:
        temp_path = Path(temp_dir)

        # Define a mock skill directory and SKILL.md file
        search_skill_dir = temp_path / "gif-search"
        search_skill_dir.mkdir(parents=True, exist_ok=True)

        mock_skill_content = """---
name: Gif Search
description: Search and retrieve animated GIFs
version: 1.0.2
author: Developer-Alpha
tags: media, search
category: utility
platforms: macos, linux
---
# Playbook: Gif Search
This playbook defines how to search and retrieve animated GIFs...
"""

        skill_file = search_skill_dir / SKILL_MD_FILENAME
        skill_file.write_text(mock_skill_content, encoding="utf-8")

        # Initialize and run the scanner
        scanner = SkillScanner(temp_path)
        discovered_skills = scanner.scan()

        # Retrieve and inspect our parsed skill
        gif_skill = scanner.get_skill("gif-search")
        if gif_skill:
            print("\n--- Parsed Skill Metadata ---")
            print(json.dumps(gif_skill.to_dict(), indent=2))



Code Walkthrough: How the Discovery Engine Works

Let’s break down the key design patterns in the code above and explain why they are critical for building an extensible agent.

1. Robust Metadata Parsing with SkillMetadata

The SkillMetadata class is designed around the Principle of Least Astonishment (POLA). Frontmatter configurations can often be messy, written by different developers or generated by different LLM versions.

The from_frontmatter class method acts as a defensive wrapper. It parses raw string tags, normalizes functional categories to lowercase, and provides safe fallbacks for missing critical fields (like naming a folderless skill unnamed-skill rather than throwing a fatal runtime error).
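To make the defensive-wrapper idea concrete, here is a minimal, self-contained sketch. It is not the SkillMetadata class from the listing above: SketchMetadata, its field set, and the "general" default category are assumptions chosen for illustration. Only the unnamed-skill fallback and the lowercase normalization come from the description above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SketchMetadata:
    """Hypothetical stand-in for SkillMetadata; field names are assumptions."""

    name: str = "unnamed-skill"
    description: str = ""
    category: str = "general"  # assumed default, not taken from the article
    tags: List[str] = field(default_factory=list)

    @classmethod
    def from_frontmatter(cls, fm: Dict[str, Any]) -> "SketchMetadata":
        # Tags may arrive as a comma-separated string or as a list.
        raw_tags = fm.get("tags") or []
        if isinstance(raw_tags, str):
            raw_tags = raw_tags.split(",")
        # Drop empty entries left behind by trailing commas.
        tags = [str(t).strip() for t in raw_tags if str(t).strip()]

        return cls(
            # Missing or empty name falls back instead of raising.
            name=str(fm.get("name") or "unnamed-skill").strip(),
            description=str(fm.get("description") or "").strip(),
            # Categories are normalized to lowercase.
            category=str(fm.get("category") or "general").strip().lower(),
            tags=tags,
        )


# Messy input: no name, trailing comma in tags, mixed-case category.
messy = {"tags": "media, search, ", "category": "Utility"}
meta = SketchMetadata.from_frontmatter(messy)
print(meta)
```

The point of the sketch is that every field has a total conversion path: whatever the frontmatter contains, the caller gets a usable object back rather than an exception.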

2. Strict Command Sanitization with sanitize_skill_name

In an agentic system, skills are frequently invoked via slash commands (e.g., /gif-search). However, file systems and human-entered names are prone to irregularities (spaces, mixed casing, underscores, or illegal characters).

The sanitize_skill_name function uses regular expressions to normalize any input into a clean, lowercased, hyphen-separated slug:

  • "Gif Search" becomes "gif-search"
  • "deploy_to_prod!!" becomes "deploy-to-prod"

This ensures that the invocation contract remains completely deterministic.
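One way such a normalizer could be written is sketched below. The function name slugify_skill_name and the exact regular expression are assumptions, not necessarily what sanitize_skill_name does internally; the sketch only reproduces the two conversions listed above.

```python
import re


def slugify_skill_name(raw: str) -> str:
    """Hypothetical slug normalizer: lowercase, hyphen-separated, ASCII only."""
    slug = raw.strip().lower()
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", slug)
    # Remove hyphens left at either end by leading/trailing punctuation.
    return slug.strip("-")


for raw_name in ("Gif Search", "deploy_to_prod!!", "gif-search"):
    print(repr(raw_name), "->", repr(slugify_skill_name(raw_name)))
```

Because an already-clean slug maps to itself, applying the function to both the stored key and the lookup key is safe, which is why a lookup such as get_skill can normalize its argument before querying the index.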

3. Non-Blocking Error Boundaries in SkillScanner

When scanning a directory containing dozens of skills, one malformed SKILL.md file should not crash the entire application.

The SkillScanner.scan method wraps the parsing logic for each individual file in a broad try-except block. If a developer (or a background review agent) introduces a syntax error into a specific skill's YAML block, the scanner logs the error along with the path of the offending file, skips it, and continues indexing the rest of the library. A single bad file costs you one skill, not the whole index.
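The per-item error boundary can be shown in isolation. The sketch below uses in-memory strings instead of files, and parse_skill is a deliberately tiny hypothetical parser (plain "key: value" lines), not the article's parse_frontmatter; the only thing it is meant to demonstrate is that one failure does not stop the loop.

```python
import logging
from typing import Dict

logger = logging.getLogger("skill_scan_sketch")


def parse_skill(raw: str) -> Dict[str, str]:
    """Hypothetical minimal parser: expects one 'key: value' pair per line."""
    parsed: Dict[str, str] = {}
    for line in raw.strip().splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        parsed[key.strip()] = value.strip()
    if "name" not in parsed:
        raise ValueError("missing required field 'name'")
    return parsed


def scan_all(documents: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Index every parseable document; log and skip the ones that fail."""
    index: Dict[str, Dict[str, str]] = {}
    for path, raw in documents.items():
        try:
            skill = parse_skill(raw)
            index[skill["name"]] = skill
        except Exception as exc:
            # The boundary sits inside the loop, so the scan keeps going.
            logger.error("Failed to parse skill at %s: %s", path, exc)
    return index


documents = {
    "gif-search/SKILL.md": "name: gif-search\ndescription: Search GIFs",
    "broken/SKILL.md": "name gif search without colon",
    "deploy/SKILL.md": "name: deploy\ndescription: Deploy to prod",
}
index = scan_all(documents)
print(sorted(index))
```

The trade-off of a broad except is that programming errors in the parser are also swallowed, so logging the path and the exception message, as the scanner above does, is what keeps those failures visible.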


Why This Matters for the Future of AI Engineering

By treating skills as self-contained, closed-loop playbooks, we unlock several profound advantages over traditional LLM architectures:

Meta-Learning (The Agent That Learns to Learn)

Instead of hardcoding edge-case handling into your application code, you give the agent the tools to debug itself. When an API payload format changes, the agent's background review process updates the corresponding SKILL.md file. The agent adapts without a developer having to push a line of code or restart a container.

Software-Defined Agent Capabilities

Because skills are packaged as simple, standard Markdown files with YAML headers, they are highly portable. You can version-control your agent's skills in Git, roll back problematic updates, and share skill libraries across entirely different agent instances.

Drastically Lower Inference Costs

Instead of cramming a massive, multi-thousand-token system prompt containing every single tool instruction into every single LLM call, the agent dynamically loads only the specific SKILL.md file required for the active task. This keeps the prompt small, which can speed up responses and reduce API costs.
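A rough sketch of this load-on-demand pattern follows. The helper names build_index_prompt and load_skill_body, the one-line-per-skill index format, and the placeholder playbook bodies are all assumptions for illustration; character counts stand in for tokens, so the printed numbers only show the direction of the saving, not an actual cost figure.

```python
import tempfile
from pathlib import Path
from typing import Dict


def build_index_prompt(summaries: Dict[str, str]) -> str:
    """One short line per skill: enough for the model to choose a skill."""
    lines = [f"/{name}: {desc}" for name, desc in sorted(summaries.items())]
    return "Available skills:\n" + "\n".join(lines)


def load_skill_body(skills_dir: Path, name: str) -> str:
    """Read the full playbook only once the skill has been selected."""
    return (skills_dir / name / "SKILL.md").read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    skills_dir = Path(tmp)
    # Placeholder playbooks: (description, long body).
    playbooks = {
        "gif-search": ("Search and retrieve animated GIFs",
                       "# Playbook: Gif Search\n" + "step\n" * 200),
        "deploy-to-prod": ("Deploy the current build",
                           "# Playbook: Deploy\n" + "step\n" * 200),
    }
    for name, (_, body) in playbooks.items():
        (skills_dir / name).mkdir()
        (skills_dir / name / "SKILL.md").write_text(body, encoding="utf-8")

    index_prompt = build_index_prompt({n: d for n, (d, _) in playbooks.items()})
    active_body = load_skill_body(skills_dir, "gif-search")
    all_bodies = sum(len(body) for _, body in playbooks.values())

print(index_prompt)
print(len(index_prompt) + len(active_body), "chars sent with lazy loading vs",
      len(index_prompt) + all_bodies, "with every playbook inlined")
```

The gap between the two numbers grows with the size of the library, since the index adds one line per skill while inlining adds a full playbook per skill.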


Conclusion

The era of static, hardcoded AI assistants is drawing to a close. To build robust, resilient software agents that can operate autonomously in the real world, we must build them to be self-improving.

By decomposing skills into deterministic Triggers, modular Execution Logic, and self-supervised Memory Integration, we create a foundation for continuous learning. The agent ceases to be a static instruction-follower and becomes an adapting craftsman—growing more capable, more efficient, and more resilient with every single run.


Let's Discuss

  1. How do you handle edge-case failures in your current agentic workflows? Would a self-patching SKILL.md architecture solve some of your most common runtime errors?
  2. What are the security implications of allowing an LLM to self-modify its own skill playbooks? How would you design guardrails or human-in-the-loop validation steps for a production-grade self-improving agent?

Leave a comment below with your thoughts—we’d love to hear how you're approaching the challenge of agent statefulness and self-improvement!

The concepts and code demonstrated here are drawn directly from the roadmap laid out in the ebook Hermes Agent, The Self-Evolving AI Workforce (details link). You can also find my programming ebooks on AI here: Programming & AI eBooks.