How I turned an 18% skill hit rate into 95%, without calling an embedding API once

My Claude Code setup grew to 152 skills. Every session, all 152 descriptions got dumped into the system prompt. The LLM scanned them blindly. When I later checked, keyword matching hit the right skill 18% of the time.

I had three options:

Call text-embedding-3 on every query: costs money, adds latency
Accept 18% as "good enough"
Build something

What I built
neuro-skill is a hybrid skill router that runs entirely on my machine. No API calls. No GPU. 5 ms per query.

It fuses five signals via Reciprocal Rank Fusion:

BM25 keyword recall → Cosine feature similarity → Graph spreading activation
→ Collaborative filtering personalization → Optional LLM rerank
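To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion. It is not neuro-skill's actual code: the skill names, the three example rankings, and the constant k=60 (a value commonly used for RRF) are all assumptions for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several best-first ranked lists with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) to every skill it contains,
    so only rank positions matter, never the raw scores.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, skill in enumerate(ranking, start=1):
            scores[skill] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by name to stay deterministic.
    return sorted(scores, key=lambda s: (-scores[s], s))


# Hypothetical per-signal rankings for a single query.
bm25 = ["code-review", "deploy", "lint"]
cosine = ["lint", "code-review", "deploy"]
graph = ["code-review", "lint", "test-runner"]

fused = rrf_fuse([bm25, cosine, graph])
print(fused)
```

A skill that ranks well in several lists rises to the top even when no single signal is sure of it, and a signal that returns nothing for a skill simply contributes nothing.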
The results
Core domains: 95% Hit@1, 100% Hit@3
Chinese queries: fixed from 0% to 87% with bilingual keyword coverage
Multi-skill orchestration: router.plan("review + fix + deploy") returns ordered execution steps
Personalization: learns which skills you pick, not just which ones rank highest
An independent tester validated it on 332 skills (Hermes + ECC) — v0.7.1 hit 77% overall.
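For readers who have not met Hit@k before: it is the share of queries whose correct skill shows up in the top k results. The sketch below shows one way to compute it; the queries and skill names are made-up data, not the benchmark behind the numbers above.

```python
def hit_at_k(ranked_results, expected, k):
    """Fraction of queries whose expected skill is among the top k results."""
    hits = sum(
        1 for ranked, gold in zip(ranked_results, expected) if gold in ranked[:k]
    )
    return hits / len(expected)


# Hypothetical router output for four queries (best match first).
ranked_results = [
    ["code-review", "lint", "deploy"],
    ["lint", "deploy", "code-review"],
    ["test-runner", "deploy", "lint"],
    ["lint", "code-review", "deploy"],
]
# Hypothetical ground-truth skill for each query.
expected = ["code-review", "deploy", "deploy", "deploy"]

hit1 = hit_at_k(ranked_results, expected, 1)
hit3 = hit_at_k(ranked_results, expected, 3)
print(hit1, hit3)
```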

Architecture decisions that mattered
Feature matrix, not embeddings. About 50 hand-crafted features (17 broad domains + 32 precise languages/actions) beat TF-IDF by 5× without any model.
Graph over threshold. A k-NN graph (adaptive k) keeps diffusion working at any skill count. The thresholded full adjacency matrix came out at 0.5% density, which was dead on arrival.
RRF over weighted sum. Min-max normalization broke when BM25 scored 0–15 and cosine gave −1 to 1. Reciprocal Rank Fusion only cares about rank position.
LLM as opt-in, not default. Haiku rerank is a 5th signal you toggle on when you need semantic nuance. Otherwise it's zero-cost.
MCP from day one. Claude Code, Cursor, Codex, Windsurf — all auto-discover the tools. No plugin installs.
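The feature-matrix and k-NN graph decisions above can be sketched in a few lines. This is an illustration under assumptions, not the project's implementation: the four features, the skill names, the fixed k=2 (the real router adapts k), and the decay factor are all invented for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_graph(features, k):
    """Link every skill to its k most similar skills, so no node is isolated."""
    graph = {}
    for name, vec in features.items():
        sims = [(cosine(vec, other), o) for o, other in features.items() if o != name]
        sims.sort(key=lambda t: (-t[0], t[1]))  # most similar first, then by name
        graph[name] = [(o, s) for s, o in sims[:k]]
    return graph


def spread(graph, seeds, decay=0.5, steps=2):
    """Push activation from seed skills along weighted edges for a few steps."""
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(steps):
        nxt = {}
        for node, energy in frontier.items():
            for neighbor, weight in graph[node]:
                nxt[neighbor] = nxt.get(neighbor, 0.0) + energy * weight * decay
        for node, energy in nxt.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = nxt
    return activation


# Hypothetical binary features: [python, testing, review, ci]
features = {
    "pytest-runner": [1, 1, 0, 1],
    "python-lint": [1, 0, 1, 0],
    "code-review": [0, 0, 1, 0],
    "deploy": [0, 0, 0, 1],
}

graph = knn_graph(features, k=2)
activation = spread(graph, {"pytest-runner": 1.0})
print(activation)
```

In this toy data "code-review" shares no feature with "pytest-runner", yet it still receives activation in two hops through "python-lint". Because every node keeps k neighbors, that path exists no matter how low the raw similarities are, which a fixed similarity threshold cannot guarantee.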
What I learned
The algorithm ceiling is real. BM25 + cosine + graph + CF + LLM rerank, all fused by RRF: that's five signals plus the fusion step, six layers in total. After that, gains are marginal. The remaining bottleneck is feature coverage, not algorithm design.

Also: three independent projects (neuro-skill, agent-skill-finder, SkillRouter paper) converged on the same architecture from different starting points. That's when you know you're on the right track.

Links
GitHub: github.com/wuykjl/neuro-skill
pip install neuro-skill
MIT licensed. 54 tests. 28 commits over 12 hours.

I built this for myself. It turned out useful for others too. If your agent has 50+ skills and keyword matching is letting you down, try it.
