How I indexed 69,000 Claude Code skills (and what I learned doing it)
Adam Lankame · 2026-05-25 · via DEV Community

One month ago I started building an open catalog of Claude Code skills. Yesterday it crossed 69,369 indexed SKILL.md files. This post is the engineering story — what I built, what surprised me, and what's free for anyone to use.

If you've never written a Claude Code skill: it's a Markdown file with YAML frontmatter that gives Anthropic's Claude Code agent specialized behavior. Drop it in ~/.claude/skills/<name>/SKILL.md and Claude can invoke it as a slash command. Think of it like a Vim plugin or a VSCode extension, except the contract is "instructions in English" rather than "code in Lua / TypeScript."

The format is brand-new. The official spec doesn't ship a catalog. The awesome-* lists I could find at the time covered maybe 300 hand-picked entries. Meanwhile, GitHub's code search showed thousands of public repos with SKILL.md files in them. The long tail of the ecosystem was completely invisible. That's the gap I set out to close.

The shape of the problem

Here's what I knew going in:

  1. Discovery was broken. A skill author would push their SKILL.md to GitHub and ... nothing. No directory, no aggregator, no search surface. The only way another developer found it was Twitter, Discord, or stumbling onto the repo.

  2. Quality varied wildly. Some skills were 200-line operator-grade tools with pricing tables, anti-trigger sections, and structured examples. Others were 4-line stubs that read like "TODO: write a skill that does X." Both were indexable, neither was distinguishable from outside.

  3. The format itself was changing fast. The frontmatter spec gained fields monthly — allowed-tools, user-invokable, model, metadata.api_base. Yesterday's "good" SKILL.md could be tomorrow's missing-required-field.

  4. There was no good API surface. If you wanted to build something on top of the skill ecosystem (a tool for evaluating skills, a recommender, an installer), you had to scrape GitHub yourself.

I wanted a catalog that fixed all four. Open data, daily refresh, free API, free dataset. No pay-to-list, no listing fees, no ranking-for-money. The only paid product would be an evaluation layer for end-users (a quality score in the desktop app), never anything skill authors had to opt into. Anti-rent-seeking by construction.

The miner — 24 sources, every night

The catalog is built by a single Python script that runs on a Mac mini in my office at 01:00 local. It crawls 24 public sources looking for SKILL.md files:

| Source | What it discovers |
| --- | --- |
| GitHub code search (filename:SKILL.md) | The bulk of the catalog: 101 query variants covering language hints, frontmatter fields, and date-bounded slices to defeat the 1000-result hard cap |
| GitHub Topics (topic:claude-code-skills) + 31 variants | Topic-tagged repos |
| GitHub Gists | Single-file skills posted as gists (most catalogs miss these) |
| Awesome-list READMEs (32 lists) | Anything the existing curators picked |
| GitLab, Codeberg | Skills outside GitHub |
| HuggingFace | Skills uploaded as datasets |
| Reddit, HackerNews Algolia, Bluesky, Mastodon, dev.to, YouTube, Telegram | Mentions in posts/comments, found by a text-blob scan for repo URLs |
| Wayback Machine CDX API | Renamed / deleted repos still discoverable via archive.org |
| Stargazer graph mining | Once we find one good skill repo, mine who starred it; they often have skills too |
| Author repo enumeration | When we admit one of an author's skills, scan their other repos |
| Topic co-occurrence | Topics tagged alongside claude-code-skills get crawled for next run |
| VSCode + Open VSX marketplaces | Some extensions ship with SKILL.md companions |
| Brave Search API | Web-search-anchored discovery |
| LLM query expansion | Claude generates next-week's search queries based on what's been found |
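
The date-bounded slicing mentioned for GitHub code search can be sketched roughly as below. This is a minimal illustration, not the miner's actual code: `count_results` is a hypothetical callable standing in for whatever search call reports a hit count for a date window, and bisecting until each window falls under the cap is an assumption about how the slices might be chosen.

```python
from datetime import date, timedelta
from typing import Callable, List, Tuple

RESULT_CAP = 1000  # the per-query hard cap mentioned above

Window = Tuple[date, date]


def slice_windows(
    start: date,
    end: date,
    count_results: Callable[[date, date], int],
    cap: int = RESULT_CAP,
) -> List[Window]:
    """Split [start, end] into inclusive windows whose hit count is below cap.

    count_results is a hypothetical callable: it should return how many hits
    a query restricted to the given inclusive date window would produce.
    """
    if start > end:
        return []
    hits = count_results(start, end)
    # A single-day window cannot be split further, so it is accepted as-is.
    if hits < cap or start == end:
        return [(start, end)]
    mid = start + (end - start) // 2
    left = slice_windows(start, mid, count_results, cap)
    right = slice_windows(mid + timedelta(days=1), end, count_results, cap)
    return left + right
```

Each returned window can then be turned into one query variant, so no single query is truncated by the cap (except a single day that exceeds it on its own).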

Each source returns candidate repo URLs. The miner fetches the SKILL.md, validates the YAML frontmatter, runs admission scoring (more on this below), categorizes by domain (Engineering / Security / Growth / etc. — 10 categories total), tags across ~100 orthogonal dimensions (language, framework, AI provider, cloud, integration type), and writes a static HTML page at /skills/<slug>/.
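
To make the frontmatter-validation step concrete, here is a minimal sketch using only the standard library. It assumes the frontmatter is a leading block delimited by `---` lines and handles only flat, top-level `key: value` pairs; a real pipeline would presumably use a full YAML parser. The required-field list and the sample skill are illustrative assumptions, not the catalog's actual rules.

```python
from typing import Dict, List, Tuple

REQUIRED_FIELDS = ("name", "description")  # assumed minimum, not an official list

# Hypothetical sample document, for illustration only.
SAMPLE = """---
name: changelog-writer
description: Drafts a changelog entry from staged git changes.
---
# Changelog writer
"""


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Return (frontmatter, body) for a SKILL.md-style document.

    Only flat top-level `key: value` lines are read; nested YAML is skipped.
    Raises ValueError when the leading `---` block is missing or unclosed.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---' delimiter")
    try:
        close = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("missing closing '---' delimiter") from None
    fields: Dict[str, str] = {}
    for line in lines[1:close]:
        # Skip blanks, comments, indented (nested) lines, and non-pairs.
        if not line.strip() or line[:1].isspace() or line.startswith("#"):
            continue
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip().strip("\"'")
    body = "\n".join(lines[close + 1:])
    return fields, body


def missing_fields(fields: Dict[str, str]) -> List[str]:
    """List required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not fields.get(name)]
```

A candidate whose `missing_fields` result is non-empty would be rejected before scoring in this sketch.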

The miner is bounded: per-source caps prevent any one source from draining the GitHub API budget; every section runs inside a _safe_section() try-block so a single broken endpoint can't kill the run.
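
The two guards described above, per-source caps and a try-block around each section, could look something like this. The real `_safe_section()` is not shown in this post, so the context-manager shape, the `run_sources` helper, and the default cap of 500 are all assumptions made for illustration.

```python
import logging
from contextlib import contextmanager
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

log = logging.getLogger("miner")


@contextmanager
def safe_section(name: str, failures: List[str]) -> Iterator[None]:
    """Run one crawl section; log and record any exception instead of raising."""
    try:
        yield
    except Exception:  # deliberately broad: one broken source must not end the run
        log.exception("section %s failed", name)
        failures.append(name)


def run_sources(
    sources: Dict[str, Callable[[], Iterable[str]]],
    caps: Dict[str, int],
    default_cap: int = 500,  # hypothetical default
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Collect candidate URLs per source, bounded by a per-source cap.

    Returns (found, failed): URLs grouped by source, and names of sources
    that raised.
    """
    found: Dict[str, List[str]] = {}
    failures: List[str] = []
    for name, fetch in sources.items():
        with safe_section(name, failures):
            cap = caps.get(name, default_cap)
            # islice stops pulling from the source once the cap is reached.
            found[name] = list(islice(fetch(), cap))
    return found, failures
```

Because the cap is applied with `islice`, a lazily paginated source stops issuing requests as soon as its budget is used up.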

A full run takes about 4 hours. New skills appear on the live catalog the same day they're discovered.

Admission — content signals only, no popularity

This is the part I'm most opinionated about. Ranking can't be bought. The moment a paid signal influences who appears in the catalog (or in what order), the value proposition collapses — nobody pays for "objective evaluation" when it isn't objective.

So the catalog admits skills based on a content score derived from the SKILL.md itself:

  • Anti-trigger discipline — does the SKILL.md have a "when NOT to use" or "out of scope" section? That's a +4 per pattern, capped at +16. Strong negative-space marking is the single best signal that the author thought about edge cases.
  • Pricing / quota transparency — does it document costs, rate limits, or expected API spend? +10.
  • Frontmatter depth — beyond name: and description:, how many other fields are present (model:, tags:, version:, license:, allowed-tools:, metadata.*)? Capped at 10 distinct keys to prevent gaming.
  • Length × structure — is the body substantive (>800 chars in description:, multiple code blocks, headings)?
  • Filler-phrase penalty: // TODO, Lorem ipsum, generic templated phrases → minus 5.
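
As a rough sketch of how these signals could combine, the function below applies the weights stated in the list. The regex patterns are illustrative guesses, one point per extra frontmatter key is an assumption (only the cap of 10 is stated), the filler penalty is assumed to apply once, and the length × structure signal is left out because no weight is given for it.

```python
import re
from typing import Dict

# Illustrative patterns only; the catalog's real pattern list is not shown.
ANTI_TRIGGER_PATTERNS = [
    r"when not to use",
    r"out of scope",
    r"do not use (this|when)",
    r"anti-?triggers?",
]
PRICING_PATTERN = r"pricing|rate limit|quota|api spend|cost"
FILLER_PATTERN = r"//\s*todo|lorem ipsum"
BASE_KEYS = {"name", "description"}


def content_score(frontmatter: Dict[str, str], body: str) -> int:
    """Score a skill from its own content; no popularity inputs are accepted."""
    text = body.lower()
    score = 0
    # +4 per distinct anti-trigger pattern present, capped at +16.
    hits = sum(1 for pattern in ANTI_TRIGGER_PATTERNS if re.search(pattern, text))
    score += min(4 * hits, 16)
    # +10 when costs, rate limits, or quotas are documented.
    if re.search(PRICING_PATTERN, text):
        score += 10
    # Frontmatter depth: keys beyond name/description, capped at 10.
    # One point per key is an assumption.
    extra_keys = set(frontmatter) - BASE_KEYS
    score += min(len(extra_keys), 10)
    # Flat -5 when filler phrases appear (assumed to apply once).
    if re.search(FILLER_PATTERN, text):
        score -= 5
    return score
```

The signature takes only the frontmatter and body, so stars, forks, and follower counts have no way into the score.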

The score never weighs stars, forks, install counts, GitHub follower count, or any other popularity signal. A skill written by a developer with 0 GitHub followers and a clear anti-trigger section beats a flashy skill by a 50k-follower influencer that's just frontmatter-and-vibes. That's the bar.

For ranking inside the desktop app's Pro tier — a separate evaluation layer — the formula is the same content-only structural score plus frontmatter-completeness, rescaled to [50, 100]. Still no popularity signals.
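
The rescaling step is not spelled out here; one plausible reading is a linear min-max mapping of raw scores onto [50, 100]. The sketch below assumes exactly that, and the handling of the all-equal case is an arbitrary choice.

```python
from typing import Dict


def rescale_scores(
    raw: Dict[str, float], lo: float = 50.0, hi: float = 100.0
) -> Dict[str, float]:
    """Linearly map raw scores onto [lo, hi] (min-max scaling, assumed)."""
    if not raw:
        return {}
    smallest, largest = min(raw.values()), max(raw.values())
    span = largest - smallest
    if span == 0:
        # Every skill has the same raw score; placing them at hi is arbitrary.
        return {slug: hi for slug in raw}
    return {
        slug: lo + (hi - lo) * (value - smallest) / span
        for slug, value in raw.items()
    }
```

Under this mapping the lowest-scoring admitted skill displays as 50 and the highest as 100.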

This filter excludes roughly 30% of the skills that an unconstrained "rank by stars" catalog would surface. I'm OK with that trade.

What surprised me

1. The catalog is dominated by a handful of prolific authors. One contributor has 3,446 admitted skills (yes, really). The top 25 authors account for ~30% of the catalog. There's a Pareto distribution underneath the long tail.

2. Sales-category skills score highest on content quality. Counter-intuitive — I expected Engineering or Security to be most polished. Turns out sales-focused skill authors over-index on structure (anti-trigger sections, scope discipline, pricing transparency) because that's their professional habit. Engineering authors more often skip the "when NOT to use" section because they assume it's obvious.

3. Vendor-side adoption is still 0. The catalog has zero skills with author_url pointing at anthropic.com, openai.com, or any other large AI vendor. Every entry is independent. The ecosystem is fully community-driven.

4. The SKILL.md format is leaking sideways. I found skills in repos tagged cline-skills, cursor-rules, aider-skills, windsurf-rules. The format is becoming a portable agent-skill standard, not just a Claude Code thing. The catalog admits these too — they're SKILL.md files, the agent that loads them is the user's choice.

5. The biggest discovery surface isn't GitHub code search. It's the stargazer graph. When a SKILL.md hits a few hundred stars, the people who star it have a 30%+ rate of having their own SKILL.md somewhere in their account. Mining the graph yields skills the code-search queries don't find.
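
A one-hop version of the stargazer mining idea might look like the sketch below. `list_stargazers` and `list_skill_repos` are hypothetical callables (for example, thin wrappers around a hosting API's stargazer listing and a per-user repo scan), and the `max_users` budget is an assumed safeguard rather than a documented setting.

```python
from typing import Callable, Iterable, List, Set


def mine_stargazers(
    seed_repos: Iterable[str],
    list_stargazers: Callable[[str], Iterable[str]],
    list_skill_repos: Callable[[str], Iterable[str]],
    max_users: int = 1000,
) -> List[str]:
    """Return skill repos owned by people who starred the seed repos.

    Both callables are hypothetical and supplied by the caller:
    list_stargazers(repo) yields usernames, list_skill_repos(user) yields
    that user's repos containing a SKILL.md.
    """
    known: Set[str] = set(seed_repos)
    seen_users: Set[str] = set()
    discovered: List[str] = []
    # sorted() takes a snapshot, so adding to `known` inside the loop is safe.
    for repo in sorted(known):
        for user in list_stargazers(repo):
            if user in seen_users:
                continue
            if len(seen_users) >= max_users:
                return discovered  # budget exhausted
            seen_users.add(user)
            for candidate in list_skill_repos(user):
                if candidate not in known:
                    known.add(candidate)
                    discovered.append(candidate)
    return discovered
```

Feeding the discovered repos back in as the next run's seeds would extend this to multiple hops.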

What's free

Everything the catalog produces is open:

  • Public catalog at https://claudskills.com/ — browseable + searchable.
  • Open dataset at github.com/claudskills/catalog-public — daily refresh in 6 formats (JSON, NDJSON, CSV, Parquet, Atom feed, README). CC BY 4.0.
  • HuggingFace mirror at huggingface.co/datasets/claudskills/skills — same data, parquet-native, suitable for LLM training.
  • Public REST API at https://claudskills.com/api/v1/ — read-only, no auth, CORS-open, edge-cached. OpenAPI 3.1 spec covers every endpoint. Paginated /skills, single-skill /skills/<slug>, /categories, /tags, /stats. The catalog API itself is ~300 LOC of Cloudflare Worker code; the heavy lifting is the daily miner.
  • Embeddable skill card at https://claudskills.com/embed/<slug>.js — one-line <script> tag that injects a styled card into any blog post or doc page. The card you'd drop into your own writeup of a favorite skill.
  • Shields.io-style badge at https://claudskills.com/badge/<slug>.svg — for skill authors to drop into their GitHub READMEs.
  • Daily Skill-of-the-Day archive at /sotd/YYYY-MM-DD/ — every UTC day picks one skill via a date-hash that stays consistent across mobile push, social posts, and the web.
  • Per-category, per-tag, per-author, and per-use-case landing pages — about 2,800 hub pages total covering the catalog from every browsing angle.
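
The Skill-of-the-Day date-hash can be illustrated with a short sketch. The choice of SHA-256, the ISO date string as input, and sorting the slugs first are all assumptions; the only property taken from the description above is that the same UTC date always yields the same pick on every surface.

```python
import hashlib
from datetime import date
from typing import Sequence


def skill_of_the_day(day: date, slugs: Sequence[str]) -> str:
    """Pick one slug deterministically from the date (hash choice is assumed)."""
    if not slugs:
        raise ValueError("catalog is empty")
    # Sort so every client indexes into the same ordering.
    ordered = sorted(slugs)
    digest = hashlib.sha256(day.isoformat().encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(ordered)
    return ordered[index]
```

One caveat with this simple form: the pick for a given date changes whenever the slug list changes, so a stable archive would need to store each day's result rather than recompute it.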

What I'd change if starting over

A few things I learned the hard way:

  1. Build the public dataset first, the website second. I spent the first two months making the website nice. The dataset would have driven more usage faster — researchers and tool-builders pick up CC BY 4.0 data within days of finding it; consumer-facing UIs take months to build word-of-mouth.

  2. Cloudflare Workers + R2 + Netlify together is more reliable than any one of them. The site has 64,000+ per-skill HTML pages, which would blow Netlify's deploy-prep budget at scale. So per-skill HTML files live in Cloudflare R2 with a Netlify rewrite to serve them from claudskills.com/skills/<slug>/. API + embed + badge endpoints are Cloudflare Workers bound to the same domain. The homepage + static pages are direct from Netlify. Each layer doing what it's best at.

  3. Anti-popularity signals were the hardest decision and the most important one. Every time I evaluate a candidate change to the ranking algorithm, "would skill authors pay to influence this?" is the test. If yes, the change doesn't ship. The discipline pays off when you have a Pro subscription product — it's "pay $9/month for the multi-signal Quality Score in the desktop app," and there's nothing for me to defend about why the score is honest. It's honest by construction.

What's next

The next quarter is about distribution — the catalog exists, now developers need to find it. The roadmap:

  • 25 awesome-list PRs (live next week)
  • A weekly catalog-growth report cross-posted to dev.to / Hashnode / Medium / LinkedIn
  • Embed cards in third-party blog posts (the API is ready; the inbound demand will tell us if the embed surface gets traction)
  • iOS and Android companion apps for discovery (already in App Store review at the time of writing)

If you've written a SKILL.md, it's probably already in the catalog; search for your repo name at claudskills.com. If you write one later, the catalog should pick it up within 24 hours of your push to a public GitHub repo. If you want to fast-track it, there's a submit form on the homepage.

If you're a researcher, a tool-builder, or an LLM-pipeline operator who wants to ingest the data: the public dataset refreshes daily, and the API is rate-limit-free for normal use. Build something cool — I'd love to hear about it.
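
For anyone who wants to try the API from a script, here is a minimal standard-library sketch. The base URL and the /stats and /skills/<slug> paths come from the list above; that responses are JSON is an assumption, and nothing is assumed about their shape.

```python
import json
from typing import Any
from urllib.parse import quote
from urllib.request import Request, urlopen

API_BASE = "https://claudskills.com/api/v1"


def endpoint_url(*parts: str) -> str:
    """Join path segments onto the API base, percent-encoding each one."""
    return "/".join([API_BASE, *(quote(part, safe="") for part in parts)])


def get_json(url: str, timeout: float = 10.0) -> Any:
    """GET a URL and decode the body as JSON (response format is assumed)."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example calls (these perform network requests; "my-skill" is a made-up slug):
#   get_json(endpoint_url("stats"))
#   get_json(endpoint_url("skills", "my-skill"))
```

The OpenAPI 3.1 spec is the place to check the real response fields and pagination parameters before building on this.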

The catalog is at claudskills.com. The dataset is at github.com/claudskills/catalog-public. Comments + questions welcome.


ClaudSkills is an independent community catalog. Claude™ is a trademark of Anthropic PBC; ClaudSkills is not affiliated with, endorsed by, or sponsored by Anthropic.