I built an AI PR-triage agent in 30 lines of Markdown
Arthur · 2026-05-25 · via DEV Community

This is a submission for the Google I/O Writing Challenge

A recipe for the AI PR-triage agent I built after Google I/O 2026: three Markdown skill files, one Python runner, one real public GitHub repo, about twelve cents per run.

1. What I built

At Google I/O 2026, Logan from the Gemini API team walked through an AGENTS.md file for an AI talk-radio agent and dropped a line on stage that stuck with me: "the hottest new programming language is Markdown." He had written no orchestration logic, just skills and tools in Markdown files, and the agent shipped a finished podcast episode from a single API call.

I took that seriously. The next day, I spent a few hours building an AI pull-request triage agent on the public Gemini API. Three Markdown skill files, one small Python runner, one real public GitHub repo as the target. The agent scanned sixteen open pull requests, categorized each by risk, drafted a one-line summary, and produced a grouped report. Two consecutive runs, identical category distributions, under two minutes each, about twelve cents per run.

This article is the recipe. Working code, real cost, an excerpt of the actual triage report the agent wrote, and enough scaffolding for you to try it tonight against any public repo you care about.

2. What "skills as Markdown" actually means

A skill is a single .md file with four pieces:

  1. A name and a one-sentence description of when to invoke it.
  2. A numbered procedure.
  3. Constraints (what the skill must not do).
  4. Composition notes (which skill, if any, the agent should call next).

The agent loads skills when they are relevant to the user request, and it calls the tools they reference. There is no orchestration logic inside the skill file. The skill is the spec.

This is meaningfully different from cramming everything into a system prompt. Skills compose: skill A can hand off to skill B without the runner reshuffling state. Skills version independently from the runner, so you can iterate prose without touching Python. Skills carry per-tool constraints, which the model respects because the constraint is attached to the procedure rather than buried in a long preamble.
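To make the "skills are just files" point concrete, here is a minimal sketch of a skill loader. It is an assumption about how a helper like `load_skills` could work, not the runner's actual code: the separator, the filename comment, and the sorted order are all illustrative choices.

```python
from pathlib import Path


def load_skills(skills_dir: str) -> str:
    """Concatenate every .md file in skills_dir into one prompt-ready string.

    Hypothetical helper: the separator and the filename comment are
    illustrative choices, not part of any official skill format.
    """
    sections = []
    # Sort by filename so the prompt is identical from run to run.
    for path in sorted(Path(skills_dir).glob("*.md")):
        text = path.read_text(encoding="utf-8").strip()
        sections.append(f"<!-- {path.name} -->\n{text}")
    return "\n\n---\n\n".join(sections)
```

Because the loader only reads files, editing a skill's prose changes the agent's behavior on the next run without touching the Python.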

3. The three skills

I wrote three files. Together they are 101 lines of Markdown for the entire agent definition. Here is the first one verbatim, the entry point for the agent:

# Skill: scan_open_prs

Use this skill when asked to list, scan, or audit open pull requests on a GitHub
repository.

## Procedure

1. Call the `github_list_prs` tool with `state="open"` and the requested `limit`
   (default 25, maximum 50). The tool requires a `repo` argument in the form
   `owner/name` (for example, `cli/cli`).
2. For each returned PR, keep these fields verbatim: `number`, `title`, `user`,
   `additions`, `deletions`, `changed_files`, `draft`, `created_at`, and the
   first 200 characters of `body`.
3. Return the result as a JSON array. Do not paraphrase the title or body.

## Constraints

- Do not fetch full diffs in this skill. That is `categorize_by_risk`'s job.
- Skip draft PRs unless the user has explicitly asked for them.
- If the tool returns zero PRs, report that plainly and stop. Do not invent PRs.
- If the tool returns an error, surface the error message verbatim and stop.

## Composition

After running this skill, the agent should call `categorize_by_risk` once with
the JSON array as input.


Twenty-six lines. That is the entire entry-point skill. Notice how much of it is constraints: "do not paraphrase," "do not invent PRs," "skip drafts unless asked." Most of the work in writing a good skill goes into anticipating the model's bad habits and writing them out of the procedure.
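The field list in step 2 and the draft rule can also be enforced on the tool side, so the model never sees more than the skill allows. The sketch below is one way to do that, under assumptions: the function names are hypothetical, and `user` is flattened to the login string. My understanding is that GitHub's list endpoint does not include `additions`, `deletions`, or `changed_files` (they appear on the single-PR endpoint), so the sketch leaves them as `None` when absent; check the API docs before relying on that.

```python
KEEP_FIELDS = ("number", "title", "additions", "deletions",
               "changed_files", "draft", "created_at")


def trim_pr(pr: dict) -> dict:
    """Keep only the fields scan_open_prs asks for (hypothetical helper)."""
    trimmed = {field: pr.get(field) for field in KEEP_FIELDS}
    # Assumption: GitHub returns `user` as an object; keep just the login.
    user = pr.get("user") or {}
    trimmed["user"] = user.get("login")
    # The skill asks for the first 200 characters of the body; body may be null.
    trimmed["body"] = (pr.get("body") or "")[:200]
    return trimmed


def scan_result(prs: list[dict], include_drafts: bool = False) -> list[dict]:
    """Apply the 'skip drafts unless asked' constraint, then trim each PR."""
    return [trim_pr(pr) for pr in prs if include_drafts or not pr.get("draft")]
```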

The second skill, categorize_by_risk.md, is 41 lines. It calls github_get_pr_files for each PR and applies first-match-wins heuristics: breaking if the PR touches dependency files, security if it touches auth or crypto paths, docs if it only changes docs, fix if the title contains certain keywords, refactor if additions roughly equal deletions, feature otherwise. Each PR gets a category, a confidence, and a one-sentence reason.
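In the agent these heuristics live in prose and the model applies them, but the first-match-wins ordering is easier to see as code. The sketch below mirrors the rule order described above; the specific file names, path hints, keywords, and the 20% tolerance for "roughly equal" are my own placeholders, not values from categorize_by_risk.md.

```python
import re

# All of these constants are illustrative assumptions.
DEPENDENCY_FILES = {"go.mod", "go.sum", "package.json", "requirements.txt"}
SECURITY_HINTS = ("auth", "crypto")
DOC_SUFFIXES = (".md", ".rst", ".txt")
FIX_PATTERN = re.compile(r"\b(fix|bug|patch)\w*", re.IGNORECASE)


def categorize(title, files, additions, deletions):
    """Return (category, reason); the first rule that matches wins."""
    names = [f.rsplit("/", 1)[-1] for f in files]
    if any(n in DEPENDENCY_FILES for n in names):
        return "breaking", "touches a dependency file"
    if any(hint in f.lower() for f in files for hint in SECURITY_HINTS):
        return "security", "touches an auth or crypto path"
    if files and all(f.startswith("docs/") or f.lower().endswith(DOC_SUFFIXES)
                     for f in files):
        return "docs", "only changes documentation"
    if FIX_PATTERN.search(title):
        return "fix", "title contains a fix keyword"
    total = additions + deletions
    # "Roughly equal" is taken here to mean within 20% of the total churn.
    if total > 0 and abs(additions - deletions) <= 0.2 * total:
        return "refactor", "additions roughly equal deletions"
    return "feature", "no earlier rule matched"
```

Order matters: a PR that touches both `go.mod` and an auth path is reported as breaking, because that rule is checked first.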

The third, draft_summary.md, is 34 lines. It produces an action-verb-first one-line summary for each PR and emits the final report grouped by category, security first.
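The grouping step is simple enough to sketch deterministically, which is handy for checking the agent's report against what you expected. This is an illustration only: security first comes from the skill, while the order of the remaining categories and the newest-first sort within a group are assumptions.

```python
# Security first is from the skill; the rest of this order is assumed.
CATEGORY_ORDER = ["security", "breaking", "fix", "feature", "refactor", "docs"]


def render_report(items):
    """Render categorized PRs as a Markdown report grouped by category.

    Each item is a dict with "number", "category", and "summary" keys.
    """
    grouped = {}
    for item in items:
        grouped.setdefault(item["category"], []).append(item)

    # Unknown categories are appended after the known ones, not dropped.
    extras = sorted(c for c in grouped if c not in CATEGORY_ORDER)
    lines = []
    for category in CATEGORY_ORDER + extras:
        prs = grouped.get(category)
        if not prs:
            continue
        lines.append(f"## {category}")
        lines.append("")
        for pr in sorted(prs, key=lambda p: p["number"], reverse=True):
            lines.append(f"- #{pr['number']}: {pr['summary']} [{category}]")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```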

One short note on composition. When skill A says "now call skill B," the agent treats the boundary as a turn break. Skill B runs in a fresh turn with the JSON output of skill A as its input. This is multi-turn composition, not in-call composition, and it shapes how you structure your skills: each one is a complete unit of work with a clean input and output, not a function in a chain.

4. The runner

The runner is roughly 70 lines of Python that loads the skills, registers two function tools (github_list_prs and github_get_pr_files), and drives a multi-turn loop until the model says it is finished.

Google's official Managed Agents API is early-access only at the moment, but the same shape (one call, attached skills, attached tools) runs on the public Gemini API today, with the same skill files.

The shape, abbreviated:

import os

from google import genai
from google.genai import types

# load_skills, user_turn, extract_tool_calls, run_tools, and the two *_decl
# tool declarations are elided from this abbreviated listing.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
repo = "cli/cli"                          # target repository, owner/name
skills = load_skills("skills/")           # reads the three .md files
tools = [github_list_prs_decl, github_get_pr_files_decl]

contents = [user_turn(f"Triage open PRs on {repo}. Skills:\n\n{skills}")]
while True:
    resp = client.models.generate_content(
        model="gemini-3.5-flash",
        contents=contents,
        config=types.GenerateContentConfig(tools=tools),
    )
    calls = extract_tool_calls(resp)
    if not calls:
        break                             # no tool calls left: resp holds the final report
    contents.append(resp.candidates[0].content)   # keep the model's turn in the history
    contents.append(run_tools(calls))     # executes locally, returns FunctionResponse
print(resp.text)


That is the entire control structure. One client, two function tools, one loop, three Markdown files attached as part of the user turn. The loop pays for everything the agent learns about the repo: which PRs exist, which files they touch, what the titles look like. No graph framework, no orchestrator, no agent class hierarchy.
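The two helpers that do the real work in that loop are short. Here is a sketch of what they might look like, written against the response shape the loop already uses (`resp.candidates[0].content`, whose parts may carry a `function_call` with `name` and `args`). The `registry` argument and the plain-dict return value are my assumptions to keep the sketch free of SDK imports; in a real runner you would wrap each result in a function-response part (I believe `types.Part.from_function_response` is the SDK call for that) before appending it to `contents`.

```python
def extract_tool_calls(resp):
    """Collect the function calls the model requested in this turn."""
    calls = []
    candidates = getattr(resp, "candidates", None) or []
    if not candidates:
        return calls
    content = candidates[0].content
    for part in (getattr(content, "parts", None) or []):
        call = getattr(part, "function_call", None)
        if call is not None:
            calls.append(call)
    return calls


def run_tools(calls, registry):
    """Execute each call locally; registry maps tool name -> Python callable.

    Errors are returned as data so the model can surface them verbatim,
    which is what the skill's constraints ask for.
    """
    results = []
    for call in calls:
        handler = registry.get(call.name)
        if handler is None:
            payload = {"error": f"unknown tool: {call.name}"}
        else:
            try:
                payload = {"result": handler(**dict(call.args or {}))}
            except Exception as exc:
                payload = {"error": str(exc)}
        results.append({"name": call.name, "response": payload})
    return results
```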

5. The runs

I pointed the agent at cli/cli, the GitHub CLI repository, which had sixteen open non-draft pull requests at the time. I ran it twice from a cold start.

The numbers:

  • 19 tool calls per run. Three github_list_prs calls during exploration (the agent verified pagination), then sixteen github_get_pr_files calls, one per PR.
  • Elapsed time. Run 1: 112 seconds. Run 2: 84 seconds. My read is that the second run was faster because the model committed to the plan earlier; with only two runs, treat that as an observation rather than a measurement.
  • Cost. About $0.12 per run in Gemini 3.5 Flash spend.
  • Stability. Both runs produced identical category distributions across the sixteen PRs. No hallucinated PR numbers, no missed PRs.
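If you want to check stability yourself, a small comparison script is enough. The sketch below assumes each run has been parsed into a list of `{"number", "category"}` dicts (that shape is my assumption, not the runner's output format). It also reports per-PR changes, because two runs can have identical category distributions while labeling individual PRs differently.

```python
from collections import Counter


def compare_runs(run_a, run_b):
    """Compare two triage runs given as lists of {"number", "category"} dicts."""
    cat_a = {item["number"]: item["category"] for item in run_a}
    cat_b = {item["number"]: item["category"] for item in run_b}
    shared = cat_a.keys() & cat_b.keys()
    return {
        # Same number of PRs in each category across both runs.
        "same_distribution": Counter(cat_a.values()) == Counter(cat_b.values()),
        # PRs one run reported and the other missed (or invented).
        "only_in_a": sorted(cat_a.keys() - cat_b.keys()),
        "only_in_b": sorted(cat_b.keys() - cat_a.keys()),
        # PRs present in both runs but labeled differently.
        "recategorized": sorted(n for n in shared if cat_a[n] != cat_b[n]),
    }
```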

Here is what the agent wrote for the top of the report:

## security

- #13500: Refactor string splitting in loops to use the more efficient SplitSeq function. [security]
- #13492: Add gh-cli-site-deployer App to replace SITE_DEPLOY_PAT in release workflows. [security]
- #13403: Refactor GitHub database IDs to use 64-bit integers across commands and API clients. [security]
- #13250: Add categorized target host categorization (github.com vs tenant) to telemetry data. [security]

## feature

- #13471: Add --all flag to gh skill install to support installing all discovered skills. [feature]


The category-by-category reasoning was crisp. Security PRs were grouped at the top, exactly as draft_summary.md had instructed. Every summary led with a verb. Confidence scores matched the heuristics in categorize_by_risk.md. The skill files did the work.

At nightly cadence on a repo this size, the annual cost lands somewhere around $40 to $50. Cheap, especially compared to the developer-hours of triage it replaces.
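The arithmetic behind that range, assuming one run per night at the roughly twelve cents per run measured above:

```python
COST_PER_RUN_USD = 0.12   # approximate per-run spend reported above
RUNS_PER_YEAR = 365       # assumption: exactly one run per night

annual_cost = COST_PER_RUN_USD * RUNS_PER_YEAR
print(f"${annual_cost:.2f} per year")  # prints "$43.80 per year"
```

Per-run spend varies with what the agent finds, so a range is more honest than the point estimate.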

6. Three things worth knowing

A few practical notes from the build.

Composition is multi-turn, not in-call. If skill A invokes skill B, plan for a turn boundary between them. The model's working memory between turns is whatever you put back into contents, so emit clean JSON at skill boundaries rather than relying on natural-language handoff.
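One way to keep handoffs clean is to validate the JSON at each skill boundary before it goes back into `contents`. A minimal sketch, assuming the handoff is a JSON array of PR objects; the function name and the required keys are hypothetical and should match whatever your skills actually emit.

```python
import json

# Assumption: every PR object in a handoff carries at least these keys.
REQUIRED_KEYS = {"number", "title"}


def parse_handoff(text):
    """Parse a skill's JSON output and fail loudly if it is malformed."""
    data = json.loads(text)  # raises json.JSONDecodeError on non-JSON text
    if not isinstance(data, list):
        raise ValueError("handoff must be a JSON array")
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"item {index} is not an object")
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            raise ValueError(f"item {index} is missing {sorted(missing)}")
    return data
```

Failing at the boundary is cheaper than letting the next skill work from a half-parsed natural-language summary.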

Token spend is non-deterministic. The agent pays to learn the repo, and how much it pays depends on what it finds. On a 1,000-PR monorepo, set an explicit tool-call budget in the runner and have the loop break when it is exceeded. Otherwise a single run can quietly become expensive.
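A tool-call budget can be a few lines wrapped around the loop. The sketch below separates the budget logic from the model call so it can be tested on its own: `next_calls` and `execute` are hypothetical callables standing in for "ask the model for its next tool calls" and "run them and append the results", and the default of 40 is an arbitrary example.

```python
def run_with_budget(next_calls, execute, max_tool_calls=40):
    """Drive the agent loop, stopping once the tool-call budget would be exceeded.

    next_calls() returns the tool calls the model wants next (empty when done).
    execute(calls) runs them and feeds the results back to the model.
    """
    used = 0
    while True:
        calls = next_calls()
        if not calls:
            return {"finished": True, "tool_calls": used}
        if used + len(calls) > max_tool_calls:
            # Stop before executing the batch that would exceed the budget.
            return {"finished": False, "tool_calls": used}
        execute(calls)
        used += len(calls)
```

Returning `finished: False` instead of raising lets the caller decide whether to report partial results or rerun with a larger budget.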

For audited or strictly deterministic pipelines, an orchestrator graph still wins. Markdown skills are the right tool for exploratory work, summarization, triage, and drafting. If your pipeline has compliance hooks, retry semantics, or a fan-out fan-in shape, reach for a graph framework. The two patterns coexist.

7. Try it tonight

The whole recipe:

  1. pip install google-genai in a virtualenv. Set GEMINI_API_KEY from Google AI Studio and GH_TOKEN from a read-only GitHub personal access token.
  2. Save three skill files in skills/. Use scan_open_prs.md above as the template; write categorize_by_risk.md and draft_summary.md in the same shape (name, procedure, constraints, composition).
  3. Write the runner: one genai.Client, two function tool declarations (github_list_prs, github_get_pr_files), one multi-turn loop driving until the model emits no more tool calls.
  4. Point it at a public GitHub repo. Start with something small. cli/cli is a good first target because the PR titles are descriptive.
  5. Read the JSON trace the loop produces. Tweak the skill prose where the agent went sideways. Run again. The whole iteration cycle is about a minute.

Two evenings of work, including the runs, and the agent is paying for itself the first time you let it sweep a backlog before standup.

8. Closing

I am optimistic about this pattern. Markdown skills make agent definitions reviewable in a pull request, runnable from any IDE, portable across runners. The skill file is a primary artifact, not a string buried inside a Python class. Anyone on the team can edit it. Anyone reading the repo can see what the agent will do.

Which workflows in your stack feel like a natural fit for Markdown skills, and which still need a graph?