Agent Surface Map: Gemma 4 review before you install an MCP
signalscout · 2026-05-21 · via DEV Community

This is a submission for the Gemma 4 Challenge: Build with Gemma 4.

So the thing I built is pretty simple:

Before your coding agent installs a new MCP, ask Gemma what it is about to trust.

What I Built

Agent Surface Map is a pre-install review for MCP servers and agent tools. It does not try to prove a repo is safe. It answers a more practical question:

Should this be added carefully, sandboxed first, or not added?

There are already MCP scanners. That is good. I wanted the missing workflow layer: before a coding agent installs a new MCP, have Gemma turn the scan results (the surface map) into install constraints, then validate the final config before it gets written.

The loop: scan repo -> Gemma decides install posture -> agent validates final config before writing it.

Demo

Live demo

Click "Try demo MCP scan" on the homepage. It scans this tiny public fixture:

Demo MCP fixture

The live scan returns parsed MCP servers, a risk score, install constraints, and review_source: "gemma" when the Gemma route is available. If the provider rate-limits, the app falls back to the deterministic local review and labels that honestly. There is also a saved verified Gemma review so the model path is still visible when the provider is busy.
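A minimal sketch of that fallback, not the repo's actual code. Only the `review_source: "gemma"` label comes from the post; `RateLimited`, `review_with_fallback`, the two injected review callables, and the `"deterministic_local"` label are hypothetical names chosen for illustration.

```python
from typing import Callable


class RateLimited(Exception):
    """Hypothetical error raised when the model provider throttles a request."""


def review_with_fallback(
    surface_map: dict,
    gemma_review: Callable[[dict], dict],
    local_review: Callable[[dict], dict],
) -> dict:
    """Try the Gemma route first; fall back to the deterministic review."""
    try:
        result = dict(gemma_review(surface_map))
        result["review_source"] = "gemma"
    except RateLimited:
        result = dict(local_review(surface_map))
        # Assumed label: the post only says the fallback is labeled honestly.
        result["review_source"] = "deterministic_local"
    return result
```

Whatever the real label is, the point is that the caller can always tell which path produced the review.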

Code

GitHub repo

There is also an MCP server in the repo:

Run the MCP server with python3 mcp_server.py.

That means a coding agent can call scan_github_tool(url) before it edits local MCP config, then call validate_install_plan(report, proposed_config) before it writes the final config.

That is the real workflow: "hey agent, before you install this new tool, ask Agent Surface Map what constraints to follow, then check your final config against them."
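From the agent's side, that scan-then-validate order could be sketched roughly like this. The tool names `scan_github_tool` and `validate_install_plan` are from the post, but here they are passed in as plain callables. The `draft_config` step, the `config_path` argument, and the `{"ok": ...}` result shape are assumptions for illustration, not the server's documented interface.

```python
import json
from pathlib import Path
from typing import Callable


def install_with_review(
    url: str,
    draft_config: Callable[[dict], dict],
    scan_github_tool: Callable[[str], dict],
    validate_install_plan: Callable[[dict, dict], dict],
    config_path: Path,
) -> bool:
    """Scan, draft, validate, and only then write the MCP config."""
    # 1. Scan before touching any local config.
    report = scan_github_tool(url)
    # 2. The agent drafts a config that should follow the report's constraints.
    proposed = draft_config(report)
    # 3. Check that exact config against the report.
    verdict = validate_install_plan(report, proposed)
    if not verdict.get("ok", False):  # assumed result shape
        return False  # nothing is written when validation fails
    config_path.write_text(json.dumps(proposed, indent=2))
    return True
```

The write sits behind the validation call, so a rejected plan never reaches disk.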

How I Used Gemma 4

I used Gemma 4 31B Dense for the final install review.

The scanner looks at install-facing files: mcp.json, package files, repo instructions, Docker files, env examples, and similar config. It does not execute the repo.

It pulls out:

  • MCP server names, commands, args, and env key names
  • shell/process surfaces
  • browser automation and profile reuse
  • filesystem mounts
  • cloud/database/token references
  • prompt-injection-ish repo instructions
  • install scripts and local listener hints

Then it redacts secret-looking values and sends the compact surface map to Gemma 4.
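To make the extract-then-redact step concrete, here is a small sketch under stated assumptions. It assumes the common `mcpServers` config layout (name mapped to `command`, `args`, `env`). The regexes and the `redact` and `surface_entries` helpers are illustrative, not the scanner's real rules.

```python
import re

# Assumed heuristics: KEY=value pairs with sensitive-looking names, and long token-like runs.
SECRET_ARG = re.compile(r"(?i)(token|secret|password|api[_-]?key)=\S+")
LONG_TOKEN = re.compile(r"\b[A-Za-z0-9_\-]{32,}\b")


def redact(text: str) -> str:
    """Replace secret-looking substrings with a placeholder."""
    text = SECRET_ARG.sub(lambda m: m.group(1) + "=[REDACTED]", text)
    return LONG_TOKEN.sub("[REDACTED]", text)


def surface_entries(config: dict) -> list[dict]:
    """Collect server name, command, args, and env key names (never env values)."""
    entries = []
    for name, spec in config.get("mcpServers", {}).items():
        entries.append({
            "name": name,
            "command": spec.get("command", ""),
            "args": [redact(str(a)) for a in spec.get("args", [])],
            "env_keys": sorted(spec.get("env", {})),  # keys only, values are dropped
        })
    return entries
```

Keeping only env key names means the model still sees that a `GITHUB_TOKEN` is involved without ever seeing its value.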

Gemma is the judgment layer. The deterministic scanner finds the evidence; Gemma turns it into a practical install decision and agent constraints.

I chose the 31B Dense model because this is not just classification. The model has to reason over messy developer context: browser profile reuse plus filesystem mounts plus token names is more serious than any one signal alone.

After Gemma returns the posture, the MCP workflow can check the final proposed config with validate_install_plan. That catches stuff like global install after sandbox_first, broad local paths, Docker socket exposure, and secret values pasted directly into config.
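The four checks named above could look something like the following. This is a hypothetical re-creation, not the repo's `validate_install_plan`. The `posture` key, the `mcpServers` layout, the `-g`/`--global` flags, the broad-path list, and the token regex are all assumptions.

```python
import re

# All markers and thresholds below are illustrative assumptions.
BROAD_PATHS = {"/", "~", "$HOME", "/home", "/Users"}
TOKENISH = re.compile(r"^[A-Za-z0-9_\-]{20,}$")


def plan_violations(report: dict, proposed_config: dict) -> list[str]:
    """Return human-readable violations; an empty list means the plan passed."""
    found = []
    sandbox_first = report.get("posture") == "sandbox_first"
    for name, spec in proposed_config.get("mcpServers", {}).items():
        args = [str(a) for a in spec.get("args", [])]
        if sandbox_first and {"-g", "--global"} & set(args):
            found.append(f"{name}: global install after sandbox_first")
        for arg in args:
            if "docker.sock" in arg:
                found.append(f"{name}: Docker socket exposure")
            # Split mount-style args such as "/:/host" into their parts.
            for part in arg.split(":"):
                if part and (part.rstrip("/") or "/") in BROAD_PATHS:
                    found.append(f"{name}: broad local path {part!r}")
        for key, value in spec.get("env", {}).items():
            # Placeholders like "${GITHUB_TOKEN}" do not match; pasted tokens do.
            if TOKENISH.match(str(value)):
                found.append(f"{name}: secret value pasted into env {key}")
    return found
```

A config that mounts `/`, exposes the Docker socket, installs globally, and pastes a token would come back with four violations. One that uses a narrow `./sandbox` path and a `${GITHUB_TOKEN}` placeholder would come back empty.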

Why this felt worth building

Coding agents changed the shape of local risk. A repo is not just code anymore. It can ship instructions for your agent, MCP config, package scripts, browser access, write paths, and credential names.

That is basically a tiny operating surface on your laptop.

So this is the safety pause before the agent gets more power. Not a malware sandbox. Not a full audit. Just a fast answer to: should this be added carefully, sandboxed first, or rejected?

And yeah, you can paste a config into ChatGPT and ask for advice. The difference here is that the review is wired into the install path. The agent can scan, get a structured posture, draft the config, and then check that exact config before it writes anything.

Safety choices

I kept the evaluator boring on purpose:

  • no repo code execution
  • shallow/no-submodule GitHub retrieval
  • secret value redaction
  • local path refusal for root/profile/credential dirs
  • bounded MCP responses
  • public scan rate limits
  • Gemma review rate limits
  • best-effort demo throttles for scans and Gemma reviews
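As one example from that list, local path refusal might be sketched like this. The post names the categories (root, profile, credential dirs) but not the exact directories, so the deny-list is an assumption. Reading "profile" as the user's home directory is also an assumption, and `is_refused` is a hypothetical helper.

```python
from pathlib import Path

# Assumed deny-list, relative to the user's home directory.
CREDENTIAL_DIRS = (".ssh", ".aws", ".gnupg", ".config/gcloud")


def is_refused(path: str, home: Path) -> bool:
    """True when a local scan target is the root, the home dir, or a credential dir."""
    target = Path(path).resolve()
    home = home.resolve()
    # Refuse the filesystem root and the home directory itself.
    if target == Path(target.anchor) or target == home:
        return True
    # Refuse credential directories and anything inside them.
    for rel in CREDENTIAL_DIRS:
        blocked = home / rel
        if target == blocked or blocked in target.parents:
            return True
    return False
```

Resolving the path first means `..` segments and symlinks are normalized before the comparison.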

The hosted demo uses a guarded Gemma 4 path through OpenRouter. I also saved proof artifacts for the MCP workflow and live Gemma review in docs/proofs/.

Verification

  • python3 -m unittest discover -s tests -v
  • python3 -m py_compile surface_map.py server.py api/scan.py mcp_server.py scripts/mcp_workflow_smoke.py
  • node --check public/app.js
  • python3 scripts/mcp_workflow_smoke.py

Current proof:

  • live demo deployed
  • Gemma route configured
  • public demo MCP fixture works
  • MCP stdio workflow works
  • final install-plan validation blocks unsafe config
  • scanner tests pass

I think the interesting part is not the regex scanner. It is the handoff. Deterministic code collects boring evidence, Gemma turns it into install constraints a developer or coding agent can actually use, and the final plan gets checked before anything touches the shell.