mcp-probe v1.4.0: Contract assertions for production MCP servers
yongrean · 2026-05-23 · via DEV Community

MCP servers are starting to look like infrastructure.

That means the old readiness question is no longer enough:

Does the process start?

Even this is not enough:

Does tools/list return a clean schema?

A server can pass both checks and still fail every real agent loop because auth handoff, scopes, downstream permissions, environment setup, or data boundaries are broken.

So I shipped mcp-probe v1.4.0 with contract assertions for production MCP servers.

The problem: discovery is not readiness

A typical MCP smoke test looks like this:

  1. Start the server
  2. Run initialize
  3. Run tools/list
  4. Check that schemas exist

That catches broken startup and malformed tools.
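
Step 4 often amounts to little more than the check below. This is a minimal sketch, assuming the tools/list result has already been parsed into a dict with a "tools" array whose entries carry "name" and "inputSchema"; the function name `missing_schemas` and the sample payload are illustrative and not part of mcp-probe.

```python
from typing import Any


def missing_schemas(tools_list_result: dict[str, Any]) -> list[str]:
    """Return names of advertised tools that lack a usable inputSchema."""
    broken = []
    for tool in tools_list_result.get("tools", []):
        schema = tool.get("inputSchema")
        # A tool is expected to advertise a JSON Schema object as its input.
        if not isinstance(schema, dict) or schema.get("type") != "object":
            broken.append(tool.get("name", "<unnamed>"))
    return broken


# Hypothetical tools/list result, for illustration only.
sample = {
    "tools": [
        {"name": "execute_sql", "inputSchema": {"type": "object", "properties": {}}},
        {"name": "export_all"},  # advertised without a schema
    ]
}

print(missing_schemas(sample))
```

Nothing in this check ever calls a tool, so it says nothing about what happens once an agent does.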

But it misses the failures that matter in production:

  • The tool advertises correctly, but every call returns 401
  • OAuth requires a browser redirect the agent cannot trigger
  • The DB role is not actually read-only
  • Write attempts leak raw SQL errors or stack traces
  • Results omit metadata agents need to reason safely
  • Tenant or project scope is not preserved
  • Broad exports or admin actions are reachable
  • Error codes are unstable, so agents cannot recover

In other words: the server starts, but the contract is broken.

v1.4.0: sidecar contract assertions

mcp-probe already supported sidecar inputs via .mcp-probe.json, so teams could run real tools/call checks instead of relying on dummy inputs built from each schema's minimum requirements.

v1.4.0 extends that sidecar with assertions.

Example for a database-backed MCP server:

{
  "tools": {
    "execute_sql": {
      "input": {
        "project_id": "YOUR_PROJECT_ID",
        "query": "select 1 as health_check"
      },
      "expect": {
        "status": "pass",
        "requiredFields": ["rowCount", "limit", "source", "freshness"],
        "maxRows": 100
      }
    },
    "execute_sql_write_denied": {
      "input": {
        "project_id": "YOUR_PROJECT_ID",
        "query": "delete from users where id = 1"
      },
      "expect": {
        "status": "fail",
        "errorCode": "WRITE_NOT_ALLOWED",
        "notContains": ["DATABASE_URL", "password", "stack"]
      }
    }
  }
}

Now CI can validate the contract an agent actually depends on.

What assertions are supported?

expect.status

Declare whether a call should pass, fail, or warn.

This is important for negative probes. A write attempt against a read-only DB role should fail. In that case, failure is success.

{
  "expect": {
    "status": "fail"
  }
}
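
The "failure is success" idea can be sketched as a comparison between the declared and the observed outcome. This is an illustration only, not mcp-probe's code: the function name `status_assertion` and the shape of the returned dict are assumptions.

```python
def status_assertion(expected: str, actual: str) -> dict:
    """Compare the outcome a probe declared with the outcome it observed."""
    return {
        "assertion": "status",
        "ok": expected == actual,
        "detail": f"expected {expected}, got {actual}",
    }


# A denied write: the tool call itself failed, which is what the probe wanted.
denied_write = status_assertion(expected="fail", actual="fail")

# The same probe against a role that can still write: the call passed,
# so the assertion does not.
writable_role = status_assertion(expected="fail", actual="pass")

print(denied_write)
print(writable_role)
```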

expect.requiredFields

Validate that result metadata exists.

For database tools, an agent often needs more than rows. It needs context:

  • rowCount
  • limit
  • source
  • freshness

{
  "expect": {
    "requiredFields": ["rowCount", "limit", "source", "freshness"]
  }
}
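
A required-fields check can be as small as the sketch below. It assumes the metadata sits at the top level of the parsed tool result, which may not match how a given server nests its output; `missing_fields` and the sample result are hypothetical.

```python
from typing import Any


def missing_fields(result: dict[str, Any], required: list[str]) -> list[str]:
    """Return the required field names that are absent from the result."""
    return [name for name in required if name not in result]


# Hypothetical tool result that returns rows but omits freshness metadata.
result = {
    "rows": [{"health_check": 1}],
    "rowCount": 1,
    "limit": 100,
    "source": "replica",
}

print(missing_fields(result, ["rowCount", "limit", "source", "freshness"]))
```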

expect.maxRows

Catch broad exports or missing limits.

{
  "expect": {
    "maxRows": 100
  }
}

mcp-probe looks for common result shapes such as rowCount, rowsReturned, rows, data, items, and records.
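
One way such shape detection could work is sketched here. The key names come from the list above, but the precedence (explicit counters before array lengths) and the choice to fail when no count can be found are assumptions made for this example, not documented mcp-probe behavior.

```python
from typing import Any, Optional

COUNT_KEYS = ("rowCount", "rowsReturned")         # explicit counters
LIST_KEYS = ("rows", "data", "items", "records")  # arrays whose length is the count


def detect_row_count(result: dict[str, Any]) -> Optional[int]:
    """Best-effort row count for a parsed tool result, or None if unknown."""
    for key in COUNT_KEYS:
        value = result.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, int) and not isinstance(value, bool):
            return value
    for key in LIST_KEYS:
        value = result.get(key)
        if isinstance(value, list):
            return len(value)
    return None


def within_max_rows(result: dict[str, Any], max_rows: int) -> bool:
    """True only when a row count was found and does not exceed max_rows."""
    count = detect_row_count(result)
    return count is not None and count <= max_rows


print(within_max_rows({"rowCount": 1, "rows": [{"health_check": 1}]}, 100))
print(within_max_rows({"items": list(range(250))}, 100))
```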

expect.errorCode

Require stable structured error codes.

{
  "expect": {
    "status": "fail",
    "errorCode": "WRITE_NOT_ALLOWED"
  }
}

This matters because agents can only recover if errors are predictable.
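
To make the recovery point concrete, here is a sketch of what the agent side might do with a stable code. Only WRITE_NOT_ALLOWED appears in this post; the other code, the `{"code": ..., "message": ...}` error shape, and the recovery actions are hypothetical.

```python
# Map of stable error codes to what the agent should do next.
RECOVERY_PLAN = {
    "WRITE_NOT_ALLOWED": "continue read-only",
    "RATE_LIMITED": "retry after backoff",  # hypothetical code
}


def plan_recovery(error: dict) -> str:
    """Pick a recovery action from a structured error, else escalate."""
    code = error.get("code")
    # Unknown or missing codes cannot be handled safely by the agent.
    return RECOVERY_PLAN.get(code, "stop and escalate")


print(plan_recovery({"code": "WRITE_NOT_ALLOWED", "message": "role is read-only"}))
print(plan_recovery({"message": "permission denied for table users"}))
```

The second call shows the failure mode: with only a free-text message, the agent has nothing reliable to branch on.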

expect.contains and expect.notContains

Check for expected output and leaked internals.

{
  "expect": {
    "notContains": ["DATABASE_URL", "password", "stack"]
  }
}

This catches errors that expose raw internals.
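
A leak check of this kind can be approximated with a plain substring scan over the serialized output. The sketch assumes case-sensitive matching against the JSON-encoded result; whether mcp-probe matches this way is not stated here, and `leaked_terms` and the sample outputs are illustrative.

```python
import json
from typing import Any


def leaked_terms(output: Any, forbidden: list[str]) -> list[str]:
    """Return the forbidden substrings that appear in the tool output."""
    text = output if isinstance(output, str) else json.dumps(output)
    return [term for term in forbidden if term in text]


forbidden = ["DATABASE_URL", "password", "stack"]

# A clean structured error.
clean = {"error": {"code": "WRITE_NOT_ALLOWED", "message": "writes are disabled"}}

# A raw failure that exposes connection details and a stack trace.
leaky = {
    "error": "connect failed: postgres://app:password@db:5432/prod",
    "stack": "at Client.query (client.js:10)",
}

print(leaked_terms(clean, forbidden))
print(leaked_terms(leaky, forbidden))
```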

expect.not_error_code

Treat known auth/permission status codes as warnings instead of hard failures.

{
  "expect": {
    "not_error_code": [401, 403]
  }
}

This keeps OAuth handoff failures visible without confusing them with transport or runtime crashes.
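
Read that way, the classification could look like the sketch below. The three outcome names mirror expect.status, but the function, its inputs, and the rule that any other error code is a hard failure are assumptions about the intent rather than mcp-probe's actual logic.

```python
from typing import Optional


def classify(status_code: Optional[int], warn_codes: list[int]) -> str:
    """Map a call's error status code to pass, warn, or fail."""
    if status_code is None:
        return "pass"  # the call returned no error
    if status_code in warn_codes:
        return "warn"  # known auth or permission handoff issue
    return "fail"      # anything else is treated as a hard failure


warn_codes = [401, 403]

for code in (None, 401, 403, 500):
    print(code, classify(code, warn_codes))
```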

Output example

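When assertions pass (here for two tools named db_query and db_write, not the execute_sql entries from the earlier example):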

Tool Call Dry-run
  ✓ db_query [sidecar] 1ms
    ✓ status: Tool status matched expected pass
    ✓ requiredFields.rowCount: Found required field "rowCount"
    ✓ requiredFields.limit: Found required field "limit"
    ✓ requiredFields.source: Found required field "source"
    ✓ requiredFields.freshness: Found required field "freshness"
    ✓ maxRows: Row count 1 is within maxRows 100

  ✓ db_write [sidecar] 0ms
    ✓ status: Tool status matched expected fail
    ✓ errorCode: Found expected error code WRITE_NOT_ALLOWED
    ✓ notContains.DATABASE_URL: Output does not contain "DATABASE_URL"
    ✓ notContains.password: Output does not contain "password"
    ✓ notContains.stack: Output does not contain "stack"

If a contract assertion fails, mcp-probe reports:

CONTRACT_ASSERTION_FAILED

and includes per-assertion details in terminal output, JSON output, and GitHub Actions summaries.

Quick start

npx @k08200/mcp-probe@latest init \
  --target @your-org/your-mcp-server \
  --discover \
  --github-actions

Then edit .mcp-probe.json with real read-only probes and run:

npx @k08200/mcp-probe@latest --config mcp-probe.config.json --github-summary

Why this matters

MCP CI should test the contract an agent will actually depend on, not just whether the server process starts.

For database-backed MCP servers, that means validating things like:

  • read-only role behavior
  • denied writes
  • stable error codes
  • row limits
  • tenant or project scope
  • result metadata
  • no leaked internals

mcp-probe should not know every server's semantics. But it can give teams a small, declarative way to encode the production contract their agents rely on.

That is the goal of v1.4.0.