Four Layers of Validation in Kubernetes with Claude Code
Jake Page · 2026-05-21 · via DEV Community

Earlier this year, Moltbook, a social network for AI agents, launched, trended, and became a cautionary tale within the same week. Security researchers at Wiz found a Supabase API key sitting in its client-side JavaScript, which was the database’s only access control, with no Row Level Security to narrow what that key could reach. The result: 1.5 million API tokens, 35,000 email addresses, and thousands of private messages exposed to anyone with a browser console.

Moltbook was a greenfield project with no review process. The same class of mistake is far more serious when AI-generated code lands inside applications that already have users, real data, and an established trust surface. That’s increasingly the reality: as organizations adopt AI coding agents, more AI-generated code lands directly in production services that already hold your users’ credentials and personal details. A recent survey found that 95% of developers don’t fully trust AI-generated code, but only 48% consistently review it before committing. It ships regardless.

Static review tools catch only some classes of issues: common CVEs, dependency hygiene, style violations, deterministic anti-patterns. What they can’t see is anything specific to your cluster and service: the actual name of your Kubernetes Secret in this cluster, whether your auth middleware is wired into the right route in this service, or whether the request a real user will send makes it through the new code path without breaking something downstream.

Closing that gap takes four independent layers: AI agent skills that shape what gets generated, commands that audit what was generated, integration tests that hit staging endpoints while traffic is routed to your local code, and preview environments that let a human review the change against staging dependencies before merging.

Layer 1: Skills (passive, shaping what gets generated)

Most AI coding assistants let you write down rules that shape generation. The simplest mechanism is a config file that the assistant loads into context on every prompt: .cursorrules in Cursor, CLAUDE.md in Claude Code, .github/copilot-instructions.md in GitHub Copilot. Drop NEVER/ALWAYS rules in there and the AI follows them. The downside is that those files load on every prompt, even when you’re working on something unrelated, and every rule you add costs tokens whether or not it’s relevant.

Claude Code goes a step further with skills: structured rule sets that ship as plugins (a directory with a SKILL.md and supporting reference files). Each skill has a description, and the model pulls a skill into context only when your prompt matches what that skill is meant to cover. If a skill never gets matched against the prompt, it never gets loaded, and you don’t pay for the tokens it would consume. We’ve already shipped six skills for AI agents working with mirrord, and this post adds a seventh focused on validation: k8s-validation, an open-source set of NEVER/ALWAYS rules for code that runs inside a Kubernetes cluster.

The skill covers two halves: the cluster-level concerns (Secrets, RBAC, pod hardening, NetworkPolicies, supply chain, file handling) and the application-level concerns that determine whether the workload behaves correctly inside the cluster (HTTP and parameter handling, auth, output sanitisation, API contracts, env-var configuration, test coverage).

Before and after

Without the skill, the model has no way of knowing about things like Kubernetes Secrets already mounted as environment variables in your pod, auth decorators other handlers in your service already use, or PII-sanitization utilities your team has already built. So it does the obvious thing: hardcodes the API key, skips the auth check, and returns the LLM output directly. With the skill loaded, a prompt like “add a /summarise endpoint that calls OpenAI’s API” produces something like this:

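import os

from flask import Flask, jsonify, request
from openai import OpenAI

from utils.sanitize import filter_pii
# require_auth is the auth decorator the service's other handlers already use;
# import it from wherever your codebase defines it.

# In an existing service, reuse its app object instead of creating a new one.
app = Flask(__name__)

# The key comes from an env var backed by a Kubernetes Secret, never from source.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

@app.route('/summarise', methods=['POST'])
@require_auth
def summarise():
    text = request.json.get('text', '')
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": f"Summarise: {text}"}],
    )
    summary = response.choices[0].message.content
    # Strip PII from the model output before it goes back to the caller.
    return jsonify({"summary": filter_pii(summary)})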


filter_pii here is a stand-in for any utility that strips personally identifiable information (names, SSNs, emails, etc.) out of free text. Different teams build this differently; the Microsoft Presidio library is one of the more common open-source starting points.

Three differences from what the model would produce without the skill: the key comes from an environment variable backed by a Kubernetes Secret, the endpoint sits behind @require_auth, and the LLM output runs through filter_pii before going back to the user.

What skills can’t do

Skills shape generation, but they don’t verify anything.

In the example above, the AI correctly followed the skill: it read the API key from the environment, applied @require_auth, and called filter_pii from utils.sanitize. But the skill has no way to verify that filter_pii actually works. If the utility in your codebase only strips email addresses and misses phone numbers, the skill can’t know that. A user document containing a phone number sails straight through the filter and into the response, and the code looks correct at every layer the skill can see.
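To make that failure mode concrete, here is a minimal sketch of a filter_pii that only knows about email addresses. The implementation is hypothetical (the real utils.sanitize in your codebase will differ); the point is that code calling it looks correct while a phone number passes straight through.

```python
import re

# Hypothetical stand-in for utils.sanitize.filter_pii: it only knows about emails.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def filter_pii(text: str) -> str:
    """Redact email addresses. Phone numbers are not handled, which is the gap."""
    return EMAIL_RE.sub("[REDACTED]", text)


summary = "Contact Ana at ana@example.com or 555-867-5309."
cleaned = filter_pii(summary)
print(cleaned)  # Contact Ana at [REDACTED] or 555-867-5309.
```

Nothing about the call site reveals the problem; only a check that exercises the filter with realistic input does.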

Skills set a floor by preventing the obvious structural mistakes. They’re instructions to a model, not checks against reality.

Layer 2: Commands (active, checking what was generated)

Where skills shape what the model generates, commands check what already exists. They’re explicitly invoked by a developer, an agent, or a CI step, and they run a defined set of checks against the code in front of them.

The same install from Layer 1 also ships a slash command: /k8s-validation:audit. It scans your codebase for the same NEVER/ALWAYS rules the skill enforces during generation, traces data flow through handlers and queries, and classifies each finding by severity. Skills don’t always load (a vague prompt, or a quick edit to a file the model didn’t classify as Kubernetes-adjacent, and the rules never enter context). The audit is the backstop: it runs on the code regardless of how the code got written.

> /k8s-validation:audit content-api/

CRITICAL 2 | HIGH 4 | MEDIUM 1 | INFO 0

[CRITICAL] src/routes/summarise.py: Hardcoded OpenAI API key → Use os.environ
[CRITICAL] src/routes/download.py: User filename in path without sanitization → Use secure_filename()
[HIGH]     src/routes/summarise.py: No authentication middleware → Add @require_auth
[HIGH]     k8s/deployment.yaml: No SecurityContext defined → Add runAsNonRoot, drop ALL
[HIGH]     src/routes/summarise.py: Reads OPENAI_API_KEY but no manifest defines it → Add to deployment.yaml env block
[HIGH]     src/routes/summarise.py: New endpoint with no integration test → Add test in tests/integration/


Note that the output mixes security findings (hardcoded key, missing SecurityContext) with correctness findings, which ask whether the code does what was requested given the rest of the system (the Kubernetes deployment manifest doesn’t define the env var the code reads; the new endpoint shipped without an integration test). Both halves matter for AI-generated code.
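The “reads OPENAI_API_KEY but no manifest defines it” finding is a cross-file check, and a toy version of that kind of check is easy to sketch. This is not how /k8s-validation:audit is implemented; it is a rough illustration that assumes the Deployment manifest has already been parsed into a dict (for example with a YAML loader) and that ignores envFrom and other ways of injecting variables.

```python
import ast


def _is_os_environ(node: ast.AST) -> bool:
    """True for the expression `os.environ`."""
    return (
        isinstance(node, ast.Attribute)
        and node.attr == "environ"
        and isinstance(node.value, ast.Name)
        and node.value.id == "os"
    )


def env_vars_read(source: str) -> set[str]:
    """Collect names read via os.environ["X"], os.environ.get("X"), or os.getenv("X")."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Subscript) and _is_os_environ(node.value):
            key = node.slice
            if isinstance(key, ast.Constant) and isinstance(key.value, str):
                names.add(key.value)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute) and node.args:
            func = node.func
            is_environ_get = func.attr == "get" and _is_os_environ(func.value)
            is_getenv = (
                func.attr == "getenv"
                and isinstance(func.value, ast.Name)
                and func.value.id == "os"
            )
            first = node.args[0]
            if (is_environ_get or is_getenv) and isinstance(first, ast.Constant):
                if isinstance(first.value, str):
                    names.add(first.value)
    return names


def missing_env_vars(source: str, manifest: dict) -> set[str]:
    """Env vars the code reads that no container's `env` list in the Deployment defines."""
    containers = manifest["spec"]["template"]["spec"]["containers"]
    defined = {entry["name"] for c in containers for entry in c.get("env", [])}
    return env_vars_read(source) - defined


handler_source = (
    "import os\n"
    'key = os.environ["OPENAI_API_KEY"]\n'
    'level = os.getenv("LOG_LEVEL", "info")\n'
)
deployment = {"spec": {"template": {"spec": {"containers": [
    {"name": "content-api", "env": [{"name": "LOG_LEVEL", "value": "info"}]},
]}}}}
print(missing_env_vars(handler_source, deployment))  # {'OPENAI_API_KEY'}
```

The source is only parsed, never executed, so the check is safe to run on untrusted generated code.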

Because the audit is a command you run rather than a rule the model loads, the same invocation works in three places: a developer runs it before opening a PR, an agent runs it as part of its own loop after generating code, and CI runs it as a merge gate. You can wire it into one, two, or all three. For the agent loop, an instruction along these lines in the agent’s instruction file (for example CLAUDE.md) is enough:

## Validation Workflow
After generating or modifying any Kubernetes-related code, run `/k8s-validation:audit`
on the changed files. If any CRITICAL findings exist, fix them before proceeding.
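For the CI merge gate, one possible approach is to capture the audit report as text and fail the job when blocking severities are present. The sketch below assumes the report contains a summary line shaped like the sample output above (“CRITICAL 2 | HIGH 4 | ...”); how you capture that text in your CI system, and whether the format stays stable, are assumptions you would need to verify.

```python
import re

SEVERITY_RE = re.compile(r"(CRITICAL|HIGH|MEDIUM|INFO)\s+(\d+)")


def parse_summary(report: str) -> dict[str, int]:
    """Read severity counts from the first line shaped like 'CRITICAL 2 | HIGH 4 | ...'."""
    for line in report.splitlines():
        stripped = line.strip()
        if "|" in stripped and stripped.startswith("CRITICAL"):
            return {sev: int(count) for sev, count in SEVERITY_RE.findall(stripped)}
    raise ValueError("no severity summary line found in audit report")


def gate(report: str, blocking: tuple[str, ...] = ("CRITICAL",)) -> int:
    """Return a process exit code: 1 if any blocking severity has findings, else 0."""
    counts = parse_summary(report)
    return 1 if any(counts.get(sev, 0) > 0 for sev in blocking) else 0


sample = """CRITICAL 2 | HIGH 4 | MEDIUM 1 | INFO 0

[CRITICAL] src/routes/summarise.py: Hardcoded OpenAI API key
"""
print(gate(sample))  # 1
```

In a CI step you would pass the result to sys.exit so the job fails on a non-zero code. Blocking only on CRITICAL mirrors the workflow rule above; adding "HIGH" to the blocking tuple makes the gate stricter.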


What commands can’t do

The audit is still static analysis. It can find “you hardcoded a secret” or “you’re missing a SecurityContext,” but it can’t tell you whether your filter_pii regex actually catches the PII your users will send, or whether the environment variable you’re reading will resolve to a value in your staging cluster. Commands check the shape of the code, not the behavior.

Layer 3: Integration tests (runtime, proving it works)

Your team probably already has integration tests that hit your API endpoints, check response shapes, and verify that authentication rejects bad credentials. These tests encode what “correct behavior” actually means for your application.
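For readers without such a suite yet, a minimal sketch of what one of these tests might look like for the /summarise endpoint follows. It uses only the standard library; the bearer-token scheme and the 401/403 expectation are assumptions for illustration, not something the service above is known to implement.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def post_json(url: str, payload: dict, token: Optional[str] = None) -> tuple[int, dict]:
    """POST JSON and return (status, parsed body) without raising on 4xx/5xx."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"  # assumed auth scheme
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode(), headers=headers, method="POST"
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, json.loads(resp.read() or b"{}")
    except urllib.error.HTTPError as err:
        try:
            return err.code, json.loads(err.read() or b"{}")
        except json.JSONDecodeError:
            return err.code, {}


def contract_problems(authed: tuple[int, dict], anonymous: tuple[int, dict]) -> list[str]:
    """Compare two responses against the expected contract; empty list means it holds."""
    problems = []
    status, body = authed
    if status != 200:
        problems.append(f"expected 200 with a valid token, got {status}")
    elif not isinstance(body.get("summary"), str):
        problems.append(f"response is missing a string 'summary': {body!r}")
    if anonymous[0] not in (401, 403):
        problems.append(f"expected 401/403 without a token, got {anonymous[0]}")
    return problems


def check_summarise_contract(base_url: str, token: str) -> None:
    """Hit the endpoint with and without credentials and assert the contract."""
    url = base_url.rstrip("/") + "/summarise"
    authed = post_json(url, {"text": "hello"}, token)
    anonymous = post_json(url, {"text": "hello"})
    assert contract_problems(authed, anonymous) == []
```

Keeping the contract check separate from the HTTP call means the same assertions work whether the responses come from staging, a local process, or recorded fixtures.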

The bottleneck is running them. Locally, you mock your database, your auth service, your message queue, and hope the mocks match reality. In CI, each cycle takes 5 to 10 minutes. For a human pushing a few times a day, it’s already frustrating enough. For an AI agent trying to fix a failing test, it’s a feedback loop far too slow to learn from: the agent burns tokens on every iteration, and the integration bugs only surface after the change has been written, pushed, and built.

mirrord changes this equation by letting your local process stand in for a deployed pod. Your local code gets the target pod’s environment variables, the same cluster-level files the pod can access, and the same view of internal services. In steal mode, traffic destined for the targeted pod is intercepted and routed to your local process instead of whatever’s deployed in staging. Your existing integration tests, pointed at staging endpoints as usual, now run against your local code in seconds, not minutes.

The same pattern scales horizontally. Because mirrord can split a single pod’s incoming traffic between many local processes using header-based filters, multiple agents (or developers) can iterate against the same staging cluster simultaneously, each one routing its own slice of the traffic to its own local code. One staging environment, many concurrent agents, real downstream services for all of them.
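The idea behind header-based splitting can be modelled in a few lines. This is a conceptual toy, not mirrord’s implementation or configuration format: each session registers a pattern, a request goes to the first session whose pattern matches one of its headers, and everything else stays with the deployed pod. The x-session-id header name is hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    """One local process that has asked for a slice of the pod's traffic."""
    owner: str
    header_filter: re.Pattern  # matched against "name: value" header lines


def route(headers: dict[str, str], sessions: list[Session]) -> Optional[str]:
    """Return the owner whose filter matches, or None to leave the request with the pod."""
    lines = [f"{name.lower()}: {value}" for name, value in headers.items()]
    for session in sessions:
        if any(session.header_filter.search(line) for line in lines):
            return session.owner
    return None


sessions = [
    Session("agent-a", re.compile(r"^x-session-id: agent-a$")),
    Session("agent-b", re.compile(r"^x-session-id: agent-b$")),
]
print(route({"X-Session-Id": "agent-b"}, sessions))  # agent-b
print(route({"Accept": "*/*"}, sessions))            # None
```

Each agent’s test run then only needs to send its own header value for its requests to reach its own local code.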

What this catches that the other layers can’t

Consider a prompt like “have /summarise fetch the document from our content-store service first.” The agent writes a handler that calls http://content-store/documents/{id} and reads response.json()["title"].

The catch: content-store moved to v2 months ago and now returns {"document": {"name": ..., "text": ...}}. The flat title/body shape only exists in the AI’s training data. Skills generated structurally clean code (good). The audit confirmed the call was made and the response was consumed (also good). Neither layer knows what shape content-store actually returns today.

The setup to catch this takes two processes. You or your AI agent starts a mirrord session, and your e2e tests run as normal against the staging content-api endpoint:

# Terminal 1, run your local content-api in place of the deployed pod
mirrord exec --target deploy/content-api --steal -- python -m content_api

# Terminal 2, run the existing integration suite against staging as usual
pytest tests/integration/test_summarise.py


When the test hits staging’s content-api endpoint, mirrord steals the request and reroutes it to your local process. The local handler calls http://content-store/documents/..., and that outbound call also routes through mirrord, hitting the real content-store in staging. The real service returns {"document": {"name": ..., "text": ...}}. The local code does response.json()["title"] and crashes with KeyError.

You fix the code to read the new shape, rerun the test, and it passes. The bug surfaces in your local code, against real downstream services, in seconds, instead of after a deploy cycle. The same pattern works for any other dependency the code touches: environment variables from the pod, files from mounted volumes, database queries against the real Postgres. mirrord runs your code; the cluster supplies its real environment.
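Reduced to its core, the bug and the fix look roughly like this. The function names and sample values are made up, and mapping the old title field to the new name field is an assumption about what the handler needs; the payload shape follows the v2 response described above.

```python
def read_title_v1(payload: dict) -> str:
    """What the agent generated: assumes the old flat response shape."""
    return payload["title"]


def read_title_v2(payload: dict) -> str:
    """Fixed version: reads the nested shape content-store returns today."""
    return payload["document"]["name"]


# Shape of the real staging response (values invented for illustration).
staging_payload = {"document": {"name": "Q3 report", "text": "Revenue grew."}}

try:
    read_title_v1(staging_payload)
except KeyError as err:
    print(f"KeyError: {err}")  # KeyError: 'title'

print(read_title_v2(staging_payload))  # Q3 report
```

A mock built from the same outdated assumption would have returned the flat shape and let the v1 code pass, which is why the real downstream service matters here.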

Layer 4: Human review in a real environment

When the agent opens the PR, a human should still get to see the change running in a real environment, not just read the diff. mirrord’s Preview environments make that easy: a GitHub Action spins up an isolated pod in your staging cluster running the PR’s code, connected to all the real downstream services, and scoped to that PR via an environment key.

Reviewers can click through the actual feature instead of inferring behavior from the code or having to run the whole application locally. Most of the failures the previous three layers don’t catch (UX regressions, surprising interactions, “looks right but feels wrong”) show up the first time a human uses the thing.

Putting it together: agents handle layers 1–3, humans handle layer 4

Layers 1 through 3 can be run by the agent itself. Instead of generating code, opening a PR, and hoping CI catches the issues, the agent generates code shaped by skills, runs /k8s-validation:audit to check for structural issues, runs the integration tests via mirrord against real infrastructure, and fixes any failures before committing. The agent doesn’t even have to write the test from scratch: Layer 1’s validation skill includes a rule that any new HTTP handler must come with an integration test, so the test gets generated alongside the endpoint. Layer 3 just runs it against real infrastructure.
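The control flow of that self-check loop can be sketched as follows. The three callables are placeholders for whatever your agent harness provides (running the audit, running the integration tests through mirrord, and asking the agent to fix the reported problems); none of them correspond to a real API.

```python
from typing import Callable


def validate_until_clean(
    audit: Callable[[], list[str]],
    run_tests: Callable[[], list[str]],
    fix: Callable[[list[str]], None],
    max_rounds: int = 5,
) -> bool:
    """Audit first, then run tests; feed findings back until both are clean."""
    for _ in range(max_rounds):
        problems = audit()
        if not problems:
            # Only pay for the runtime check once the static check is clean.
            problems = run_tests()
        if not problems:
            return True  # safe to commit and open the PR
        fix(problems)
    return False  # out of budget: stop and escalate to a human
```

The round limit is the important design choice: an agent that cannot converge should stop and hand over rather than keep burning tokens.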

Layer 4 is the handoff. Once the agent has passed its own checks, it opens a PR and a human gets a live preview environment to click through rather than a diff to infer from. The failures that surface at that stage (UX regressions, surprising interactions, things that look right but feel wrong) are exactly the ones that didn’t show up in the previous three layers.

We’ve documented the concrete setup (per-service mirrord configs, helper scripts, and an AGENTS.md that tells the agent which script to run) in our mirrord with AI agents guide. The broader argument for why this matters, including the token cost of agents stuck in a feedback-less loop, is in How to prevent token burn using mirrord with e2e tests.

Adopting the layers

The four layers are independent. Pick the ones that close your biggest gap and add the others when the next failure teaches you what’s still missing.

Install the validation skill with /plugin install k8s-validation@metalbear-co/k8s-validation-plugin in Claude Code, or as a local rules directory for Cursor and GitHub Copilot (see the repo README). The /k8s-validation:audit command ships in the same install. For the runtime layer, install mirrord and wrap your local service with mirrord exec --target deploy/<service> --steal -- <run-command>. For preview environments on each PR, the feature is included in mirrord for Teams.

Each layer closes a different gap. Stop at any point and you’ve made things better than they were.

The skills are open source. If your AI assistant generates something the skills don’t catch, open a PR.