From YAML to AI Agents: Building Smarter DevOps Pipelines with MCP
Nimesh Kulka · 2026-05-23 · via DEV Community

DevOps teams have spent years turning manual work into YAML.

That helped. CI runs on every pull request. Deployments can be triggered from a commit. Kubernetes can reconcile desired state. Terraform can plan infrastructure before it changes anything.

But a lot of DevOps work still sits outside the pipeline:

  • reading failed CI logs
  • checking whether a deployment is safe
  • connecting traces, alerts, recent commits, and infra changes
  • deciding whether to roll forward or roll back
  • writing the same runbook steps again and again
  • asking five tools for the same incident context

This is where AI automation gets interesting. Not as a magic replacement for DevOps engineers, but as a better interface for operational work.

The strongest version of this stack is not just "AI in CI/CD." It is an AI-native DevOps layer built around three pieces:

  1. MCP servers for tool access
  2. Skills for repeatable expert workflows
  3. Plugins for company-specific infrastructure actions

If you build it well, the pipeline gets faster because the boring glue work disappears. If you build it badly, you get an AI bot with production credentials and vague judgment. That is not automation. That is a future incident report.

Why MCP matters for DevOps

MCP, or Model Context Protocol, gives AI applications a standard way to connect to external systems.

The official MCP docs describe three main server-side primitives:

  • tools: functions an AI app can call, like file operations, API calls, database queries, or deployment actions
  • resources: context an AI app can read, like docs, schemas, logs, runbooks, or service metadata
  • prompts: reusable templates for structured workflows

That maps cleanly to DevOps.
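
As a rough illustration of how the three primitives differ, here is a toy in-process model in plain Python. It is not the MCP SDK or the wire protocol; `ToyServer`, the runbook URI, the prompt name, and `get_sync_status` are all hypothetical names made up for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ToyServer:
    """Toy stand-in for an MCP server: one registry per primitive kind."""
    tools: Dict[str, Callable[..., str]] = field(default_factory=dict)
    resources: Dict[str, str] = field(default_factory=dict)
    prompts: Dict[str, str] = field(default_factory=dict)

    def tool(self, fn):
        """Register a callable action under its function name."""
        self.tools[fn.__name__] = fn
        return fn


server = ToyServer()

# Resource: read-only context the agent can load (hypothetical URI and text).
server.resources["runbook://checkout-api/rollback"] = (
    "1. Pause rollout. 2. Revert image tag."
)

# Prompt: a reusable template for a structured workflow.
server.prompts["triage_deploy"] = (
    "Explain why the {environment} deploy of {service} failed."
)


@server.tool
def get_sync_status(app: str) -> str:
    """Tool: a function the agent can call. A real one would query Argo CD."""
    return f"{app}: OutOfSync"  # hard-coded placeholder result


print(server.tools["get_sync_status"]("checkout-api"))
print(server.prompts["triage_deploy"].format(
    environment="production", service="checkout-api"))
```

The point of the split is that resources and prompts are passive, while tools are the only primitive that does something, which is where permissions and approvals belong.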

A platform team could expose separate MCP servers for:

  • GitHub or GitLab
  • CI/CD logs
  • Kubernetes
  • Terraform or OpenTofu
  • Argo CD
  • Prometheus, Grafana, Datadog, or OpenTelemetry backends
  • cloud cost data
  • incident management
  • internal service catalog

The AI agent does not need to scrape random dashboards or guess from partial screenshots. It can ask real tools for real state.

For example:

User: Why did the production deploy fail?

Agent flow:
1. Read the failed GitHub Actions job logs.
2. Check the changed files in the pull request.
3. Query Argo CD for sync status.
4. Read Kubernetes events for the affected namespace.
5. Pull recent error traces from observability.
6. Summarize the likely failure and suggest the smallest safe fix.

That is not replacing the DevOps engineer. It is removing the tab-hopping tax.

Skills are where the real expertise lives

MCP gives the agent access. Skills tell it how to work.

A skill is a reusable procedure for a specific job. In DevOps, that matters because production work has rules. You do not want an agent inventing a deployment strategy every time someone asks a question.

Good DevOps skills could look like this:

skill: debug_failed_ci
steps:
  - fetch failed jobs
  - group logs by failure type
  - check if failure is test, lint, dependency, infra, or runner-related
  - compare against recent commits
  - suggest the smallest code or config fix
  - never rerun expensive jobs more than once without approval

skill: safe_kubernetes_rollout
steps:
  - check current deployment health
  - verify image tag and git SHA
  - check recent incidents for the service
  - confirm SLO status before rollout
  - deploy to one environment first
  - watch error rate, latency, and pod readiness
  - stop if guardrail thresholds fail

skill: terraform_plan_review
steps:
  - read the Terraform plan
  - classify adds, changes, and destroys
  - flag IAM, networking, database, and public exposure changes
  - check cost-sensitive resources
  - summarize blast radius
  - require human approval for destructive or privilege-expanding changes

This is the part I think most people miss. The value is not just that an AI can call tools. The value is that it can call tools through a workflow your team already trusts.
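
To make one of these skills concrete, here is a minimal sketch of the classification steps in terraform_plan_review. It assumes the plan has been exported as JSON (for example with `terraform show -json`) and reads the `resource_changes` entries from that output. The `SENSITIVE_PREFIXES` list and the `review_plan` function are illustrative assumptions, not part of any tool.

```python
# Resource type prefixes treated as sensitive. This list is an
# illustrative assumption; a real team would maintain its own.
SENSITIVE_PREFIXES = ("aws_iam_", "aws_security_group", "aws_db_")


def review_plan(plan: dict) -> dict:
    """Classify adds, changes, and destroys, and flag sensitive resources."""
    summary = {"add": [], "change": [], "destroy": [], "flagged": []}
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        address = rc["address"]
        # A replacement shows up as both "delete" and "create",
        # so it is counted as one add and one destroy.
        if "create" in actions:
            summary["add"].append(address)
        if "update" in actions:
            summary["change"].append(address)
        if "delete" in actions:
            summary["destroy"].append(address)
        touched = actions not in (["no-op"], ["read"])
        if touched and rc["type"].startswith(SENSITIVE_PREFIXES):
            summary["flagged"].append(address)
    # Destructive or sensitive changes always go to a human.
    summary["needs_human_approval"] = bool(
        summary["destroy"] or summary["flagged"])
    return summary


example_plan = {"resource_changes": [
    {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket",
     "change": {"actions": ["create"]}},
    {"address": "aws_iam_role.deployer", "type": "aws_iam_role",
     "change": {"actions": ["update"]}},
    {"address": "aws_db_instance.main", "type": "aws_db_instance",
     "change": {"actions": ["delete", "create"]}},
    {"address": "aws_vpc.main", "type": "aws_vpc",
     "change": {"actions": ["no-op"]}},
]}
result = review_plan(example_plan)
```

The agent's summary of blast radius would be written from `result`, while the decision to apply stays with whoever approves the run.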

Plugins make it fit your company

Every company has weird infrastructure.

Maybe your deploys go through Argo CD, but production still needs a Slack approval. Maybe your Terraform state is split across workspaces. Maybe the service catalog is internal. Maybe your rollback process depends on a custom CLI that only three people understand.

Plugins are how you expose that reality safely.

A plugin can wrap a company-specific action like:

  • get_service_owner(service_name)
  • fetch_deploy_risk_score(pr_number)
  • create_change_request(environment, service, sha)
  • run_internal_canary(service, image_tag)
  • open_incident_with_context(summary, traces, logs)
  • estimate_cloud_cost_diff(terraform_plan_id)

The plugin should not give the agent unlimited shell access and vibes. It should expose narrow, typed actions with logs, permissions, and guardrails.

A good internal DevOps plugin feels boring:

{
  "name": "request_production_deploy",
  "input": {
    "service": "checkout-api",
    "image_tag": "2026.05.23.4",
    "change_summary": "Fix timeout handling in payment gateway client",
    "risk_level": "medium"
  },
  "requires_approval": true,
  "audit_log": true
}

Boring is good here. Boring means it can survive production.
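
One way such a narrow, typed action might look in code is sketched below. The field names mirror the JSON above; the validation rules, the `ALLOWED_RISK_LEVELS` set, and the shape of the audit entry are assumptions added for illustration.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

ALLOWED_RISK_LEVELS = {"low", "medium", "high"}  # assumed vocabulary


@dataclass(frozen=True)
class ProductionDeployRequest:
    service: str
    image_tag: str
    change_summary: str
    risk_level: str

    def __post_init__(self):
        # Reject malformed input before anything reaches a deploy system.
        if not self.service or not self.image_tag:
            raise ValueError("service and image_tag are required")
        if self.risk_level not in ALLOWED_RISK_LEVELS:
            raise ValueError(f"unknown risk_level: {self.risk_level!r}")


def request_production_deploy(req, audit_log, approved_by=None):
    """Record the request. It only becomes 'approved' with a named approver."""
    entry = {
        "action": "request_production_deploy",
        "input": asdict(req),
        "approved_by": approved_by,
        "status": "approved" if approved_by else "pending_approval",
        "at": datetime.now(timezone.utc).isoformat(),
    }
    audit_log.append(entry)  # every call is logged, approved or not
    return entry


log = []
req = ProductionDeployRequest(
    service="checkout-api",
    image_tag="2026.05.23.4",
    change_summary="Fix timeout handling in payment gateway client",
    risk_level="medium",
)
pending = request_production_deploy(req, log)
```

Nothing here deploys anything. The function only produces an auditable request, which keeps the actual mutation behind the approval step.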

Who should use this?

This stack is useful for a bunch of specialists, but each one should use it differently.

DevOps engineers can use it to debug CI/CD failures faster, generate release notes, identify flaky jobs, and automate repetitive deployment checks.

Platform engineers can turn internal developer platforms into agent-accessible systems. Instead of making every developer learn five dashboards, they can expose safe workflows through MCP servers and skills.

SREs can use it for incident triage: correlate alerts, attach traces, find recent deployments, pull service ownership, and suggest runbooks.

Cloud infrastructure engineers can use it to review Terraform plans, detect risky IAM changes, estimate cost impact, and standardize provisioning workflows.

Release engineers can use it to decide whether a release is ready, what changed, what failed, what needs approval, and what rollback path exists.

DevSecOps engineers can connect security checks into the pipeline: secret scanning, policy checks, dependency review, artifact provenance, image scanning, and permission drift.

AI infrastructure engineers can use the same pattern to manage model-serving deployments, GPU capacity, eval gates, prompt/version rollouts, and inference observability.

The common thread is simple: if your job involves reading state from multiple systems and taking careful action, AI agents can help. But only if you give them structured tools and clear operating procedures.

A practical AI-native CI/CD pipeline

Here is a realistic pipeline architecture.

[Figure: AI-native DevOps pipeline diagram]

The shape is simple: pull request, CI checks, AI agent, MCP tool layer, guardrails, then GitOps/deploy automation. The agent speeds up context gathering. The pipeline still owns execution, approval, and audit history.

Pull request opened
        |
        v
CI runs tests, lint, security checks
        |
        v
Agent reads CI result through MCP
        |
        v
If failed:
  - summarize failure
  - identify likely owner
  - suggest fix
  - open comment with exact logs and files

If passed:
  - read diff
  - check Terraform or Kubernetes changes
  - classify deployment risk
  - verify service ownership and runbook
  - prepare deploy summary
        |
        v
Human approval for production
        |
        v
Argo CD / deploy tool syncs desired state
        |
        v
Agent watches rollout health
        |
        v
If healthy:
  - close deploy task
  - attach release summary

If unhealthy:
  - collect logs, traces, events
  - recommend rollback or roll-forward
  - require approval before mutation

This is faster because the agent handles context collection. It is safer because the agent does not blindly mutate production.

That balance matters.

Where Kubernetes and GitOps fit

Kubernetes already works like an automation platform. Controllers watch desired state and reconcile actual state. The Kubernetes docs describe the controller pattern as programs that read an object's spec, act on it, and update status.

GitOps tools like Argo CD build on that idea. Argo CD treats Git as the source of truth, compares live cluster state against desired state, and syncs when needed.

AI should not replace that control loop.

It should sit above it.

The agent can explain what changed, detect risk, connect symptoms to recent deploys, and recommend action. Kubernetes and Argo CD should still do the actual reconciliation with clear audit history.

That gives you the best of both worlds:

  • deterministic infrastructure control loops
  • human-readable operational reasoning
  • faster triage
  • safer approvals
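
A small sketch of what "sitting above" the control loop can mean: the agent compares desired and live state and explains the difference, but never writes to the cluster. The `describe_drift` function and the flat dictionaries are simplifying assumptions; real manifests are nested and would need a proper diff.

```python
def describe_drift(desired: dict, live: dict) -> list:
    """Return human-readable notes for every field that differs.

    Read-only by design: reconciliation stays with the GitOps tool.
    """
    notes = []
    for key in sorted(desired.keys() | live.keys()):
        want, have = desired.get(key), live.get(key)
        if want != have:
            notes.append(f"{key}: desired={want!r}, live={have!r}")
    return notes


drift = describe_drift(
    {"image": "checkout-api:2026.05.23.4", "replicas": 3},
    {"image": "checkout-api:2026.05.22.1", "replicas": 3},
)
```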

Observability is the agent's fuel

An AI DevOps agent is only as good as the context it can retrieve.

OpenTelemetry matters here because it gives teams a common way to collect traces, metrics, and logs. Traces are especially useful because they show the path of a request across services.

For an agent, this context can answer questions like:

  • Did the error start after a deployment?
  • Which dependency is adding latency?
  • Is one service failing, or a full user journey?
  • Are we seeing infrastructure failure, code failure, or traffic shape change?
  • Did the rollback actually improve user-facing symptoms?

Without observability, the agent is just guessing politely.
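
The first question in that list, whether errors started after a deployment, can be approximated with a simple before/after comparison. This is a minimal sketch assuming request outcomes are already available as `(timestamp, succeeded)` pairs; the 15-minute window and the 2x factor are arbitrary illustrative defaults, not recommended values.

```python
from datetime import datetime, timedelta


def error_rate(events, start, end):
    """Fraction of requests in [start, end) that failed; None if no traffic."""
    window = [ok for ts, ok in events if start <= ts < end]
    if not window:
        return None
    return sum(1 for ok in window if not ok) / len(window)


def regressed_after_deploy(events, deploy_time,
                           window=timedelta(minutes=15), factor=2.0):
    """True if the error rate after the deploy is at least `factor` times
    the rate before it."""
    before = error_rate(events, deploy_time - window, deploy_time)
    after = error_rate(events, deploy_time, deploy_time + window)
    if before is None or after is None:
        return False  # not enough data to say anything
    if before == 0:
        return after > 0  # crude: any new failures count as a regression
    return after >= factor * before


deploy = datetime(2026, 5, 23, 12, 0)
# 100 requests before the deploy with 1 failure, 100 after with 10 failures.
before_events = [(deploy - timedelta(minutes=5), i != 0) for i in range(100)]
after_events = [(deploy + timedelta(minutes=5), i >= 10) for i in range(100)]
verdict = regressed_after_deploy(before_events + after_events, deploy)
```

A comparison like this only shows correlation in time. The agent should still check traces and other changes before blaming the deploy.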

Guardrails that should exist before production actions

If an AI agent can touch production, the guardrails need to be boring and strict.

Start with these:

  • read-only by default
  • separate permissions per environment
  • mandatory approval for production mutations
  • audit logs for every tool call
  • allowlists for safe actions
  • dry-run support for infrastructure changes
  • policy checks before apply
  • rollback plans attached to deploy actions
  • rate limits for repeated retries
  • no secret exposure in prompts or logs
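
Several of these guardrails can be combined into one small authorization check that runs before every tool call. The sketch below covers read-only defaults, an allowlist, production approval, a retry limit, and audit logging. The action names and `MAX_ATTEMPTS` are hypothetical, and a real system would keep this policy in reviewed configuration rather than in code.

```python
from dataclasses import dataclass

# Hypothetical action names, for illustration only.
READ_ONLY_ACTIONS = {"get_logs", "get_rollout_status", "read_terraform_plan"}
MUTATING_ALLOWLIST = {"restart_deployment", "rollback_release"}
MAX_ATTEMPTS = 3

audit_log = []  # every decision is recorded, allowed or not


@dataclass(frozen=True)
class ToolCall:
    action: str
    environment: str        # e.g. "staging" or "production"
    approved: bool = False  # set by a human approval step, never by the agent
    attempt: int = 1


def authorize(call: ToolCall) -> tuple:
    """Return (allowed, reason) and append the decision to the audit log."""
    if call.attempt > MAX_ATTEMPTS:
        decision = (False, "retry limit reached")
    elif call.action in READ_ONLY_ACTIONS:
        decision = (True, "read-only")
    elif call.action not in MUTATING_ALLOWLIST:
        decision = (False, "action not on allowlist")
    elif call.environment == "production" and not call.approved:
        decision = (False, "production mutation requires approval")
    else:
        decision = (True, "allowed mutation")
    audit_log.append((call, decision))
    return decision
```

Anything not explicitly listed is denied, so adding a new capability is a deliberate change to the allowlist rather than something the agent can talk its way into.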

Terraform run tasks are a good mental model. HCP Terraform can call external systems at defined stages of a run, such as between plan and apply, show their messages in the run pipeline, and block the apply phase when a mandatory task fails. That is exactly the kind of control point AI automation should respect.

Do not start by letting an agent run kubectl delete in prod.

Start by letting it explain what it would do, why it would do it, what it needs approval for, and how to undo it.

What to build first

If I were building this for a real DevOps team, I would not start with autonomous deployment.

I would start with three narrow workflows.

First: CI failure explanation.

The agent reads failed logs, groups the error, identifies the likely cause, links exact lines, and comments on the PR. Low risk, high value.
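
The grouping step can start as plain pattern matching before any model is involved. Here is a minimal sketch, assuming the raw job log is available as text; the regular expressions are illustrative guesses and would need tuning to the log formats your CI actually emits.

```python
import re
from collections import defaultdict
from typing import Optional

# Illustrative patterns only. The first matching category wins.
FAILURE_PATTERNS = [
    ("test", re.compile(r"AssertionError|FAILED .*::|\d+ failed")),
    ("lint", re.compile(r"eslint|flake8|lint error", re.IGNORECASE)),
    ("dependency", re.compile(
        r"ModuleNotFoundError|Could not resolve dependency|npm ERR!")),
    ("infra", re.compile(
        r"connection refused|timed out|no space left on device",
        re.IGNORECASE)),
]


def classify_line(line: str) -> Optional[str]:
    """Return the failure category for a log line, or None if it looks fine."""
    for label, pattern in FAILURE_PATTERNS:
        if pattern.search(line):
            return label
    return None


def group_failures(log_text: str) -> dict:
    """Map each category to (line_number, text) pairs so a PR comment
    can link to exact lines."""
    groups = defaultdict(list)
    for number, line in enumerate(log_text.splitlines(), start=1):
        label = classify_line(line)
        if label:
            groups[label].append((number, line.strip()))
    return dict(groups)


sample_log = """Run pytest
FAILED tests/test_pay.py::test_timeout - AssertionError
npm ERR! code ERESOLVE
all done"""
groups = group_failures(sample_log)
```

Keeping line numbers in the output is what lets the agent quote exact evidence instead of paraphrasing the log.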

Second: Terraform plan review.

The agent summarizes infrastructure changes, flags destructive actions, points out IAM/network/database risk, and asks for review before apply.

Third: deployment health summary.

The agent watches rollout status, error rate, latency, pod readiness, recent traces, and recent alerts. It posts one clean summary instead of making someone manually check six tools.
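
A health summary like this reduces to checking a few signals against thresholds. The sketch below assumes the metrics have already been fetched into a `RolloutSample`; the threshold values are placeholders, and real ones should come from the service's SLOs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutSample:
    error_rate: float       # fraction of failed requests, 0.0 to 1.0
    p95_latency_ms: float
    ready_pods: int
    desired_pods: int


# Placeholder thresholds, assumed for illustration.
MAX_ERROR_RATE = 0.01
MAX_P95_LATENCY_MS = 500.0


def guardrail_violations(sample: RolloutSample) -> list:
    """Return the guardrails this sample breaks (empty list means healthy)."""
    violations = []
    if sample.error_rate > MAX_ERROR_RATE:
        violations.append("error_rate")
    if sample.p95_latency_ms > MAX_P95_LATENCY_MS:
        violations.append("latency")
    if sample.ready_pods < sample.desired_pods:
        violations.append("pod_readiness")
    return violations


def summarize(sample: RolloutSample) -> str:
    """One-line summary suitable for a PR comment or chat message."""
    violations = guardrail_violations(sample)
    if not violations:
        return "Rollout healthy: all guardrails pass."
    return "Rollout unhealthy: " + ", ".join(violations) + " out of bounds."
```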

Once those work, you can add more automation.

A good maturity path looks like this:

Level 1: Read-only assistant
Level 2: Suggests fixes and runbooks
Level 3: Opens tickets, comments, and summaries
Level 4: Runs approved low-risk actions
Level 5: Handles narrow autonomous remediation with hard guardrails

Most teams should live at Level 2 or Level 3 for a while. That is not slow. That is how trust gets built.
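
The levels can also be enforced in code, so the agent's capabilities match the trust the team has actually built. This is a minimal sketch: the `Maturity` enum follows the five levels above, while the action class names in `REQUIRED_LEVEL` are hypothetical.

```python
from enum import IntEnum


class Maturity(IntEnum):
    READ_ONLY = 1
    SUGGEST = 2
    OPEN_TICKETS = 3
    APPROVED_LOW_RISK = 4
    NARROW_AUTONOMY = 5


# Minimum level each (hypothetical) action class requires.
REQUIRED_LEVEL = {
    "read_state": Maturity.READ_ONLY,
    "suggest_fix": Maturity.SUGGEST,
    "post_comment": Maturity.OPEN_TICKETS,
    "run_low_risk_action": Maturity.APPROVED_LOW_RISK,
    "auto_remediate": Maturity.NARROW_AUTONOMY,
}


def is_permitted(action_class: str, team_level: Maturity) -> bool:
    """Unknown action classes are denied regardless of level."""
    required = REQUIRED_LEVEL.get(action_class)
    return required is not None and team_level >= required
```

Raising the level then becomes a single reviewed configuration change instead of a gradual drift in what the agent is allowed to do.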

The main takeaway

AI-native DevOps is not about replacing YAML with a chatbot.

It is about giving DevOps specialists a faster way to move through the work they already do: gather context, understand risk, apply a known workflow, and take the next safe action.

MCP gives the agent a standard way to reach tools. Skills give it repeatable expert behavior. Plugins make it fit the company's real infrastructure.

The result is a better pipeline:

  • faster CI/CD debugging
  • cleaner infrastructure reviews
  • safer releases
  • better incident context
  • less repetitive manual work

The best DevOps AI systems will not be the ones that act the most independently. They will be the ones that know when not to act.

Start with read-only context. Add skills. Wrap dangerous actions in plugins with approvals. Then automate the boring work first.

That is how AI makes DevOps faster without making production scarier.

References

  1. Model Context Protocol, Architecture overview https://modelcontextprotocol.io/docs/learn
  2. Model Context Protocol, Understanding MCP servers https://modelcontextprotocol.io/docs/learn/server-concepts
  3. GitHub Docs, GitHub Actions documentation https://docs.github.com/actions
  4. GitHub Docs, Workflows https://docs.github.com/en/actions/concepts/workflows-and-actions/workflows
  5. Argo CD Docs, Declarative GitOps CD for Kubernetes https://argo-cd.readthedocs.io/en/latest/
  6. Kubernetes Docs, Extending Kubernetes https://kubernetes.io/docs/concepts/extend-kubernetes/
  7. HashiCorp Developer, Set up HCP Terraform run task integrations https://developer.hashicorp.com/terraform/cloud-docs/integrations/run-tasks
  8. OpenTelemetry Docs, OpenTelemetry Concepts https://opentelemetry.io/docs/concepts/
  9. OpenTelemetry Docs, Traces https://opentelemetry.io/docs/concepts/signals/traces/