惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

H
Help Net Security
T
ThreatConnect
SecWiki News
SecWiki News
F
Future of Privacy Forum
AWS News Blog
AWS News Blog
C
Cisco Blogs
A
Arctic Wolf
Vercel News
Vercel News
The GitHub Blog
The GitHub Blog
Scott Helme
Scott Helme
V
V2EX
博客园 - 叶小钗
阮一峰的网络日志
阮一峰的网络日志
K
Kaspersky official blog
G
Google Developers Blog
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
P
Privacy International News Feed
C
Cyber Attacks, Cyber Crime and Cyber Security
N
News | PayPal Newsroom
Schneier on Security
Schneier on Security
NISL@THU
NISL@THU
Microsoft Azure Blog
Microsoft Azure Blog
量子位
The Hacker News
The Hacker News
Stack Overflow Blog
Stack Overflow Blog
Security Latest
Security Latest
M
Microsoft Research Blog - Microsoft Research
Google Online Security Blog
Google Online Security Blog
博客园_首页
C
CXSECURITY Database RSS Feed - CXSecurity.com
I
InfoQ
Google DeepMind News
Google DeepMind News
Y
Y Combinator Blog
The Cloudflare Blog
Microsoft Security Blog
Microsoft Security Blog
Martin Fowler
Martin Fowler
Cisco Talos Blog
Cisco Talos Blog
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
T
Troy Hunt's Blog
F
Fox-IT International blog
S
Security @ Cisco Blogs
博客园 - 司徒正美
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
C
Comments on: Blog
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
L
LINUX DO - 最新话题
GbyAI
GbyAI
Project Zero
Project Zero
腾讯CDC
T
Tailwind CSS Blog

DEV Community

FairLens AI: An Intelligent Dashboard for Automated Bias Auditing AI Metrics Decoded: From Parameters to TOPS I made git merge finish itself — in VS Code, in my terminal, and in CI You just can’t miss this… Redis Essentials: Architecture, Caching, and Setup Design to Code #5: Using AI to Build a Design System Analyzing 1,000 Engineering Problems Through GitHub Data Open Graph protocol: canonical reference How a 400-Engineer SaaS Company Cut PR-to-Production from 4.2 Days to 6.4 Hours with Claude Code Multi-Agent DevOps 💬 Embedded AI Chatbots vs Popup Bubbles — Which One Creates Better Engagement? Bajándole todos los minutos posibles al CI del backend con mas de 1000 tests Harness Engineering: Stop Re-Prompting Your Coding Agent Every Session HTML meta referrer: canonical reference AWS MCP Server Just Gave AI Agents Your Cloud Keys — Here's Why That Should Worry You Announcing the Trust Identity Protocol (TIP): HTTPS for the AI Era We built the feature in two days. Making it reliable took two weeks. LuisCore /for-agents.json — agent bootstrap — daily syndication · 2026-05-26 A Curious Journey Into Reverse Engineering an AI-Generated Python .exe Part 2: Enterprise Decision Intelligence Architecture: AI Governance, Threshold Policy Engines, and Operational AI Systems I will continue using Devise with Rails 8! The Developer's Guide to Picking the Right AI Code Model in 2026 (I Spent $500 So You Don’t Have To) 30 Kubernetes Tasks Every CKA Candidate Should Practice Before Exam Day Why Some Websites Feel Instantly Better to Use Advanced React Patterns I Wish I Knew 5 Years Ago ¿Cómo optimizar algoritmos en arreglos y listas con la técnica de dos punteros? I scanned 8 popular open source repos with one command. Here's what I found. mcp-probe v1.6.0: Stricter GitHub Actions checks for MCP CI gates How we connect two strangers' webcams fast (and keep the TURN bill small) LLM Agents Are Now Finding Zero-Days: How AI is Autonomously Rewriting the Rules of Vulnerability Research Minimal Code Doesn’t Mean Stable Code How I manage 40+ skills across Claude Code, Codex, and .agents folders Hardening Stealth Browser Fingerprint Integrity and State Persistence Quick Tip: Benchmarking Multimodal APIs in Under 10 Minutes How I Slashed My AI API Bill by 92% in 2026 — A Cost Optimizer's Speed Benchmark Guide How I Slashed My AI API Bill by 95% — A Practical Guide for 2026 A Go outbox library that runs inside your own DB transaction How I Built a Credit Optimizer That Saves 30-75% on AI Agent Costs (Open Architecture) The Missing POP: How I Ported a Yul Contract to Huff by Reading Every Opcode The Moment the Config Parser Became the Bottleneck Churn Tool Stack by Revenue Stage ($5K to $50K+) What I Learned Exploring AI-Generated 3D: A Hands-On Tour of Meshy, Tripo, and Three.js Day 15 - Software Composition Analysis(SCA) Contributing Upstream Instead of Forking: My grape-swagger-rails Story Behind The Badge: How We Built 2,000 Hackable Badges For Temporal Replay Access Control Doesn't Scale Linearly -- Part 3 33x faster than Rust: Why I stopped waiting for my compiler and built my own. I Built My First Production AWS Project as a Career Changer Why Detecting PII Matters More Than Ever JSON Schema in 10 Minutes — Validation, Types & Real Examples Python Tasks How I Started My Cybersecurity Journey as an SQA Engineer 🔐 Why "fancy fonts" in Discord and Instagram bios turn into boxes ☁️ GKE private cluster setup — common mistakes and how to avoid them I Thought a Username Didn’t Matter… Until I Saw How Much People Care About It Claude for Small Business: 382K Day-One Buyer's Guide I Built a Diagnostic Toolkit for PyTorch Because I Was Tired of Guessing Why Models Fail How I Built an AI-Powered Incident RCA Platform with LangGraph and RAG The Paywall Was a Painted Door Sonnet hallucinated. My agent stored it as fact. How React-Style Time-Slicing Keeps UIs Responsive 这个 Princeton 开源项目让 AI 自己修 Bug,19K Stars 但 90% 的人只用了 1% 功能 🔥 SWE-agent's 5 Hidden Uses Nobody Told You About 🔥 Decompiling Serial Number U-36: Python TERCOM Reconstruction, Cryptographic Logistical Forensics, and Swarm Consensus Fault Tolerance Microservices Patterns You Cannot Outrun a Wave I Fired My Entire Node.js Stack — Rust Rebuilt It in 3 Weeks (The Ugly Truth) BoxAgnts Introduction (2) — AI Agent Toolbox Cursor 3 ships parallel AI agents. Here is the multi-agent workflow that actually works. Prisma-7 A Complete Beginners Guide (With Free Cloud Database!) Akses HDD Rumah dari Laptop Kantor Pakai Tailscale + SMB (Tanpa VPN Ribet) Content Pipeline in MonoGame: Why I Don't Use It Debug Log #1 — The Pipeline That Looked Broken Data Structures in JavaScript: When to Use What (2026) BGP Route Flap Damping: A Solution or a New Problem? First look at AWS DevOps Agent The Next Big “Cult App” Probably Isn’t Another Social Media Platform From Template to Production-Shaped: An AI-Native Dev Flow for Go Side Projects Idempotency Keys: The API Pattern That Saves You From Duplicate Payments and Phantom Records Everyone's Building Jarvis. Nobody's Even Close. The Moment the Jaeger Tracer Exhausted Itself and What We Switched To How to Fix Tool-Use Loops in Autonomous Coding Agents Months of self-testing: Citations shine, other features remain unproven. Claude Code for Canary Deployments: How I Ship to 1% of Users Before Breaking Everything Your recurring scraper is re-downloading data that didn't change. Here's the 15-line fix (conditional GET) 20 Years of GPUs in Numbers: How FLOPS & TDP Grew, and Who Led the NVIDIA vs AMD Race (open dataset, 13.5k GPUs) Espressif Reveals CoreBoard and Korvo Dev Kits for ESP32-S31 Composable Abstraction Layer: o pattern que faltava entre Pinia e seus componentes Vue Your GitHub Actions Logs Are Leaking LLM Keys and Your SIEM Isn't Catching It Solving Complex Logic with Claude and Research Papers Building TheEpicBook: A Deep Dive into a Node.js Monolithic Web Application Haber yazilimi, haber scripti, haber sistemi: ayni urun, uc ayri arama niyeti Predicting Blood Glucose Fluctuations: Building a Transformer-based CGM Forecaster with PyTorch & InfluxDB Pre-task hooks: the one-line wire-up that gives your Hono agent shared memory Concurrent writes to a shared agent memory: what we shipped, what we punted on Building a Production Serverless URL Shortener on AWS — 21 Articles, Every Test Run for Real My CKA Cheat Sheet: Commands, Aliases, and Documentation Tricks I Used During the Exam Frontend Engineering Beyond Pixels: The Architecture of Digital Accessibility VLA or IL? A Controlled Dataset for Testing Whether Finetuning Turns Your VLA into a Fancy Imitation Learner Fabric AI Functions Turn GenAI Into a Data Pipeline Step Proximate vs Ultimate: The Bug Is Never Just the Bug
Docker with AI: A Practical Guide to Running LLMs, Agents and MCP
Harsh Manvar · 2026-05-26 · via DEV Community

If you've been searching for how to actually use Docker with AI not just spin up a demo but run models, agents and MCP servers in production here's what We have learned over the years and put into our new book.

AI with Docker

If you typed "Docker with AI" into Google and landed here, you're in good company. That's the search I've watched explode over the last 18 months and it's also the question I get asked the most at meetups, on LinkedIn DMs.

People aren't asking "what is AI" anymore. They're asking something a lot more uncomfortable :

"I have a model, I have a notebook, I have a demo that worked once on my MacBook. Now what ?"

This post is for that person. I want to share how I think about Docker with AI today what the stack actually looks like in 2026, where most teams get stuck and the structured path @ajeetraina and I wrote down in our new book, Operational AI with Docker (links at the end).

No fluff. No "AI is changing the world." Just the parts I wish someone had handed me two years ago.

Docker Solved Software Packaging. AI Has the Same Problem, Just Heavier.

Think back to 2013. Before Docker, shipping a piece of software meant chasing dependencies, fighting OS differences and praying your requirements.txt lined up with the production box. Then containers showed up and the whole story compressed into three words: build, ship, run.

AI in 2026 looks suspiciously similar except the baggage is heavier :

  • A model file that's 4 to 70 GB
  • A tokenizer that needs to match the model exactly
  • GPU drivers that hate you specifically
  • Python environments that break the second you breathe on them
  • An agent that needs tools. Tools that need secrets. Secrets that need policies
  • A vector store. A retrieval pipeline. A prompt cache.

That's not a notebook problem. That's a packaging, isolation and runtime problem which is exactly what Docker has been solving for a decade in the software world.

So when I say "Docker with AI" I don't mean "let's stick a model inside a FROM python:3.11 and ship it." That's the cargo-cult version. The interesting version is:

How do we use Docker's primitives - images, runtimes, networking, secrets, orchestration to make AI workloads portable, reproducible, and operable ?

That's the whole question. Everything else is implementation detail.

What "Docker with AI" Actually Means in 2026

A lot has changed in the last year and most blog posts haven't caught up. So here's a quick map of the pieces I'm using daily right now.

Docker Model Runner (DMR)

This one trips people up. With Docker Model Runner, you don't put the model inside a container. DMR runs the model natively on the host, uses the GPU directly and exposes it through an OpenAI-compatible endpoint. What you get from Docker is the packaging story docker model pull, versioning and the same workflow you already know from images.

So when someone asks me "should I run my model inside a container?", the honest answer in 2026 is : probably not and you don't have to. Use DMR instead. That alone saves teams weeks of pain.

MCP and Docker MCP Gateway

The Model Context Protocol (MCP) exploded around December 2024. Within a few months, the ecosystem had over 3,000 MCP servers file systems, GitHub, Slack, databases, browsers you name it. If you're building an agent, MCP is how it talks to the outside world.

The problem? Running raw MCP servers is a security nightmare. Each one is a process with tool access, secrets and a giant trust surface.

The MCP Gateway fixes this with policy enforcement, secrets isolation, dynamic tool discovery and audit logs the boring infrastructure stuff that nobody blogs about because it's not flashy but that you absolutely need before you let an agent touch production.

Docker Sandboxes

Agents generate code. Sometimes they hallucinate rm -rf /.

Sometimes they pip-install something they shouldn't. Sandboxes give you a lightweight microVM to execute untrusted, agent-generated code without nuking your host. If you're running anything resembling an autonomous agent.

Agentic Compose and Docker Agents

Once you have models, tools and sandboxes, you need a way to wire them together that isn't 800 lines of glue code. Agentic Compose lets you declare agents, sub-agents and tools in YAML the same mental model you already have for docker-compose.yml just stretched to multi-agent workflows. It's versioned, reviewable and reproducible.

Kubernetes for GenAI

Eventually it leaves your laptop. When it does, you need autoscaling, cost-aware routing (because GPU minutes are expensive), observability that actually understands token usage and graceful failover when an upstream model API goes down. That's where Kubernetes comes back into the picture same patterns you know, with AI-specific twists.
That's the stack. Model Runner at the bottom, MCP and Sandboxes in the middle, Agents and Compose on top, Kubernetes wrapping the whole thing in production.

The Patterns I See on Real Production Calls

Most of my day job is sitting with platform and SRE teams that are trying to take an AI workload from "the data scientist's laptop" to "a service my on-call rotation can survive."

A few patterns repeat so often I now bring them up before the team does:

The model choice happens after the architecture is locked. Someone picks GPT-4 in week one because that's what the prototype used and six months later the bill is 5x the revenue. The chapter on choosing SLM vs MLM vs LLM is there for exactly this so you make the call when it's cheap to change, not after a board meeting.

Everything is one curl away from production. I've lost count of the agent demos I've reviewed where the tool call is a raw HTTP request to a database, with no auth scoping, no rate limiting and a secret pasted into an env var. That's a 2 AM incident waiting to happen. MCP Gateway exists for this exact reason and we spend a real amount of the book on how to put policy in front of your agents before they touch anything sensitive.

Nobody owns the GPU bill. When the workload was a prototype, it ran on someone's laptop for free. The moment it goes to a cluster, GPU costs land on a finance team that has no idea what an A100-hour even means. We walk through cost-aware routing, quantization and when running a smaller model locally is just objectively the right answer.
The observability story is "we'll add it later" And later never comes. Standard APM tools don't understand tokens, prompt caching or model failover. The Kubernetes chapters cover what observability for an LLM service actually needs to look like because if you can't see it, you can't operate it.

If any of these sound uncomfortably familiar, you're the reader we wrote this book for.

What's Inside the Book

Rather than copy the table of contents, here's the practical arc what you can actually do by the time you finish each part:

Run an LLM locally with Docker Model Runner, pull it like a container image, hit an OpenAI-compatible endpoint, swap models without changing a line of client code.

Build an AI agent with MCP, give it tools through the MCP Gateway, enforce which tools it can call, isolate its secrets, get an audit trail of every action.

Orchestrate a multi-agent system declaratively with Agentic Compose an orchestrator agent, a few specialist sub-agents, shared state and a clean way to version the whole topology.

Run agent-generated code safely inside Docker Sandboxes so a hallucinated shell command doesn't take down your laptop or your prod box.

Deploy on Kubernetes with autoscaling, cost-aware routing across multiple model backends and observability that actually measures the things that matter tokens, latency, error rates, $/request.

Every chapter is built on tools you can install today. Every example has working code in the companion repo, which we'll keep maintaining as the stack evolves.

Who I Wrote This For
Honestly, the same people who keep asking me these questions:

Developers who built an AI demo and now have to productionize it.
DevOps and platform engineers suddenly responsible for LLM workloads they didn't sign up for.
SREs trying to write runbooks for systems that hallucinate.
Architects sketching out an agentic AI roadmap and looking for a real operational reference.

You don't need an ML PhD. You don't need to have trained a model from scratch. If you're comfortable with containers and curious about how AI actually runs in real environments, you're the audience.
And if you come from the AI side and containers feel like a black box
the first few chapters will get you up to speed without talking down to you.

A Quick Word on Why It's Not Another Tutorial Series

Both @ajeetraina and I write a lot of blogs. He runs Collabnix; I've been publishing Docker and Kubernetes content for years. We could have just kept doing that.

But here's what we kept running into: tutorials age fast in this space. A post from 8 months ago is already half-wrong. People were reading 12 different tutorials, getting 12 contradictory answers and ending up more confused than when they started.

A book lets us tell a single, internally consistent story end to end. Pick the right model → run it locally → wrap it in an agent → secure the tools → sandbox the execution → ship to a cluster.

One narrative. One stack. One opinionated path that actually works.
It's a snapshot of where the practice stands in 2026 and a foundation you can build on as it keeps moving.

Get the Book

If any of this resonated, you can grab a copy here :

📘 Packt (Global): Operational AI with Docker
📦 Amazon (US): Paperback + Kindle
📦 Amazon (India): Paperback + Kindle
🔖 ISBN: 9781807301095

If you do pick it up, tag me on LinkedIn or X (@manvar_harsh) I read every message and I'd love to hear what's working for you and where you're still stuck.

Closing Thought

Two years ago, "Docker with AI" meant pulling a tensorflow:latest image and hoping for the best. Today it means a real, layered runtime Model Runner, MCP, Sandboxes, Agentic Compose, Kubernetes that lets you move from a working demo to a system you'd actually let your customers depend on.

That shift is what the book is about.

If you've been stuck somewhere on that path, I hope this helps you take the next step.
— Harsh