惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

Recorded Future
Recorded Future
Microsoft Security Blog
Microsoft Security Blog
Recent Commits to openclaw:main
Recent Commits to openclaw:main
The Register - Security
The Register - Security
The GitHub Blog
The GitHub Blog
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
人人都是产品经理
人人都是产品经理
量子位
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
有赞技术团队
有赞技术团队
Stack Overflow Blog
Stack Overflow Blog
H
Help Net Security
Apple Machine Learning Research
Apple Machine Learning Research
The Cloudflare Blog
B
Blog RSS Feed
小众软件
小众软件
博客园 - 叶小钗
H
Hackread – Cybersecurity News, Data Breaches, AI and More
博客园 - 聂微东
博客园_首页
B
Blog
雷峰网
雷峰网
S
SegmentFault 最新的问题
N
Netflix TechBlog - Medium
D
Docker
博客园 - 司徒正美
博客园 - 【当耐特】
大猫的无限游戏
大猫的无限游戏
博客园 - Franky
MongoDB | Blog
MongoDB | Blog
U
Unit 42
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
腾讯CDC
F
Fortinet All Blogs
aimingoo的专栏
aimingoo的专栏
Martin Fowler
Martin Fowler
Jina AI
Jina AI
WordPress大学
WordPress大学
D
DataBreaches.Net
V
V2EX
V
Visual Studio Blog
Know Your Adversary
Know Your Adversary
P
Privacy & Cybersecurity Law Blog
F
Full Disclosure
G
Google Developers Blog
Engineering at Meta
Engineering at Meta
The Hacker News
The Hacker News
Security Archives - TechRepublic
Security Archives - TechRepublic
IT之家
IT之家
P
Privacy International News Feed

Cloud Native Computing Foundation

Kepler, re-architected: Improved power accuracy and a community call to action! Dragonfly v2.5.0 is released OTel and mesh-derived metrics: A 2026 reference etcd-operator joins Cozystack with a new v1alpha2 API Security Profiles Operator v1: Stable APIs, Security Hardened, and Shaping Upstream Kubernetes Securing CI/CD for an open source project, part 3: Credentials, verification, and what’s next From Awareness to Engineered Accessibility in Open Source Agent Auth: A lawyer’s day in court Building Jaeger’s ClickHouse backend: 8.6× compression on 10 million spans Telemetry that matters: Designing sustainable, high-impact observability pipelines KubeCon + CloudNativeCon, OpenInfra Summit and PyTorch Conference Unite in China to Scale AI Flipkart Wins CNCF End User Case Study Contest for Kubernetes and Chaos Engineering Scale Expanding CARE: Passing CKS can now extend your CKA certification CNCF and Linux Foundation Education Partner with Udemy to Provide a Unified Cloud Native Training & Certification Opportunity CNCF and SlashData Report Confirms India as One of the Largest Cloud Native Communities with 2.25 Million Developers CNCF Welcomes New Silver Members as Global Demand for Cloud Native Infrastructure Grows Why cloud native belongs at the heart of agentic AI: Lessons from building a multi-agent security platform on Kubernetes Client Challenge Client Challenge Client Challenge Client Challenge Client Challenge Client Challenge Client Challenge Client Challenge Client Challenge Client Challenge Client Challenge Client Challenge Client Challenge Building a cloud native internal developer platform with Kubernetes, GitOps, and supply chain security The Kubernetes integration tax: Prometheus, Cilium and production reality GPU autoscaling on Kubernetes with KEDA: Building an external scaler Three TAG leads walk into the TOC How Jaeger is evolving to trace AI agents with OpenTelemetry Why Kubernetes policy enforcement happens too late—and what to do about it Zero-Downtime migration from ingress NGINX to Envoy Gateway Client Challenge Client Challenge Client Challenge
Building a Cluster-Aware AI Agent with Kubernetes, Argo CD, and GitOps
audra · 2026-06-25 · via Cloud Native Computing Foundation

Posted on June 25, 2026 by Maryam Tavakkoli (CNCF Ambassador | Lead Cloud Engineer @ RELEX Solutions)

CNCF projects highlighted in this post

Argo logo Kubernetes logo

A practical walkthrough of running a self-hosted, read-only AI agent inside a Kubernetes cluster, with the full CI/CD chain handled by GitHub Actions and Argo CD Image Updater. No data leaves the cluster, no cloud AI provider involved.


Why a Cluster-Aware Agent Is an Interesting Pattern

Most “AI for Kubernetes” tooling today is a hosted SaaS that consumes cluster data and returns advice. The model lives elsewhere. The data leaves the network.

This article walks through the opposite design: an agent that runs inside the cluster, observes live state through the Kubernetes API, and reasons with a local LLM. Every layer is visible, every credential is scoped, and the only network egress is a model pull at startup.

The interesting properties of this pattern for platform engineers:

PropertyWhat it provides
Cluster-awareThe agent reads live pods, events, and logs and reasons about real state rather than generic Kubernetes facts.
Read-only by designA dedicated ServiceAccount + ClusterRole with get/list verbs only. The agent can observe but cannot mutate the cluster, regardless of what the model produces.
Just another K8s workloadThe agent is a Deployment + Service + PersistentVolumeClaim. No special runtime, no operator, no custom scheduler.
Full GitOpsPrompts, model selection, and RBAC live in Git. Argo CD reconciles them. The agent’s behavior is auditable through git log.

Source code: github.com/MaryamTavakkoli/local-k8s-ai-agent


LLM vs. AI Agent: The Distinction That Matters

A Large Language Model answers from training data alone. It has no awareness of the environment it’s deployed into. An agent, in the sense used here, performs an extra step before reasoning: it observes the real world and incorporates that observation into the prompt.

Graphic: LLM Alone vs. AI Agent

The contrast in output is concrete. A generic LLM call returns “CrashLoopBackOff usually means the container is failing health checks or exiting unexpectedly…” An agent call returns “Pod api-7b8d has restarted 14 times in the last hour with ImagePullBackOff against registry.local. Run kubectl describe pod api-7b8d to confirm.”

The second answer is grounded. The first answer is correct but not actionable for this cluster.

This project demonstrates both modes through two REST endpoints:

  • POST /ask — LLM alone, useful for general questions like “What is a StatefulSet?”
  • POST /diagnose — the agent: reads live cluster state, then reasons over it

Architecture

The system has two halves: a CI/CD chain on the top and a Kubernetes runtime on the bottom.

Runtime side:

  • An Ollama pod serves a local Mistral 7B model on port 11434
  • A FastAPI pod exposes the agent’s HTTP API and chat UI on port 8000
  • A PersistentVolumeClaim holds the model weights so pulls aren’t repeated
  • A dedicated ServiceAccount mounted in the FastAPI pod has a ClusterRole permitting only read operations on pods, events, logs, services, and deployments

Delivery side:

  • A push to the application source in Git triggers GitHub Actions to build a multi-architecture image (linux/amd64 + linux/arm64) tagged with the 7-character commit SHA
  • Argo CD Image Updater (from argoproj-labs) polls Docker Hub on a 2-minute interval, detects new tags matching the configured regex, and commits the new tag back into the repository’s kustomization.yaml
  • Argo CD detects the manifest change and reconciles the cluster
Graphic: Architecture: Local AI Agent on Kubernetes

The two halves are decoupled. Argo CD has no awareness of the registry. GitHub Actions has no awareness of the cluster. Image Updater is the small operator that bridges them, and it does so by writing to Git, which preserves a single source of truth.


The AI Concepts You’ll Actually Touch

Here are a few concepts that every AI engineer uses every day, explained in plain language.

1. LLM (Large Language Model)

A statistical model trained on enormous amounts of text. It doesn’t “know” facts; it predicts the most likely next word given everything that came before. That’s it. The magic is that this simple task, done at scale, produces something that feels like reasoning.

This project uses Mistral 7B, a 7-billion-parameter open-source model. “Parameters” are the numbers the model learned during training, similar to the strengths of connections in a brain.

2. Local LLM

Most commercial AI services send your text to a remote cloud provider. The trade-off is capability: a local 7B model isn’t as expansive as a massive foundational model running on cloud infrastructure.. But for experimenting, it’s more than enough. And nothing leaves your network.

3. Ollama (The Model Serving Runtime)

Ollama is not an AI model. It’s a server that runs AI models. Think of it like a web server for LLMs: it downloads the model files, loads them into memory, and exposes a REST API on port 11434 so anything (including our FastAPI app) can send prompts and get responses.

Without Ollama, you’d be wrestling with PyTorch, CUDA, and tokenizer libraries. With it, running an LLM is ollama pull mistral followed by an HTTP POST.

4. System Prompt (The Personality)

This is the single most important AI concept for application developers, and you can master it in about ten minutes.

A system prompt is the instructions you give the model before the user’s question. The model reads it first and uses it to shape every response.

In our project, the system prompt for /ask is:

“You are a DevOps assistant specializing in Kubernetes.

When given an error or question, you:

1. Explain what it means clearly

2. Provide the exact kubectl commands to diagnose or fix it

3. Explain why the fix works

Be concise and practical.”

Without that prompt, Mistral is a general assistant. With it, Mistral is a Kubernetes specialist who always returns structured answers. No retraining was needed. This is called prompt engineering, and it’s how almost every AI product you use was built.

5. RAG (Retrieval-Augmented Generation)

The fancy term for what /diagnose does. RAG means: before asking the model, retrieve real-world data and augment the prompt with it.

RAG is why contemporary AI assistants work. A code assistant reads your local workspace repository; our agent reads your live cluster state.. Same pattern, different data source.


The Two Modes: Where the Agent Becomes Real

Here’s where the “agent” idea earns its name.

Mode 1: Ask (LLM alone)

You type a question, FastAPI prepends the system prompt, sends it to Ollama. The model answers from its training data. Useful for general K8s questions like “What is a StatefulSet?”

Mode 2: Diagnose Cluster (true agent)

You type a question and a namespace. FastAPI does something new: it calls the Kubernetes API and reads:

All pods in that namespace (phase, restart count, waiting reason)

The last 10 events

The last 20 lines of logs from any non-Running pod

That entire context is injected into the prompt. Then Mistral reasons, but now it’s reasoning about your actual cluster, not generic Kubernetes knowledge.

Graphic: Local K8s AI Agent

The chat UI even shows you the exact context the agent read, in a collapsible panel under each answer.


Read-Only by Design

The agent runs with a ServiceAccount bound to a ClusterRole that exposes only read verbs:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ai-devops-api-reader
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log", "events", "services", "configmaps", "namespaces"]
    verbs: ["get", "list"]
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets", "statefulsets", "daemonsets"]
    verbs: ["get", "list"]

This is the most important design decision in the project, and it generalizes beyond AI workloads. An agent that can delete pod based on its own reasoning is a production incident waiting to happen. Hallucinations multiplied by write access is a poor combination.

Read-only RBAC inverts the trust model. The agent is allowed to be wrong because being wrong has no consequences. The Kubernetes API server enforces the boundary; the LLM’s output cannot bypass it. Iteration on prompts and models becomes cheap because the worst-case behavior is bounded.

The same pattern scales: start every agent read-only, then earn each additional capability one verb at a time, each with its own RBAC rule and review.


The CI/CD Chain in Detail

The delivery half of the architecture uses three independent components, each with one responsibility.

GitHub Actions builds and pushes the image. The workflow uses docker buildx with QEMU emulation to produce a manifest list covering both linux/amd64 (GitHub-hosted runners) and linux/arm64 (Apple Silicon developer machines). The tag is the 7-character commit SHA, an immutable reference.

Argo CD Image Updater polls the registry on a 2-minute interval. Configuration lives in an ImageUpdater Custom Resource that names the target Argo CD Application, the image to track, an allowTags regex (^[0-9a-f]{7}$), and the update strategy (newest-build). When a new matching tag is found, the operator rewrites the newTag field in k8s/kustomization.yaml and commits the change to the main branch.

apiVersion: argocd-image-updater.argoproj.io/v1alpha1
kind: ImageUpdater
metadata:
  name: local-k8s-ai-agent
  namespace: argocd
spec:
  writeBackConfig:
    method: git
    gitConfig:
      branch: main
      writeBackTarget: "kustomization:."
  applicationRefs:
    - namePattern: "local-k8s-ai-agent"
      images:
        - alias: api
          imageName: marytvk/local-k8s-ai-agent
          commonUpdateSettings:
            updateStrategy: newest-build
            allowTags: "regexp:^[0-9a-f]{7}$"

Argo CD watches the repository and reconciles the cluster on each commit. Because the source manifests are managed by Kustomize, Argo CD applies the rendered output, which now includes the updated image tag.


Try It Yourself: It’s a Starting Point, Not a Destination

The repo is here: github.com/MaryamTavakkoli/local-k8s-ai-agent

Step-by-step setup with exact commands is in the README. Total time from git clone to working chat UI is about 30 minutes (most of that is the Mistral download).

To bring this article full circle: if you’re a DevOps or Platform Engineer who’s been hearing “AI agents are coming” and wondering what that actually means in practice, this is meant to be your starting point, not your finish line. Once you’ve seen the agent loop running, you’ll be in a much better position to continue.

The point of starting local isn’t that local is always the right answer. It’s to understand how the full circle works behind the scenes.