惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

N
News | PayPal Newsroom
Security Archives - TechRepublic
Security Archives - TechRepublic
Hacker News: Ask HN
Hacker News: Ask HN
H
Hacker News: Front Page
Apple Machine Learning Research
Apple Machine Learning Research
TaoSecurity Blog
TaoSecurity Blog
Help Net Security
Help Net Security
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
V
V2EX
Hugging Face - Blog
Hugging Face - Blog
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
cs.CL updates on arXiv.org
cs.CL updates on arXiv.org
人人都是产品经理
人人都是产品经理
博客园 - 三生石上(FineUI控件)
Security Latest
Security Latest
Cloudbric
Cloudbric
WordPress大学
WordPress大学
S
SegmentFault 最新的问题
cs.AI updates on arXiv.org
cs.AI updates on arXiv.org
www.infosecurity-magazine.com
www.infosecurity-magazine.com
Know Your Adversary
Know Your Adversary
A
Arctic Wolf
L
LangChain Blog
Application and Cybersecurity Blog
Application and Cybersecurity Blog
The GitHub Blog
The GitHub Blog
P
Proofpoint News Feed
W
WeLiveSecurity
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
M
MIT News - Artificial intelligence
Google DeepMind News
Google DeepMind News
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
The Cloudflare Blog
小众软件
小众软件
NISL@THU
NISL@THU
云风的 BLOG
云风的 BLOG
P
Privacy & Cybersecurity Law Blog
S
Security @ Cisco Blogs
博客园 - 【当耐特】
I
InfoQ
Vercel News
Vercel News
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
P
Proofpoint News Feed
O
OpenAI News
Google DeepMind News
Google DeepMind News
N
News and Events Feed by Topic
K
KPMG report finds enterprise disconnect between AI and its ROI | CIO
K
Kaspersky official blog
T
Threat Research - Cisco Blogs
量子位
宝玉的分享
宝玉的分享

Cloud Native Computing Foundation

Kepler, re-architected: Improved power accuracy and a community call to action! Dragonfly v2.5.0 is released OTel and mesh-derived metrics: A 2026 reference etcd-operator joins Cozystack with a new v1alpha2 API Security Profiles Operator v1: Stable APIs, Security Hardened, and Shaping Upstream Kubernetes Securing CI/CD for an open source project, part 3: Credentials, verification, and what’s next From Awareness to Engineered Accessibility in Open Source Agent Auth: A lawyer’s day in court Building Jaeger’s ClickHouse backend: 8.6× compression on 10 million spans Telemetry that matters: Designing sustainable, high-impact observability pipelines KubeCon + CloudNativeCon, OpenInfra Summit and PyTorch Conference Unite in China to Scale AI Flipkart Wins CNCF End User Case Study Contest for Kubernetes and Chaos Engineering Scale Expanding CARE: Passing CKS can now extend your CKA certification CNCF and Linux Foundation Education Partner with Udemy to Provide a Unified Cloud Native Training & Certification Opportunity CNCF and SlashData Report Confirms India as One of the Largest Cloud Native Communities with 2.25 Million Developers CNCF Welcomes New Silver Members as Global Demand for Cloud Native Infrastructure Grows Why cloud native belongs at the heart of agentic AI: Lessons from building a multi-agent security platform on Kubernetes Client Challenge Client Challenge Client Challenge Client Challenge Client Challenge Client Challenge Client Challenge Client Challenge Client Challenge Client Challenge Client Challenge Client Challenge Client Challenge Building a cloud native internal developer platform with Kubernetes, GitOps, and supply chain security The Kubernetes integration tax: Prometheus, Cilium and production reality GPU autoscaling on Kubernetes with KEDA: Building an external scaler Three TAG leads walk into the TOC How Jaeger is evolving to trace AI agents with OpenTelemetry Why Kubernetes policy enforcement happens too late—and what to do about it Zero-Downtime migration from ingress NGINX to Envoy Gateway Client Challenge Client Challenge Client Challenge
Building a Cluster-Aware AI Agent with Kubernetes, Argo CD, and GitOps
audra · 2026-06-25 · via Cloud Native Computing Foundation

Posted on June 25, 2026 by Maryam Tavakkoli (CNCF Ambassador | Lead Cloud Engineer @ RELEX Solutions)

CNCF projects highlighted in this post

Argo logo Kubernetes logo

A practical walkthrough of running a self-hosted, read-only AI agent inside a Kubernetes cluster, with the full CI/CD chain handled by GitHub Actions and Argo CD Image Updater. No data leaves the cluster, no cloud AI provider involved.


Why a Cluster-Aware Agent Is an Interesting Pattern

Most “AI for Kubernetes” tooling today is a hosted SaaS that consumes cluster data and returns advice. The model lives elsewhere. The data leaves the network.

This article walks through the opposite design: an agent that runs inside the cluster, observes live state through the Kubernetes API, and reasons with a local LLM. Every layer is visible, every credential is scoped, and the only network egress is a model pull at startup.

The interesting properties of this pattern for platform engineers:

PropertyWhat it provides
Cluster-awareThe agent reads live pods, events, and logs and reasons about real state rather than generic Kubernetes facts.
Read-only by designA dedicated ServiceAccount + ClusterRole with get/list verbs only. The agent can observe but cannot mutate the cluster, regardless of what the model produces.
Just another K8s workloadThe agent is a Deployment + Service + PersistentVolumeClaim. No special runtime, no operator, no custom scheduler.
Full GitOpsPrompts, model selection, and RBAC live in Git. Argo CD reconciles them. The agent’s behavior is auditable through git log.

Source code: github.com/MaryamTavakkoli/local-k8s-ai-agent


LLM vs. AI Agent: The Distinction That Matters

A Large Language Model answers from training data alone. It has no awareness of the environment it’s deployed into. An agent, in the sense used here, performs an extra step before reasoning: it observes the real world and incorporates that observation into the prompt.

Graphic: LLM Alone vs. AI Agent

The contrast in output is concrete. A generic LLM call returns “CrashLoopBackOff usually means the container is failing health checks or exiting unexpectedly…” An agent call returns “Pod api-7b8d has restarted 14 times in the last hour with ImagePullBackOff against registry.local. Run kubectl describe pod api-7b8d to confirm.”

The second answer is grounded. The first answer is correct but not actionable for this cluster.

This project demonstrates both modes through two REST endpoints:

  • POST /ask — LLM alone, useful for general questions like “What is a StatefulSet?”
  • POST /diagnose — the agent: reads live cluster state, then reasons over it

Architecture

The system has two halves: a CI/CD chain on the top and a Kubernetes runtime on the bottom.

Runtime side:

  • An Ollama pod serves a local Mistral 7B model on port 11434
  • A FastAPI pod exposes the agent’s HTTP API and chat UI on port 8000
  • A PersistentVolumeClaim holds the model weights so pulls aren’t repeated
  • A dedicated ServiceAccount mounted in the FastAPI pod has a ClusterRole permitting only read operations on pods, events, logs, services, and deployments

Delivery side:

  • A push to the application source in Git triggers GitHub Actions to build a multi-architecture image (linux/amd64 + linux/arm64) tagged with the 7-character commit SHA
  • Argo CD Image Updater (from argoproj-labs) polls Docker Hub on a 2-minute interval, detects new tags matching the configured regex, and commits the new tag back into the repository’s kustomization.yaml
  • Argo CD detects the manifest change and reconciles the cluster
Graphic: Architecture: Local AI Agent on Kubernetes

The two halves are decoupled. Argo CD has no awareness of the registry. GitHub Actions has no awareness of the cluster. Image Updater is the small operator that bridges them, and it does so by writing to Git, which preserves a single source of truth.


The AI Concepts You’ll Actually Touch

Here are a few concepts that every AI engineer uses every day, explained in plain language.

1. LLM (Large Language Model)

A statistical model trained on enormous amounts of text. It doesn’t “know” facts; it predicts the most likely next word given everything that came before. That’s it. The magic is that this simple task, done at scale, produces something that feels like reasoning.

This project uses Mistral 7B, a 7-billion-parameter open-source model. “Parameters” are the numbers the model learned during training, similar to the strengths of connections in a brain.

2. Local LLM

Most commercial AI services send your text to a remote cloud provider. The trade-off is capability: a local 7B model isn’t as expansive as a massive foundational model running on cloud infrastructure.. But for experimenting, it’s more than enough. And nothing leaves your network.

3. Ollama (The Model Serving Runtime)

Ollama is not an AI model. It’s a server that runs AI models. Think of it like a web server for LLMs: it downloads the model files, loads them into memory, and exposes a REST API on port 11434 so anything (including our FastAPI app) can send prompts and get responses.

Without Ollama, you’d be wrestling with PyTorch, CUDA, and tokenizer libraries. With it, running an LLM is ollama pull mistral followed by an HTTP POST.

4. System Prompt (The Personality)

This is the single most important AI concept for application developers, and you can master it in about ten minutes.

A system prompt is the instructions you give the model before the user’s question. The model reads it first and uses it to shape every response.

In our project, the system prompt for /ask is:

“You are a DevOps assistant specializing in Kubernetes.

When given an error or question, you:

1. Explain what it means clearly

2. Provide the exact kubectl commands to diagnose or fix it

3. Explain why the fix works

Be concise and practical.”

Without that prompt, Mistral is a general assistant. With it, Mistral is a Kubernetes specialist who always returns structured answers. No retraining was needed. This is called prompt engineering, and it’s how almost every AI product you use was built.

5. RAG (Retrieval-Augmented Generation)

The fancy term for what /diagnose does. RAG means: before asking the model, retrieve real-world data and augment the prompt with it.

RAG is why contemporary AI assistants work. A code assistant reads your local workspace repository; our agent reads your live cluster state.. Same pattern, different data source.


The Two Modes: Where the Agent Becomes Real

Here’s where the “agent” idea earns its name.

Mode 1: Ask (LLM alone)

You type a question, FastAPI prepends the system prompt, sends it to Ollama. The model answers from its training data. Useful for general K8s questions like “What is a StatefulSet?”

Mode 2: Diagnose Cluster (true agent)

You type a question and a namespace. FastAPI does something new: it calls the Kubernetes API and reads:

All pods in that namespace (phase, restart count, waiting reason)

The last 10 events

The last 20 lines of logs from any non-Running pod

That entire context is injected into the prompt. Then Mistral reasons, but now it’s reasoning about your actual cluster, not generic Kubernetes knowledge.

Graphic: Local K8s AI Agent

The chat UI even shows you the exact context the agent read, in a collapsible panel under each answer.


Read-Only by Design

The agent runs with a ServiceAccount bound to a ClusterRole that exposes only read verbs:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ai-devops-api-reader
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log", "events", "services", "configmaps", "namespaces"]
    verbs: ["get", "list"]
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets", "statefulsets", "daemonsets"]
    verbs: ["get", "list"]

This is the most important design decision in the project, and it generalizes beyond AI workloads. An agent that can delete pod based on its own reasoning is a production incident waiting to happen. Hallucinations multiplied by write access is a poor combination.

Read-only RBAC inverts the trust model. The agent is allowed to be wrong because being wrong has no consequences. The Kubernetes API server enforces the boundary; the LLM’s output cannot bypass it. Iteration on prompts and models becomes cheap because the worst-case behavior is bounded.

The same pattern scales: start every agent read-only, then earn each additional capability one verb at a time, each with its own RBAC rule and review.


The CI/CD Chain in Detail

The delivery half of the architecture uses three independent components, each with one responsibility.

GitHub Actions builds and pushes the image. The workflow uses docker buildx with QEMU emulation to produce a manifest list covering both linux/amd64 (GitHub-hosted runners) and linux/arm64 (Apple Silicon developer machines). The tag is the 7-character commit SHA, an immutable reference.

Argo CD Image Updater polls the registry on a 2-minute interval. Configuration lives in an ImageUpdater Custom Resource that names the target Argo CD Application, the image to track, an allowTags regex (^[0-9a-f]{7}$), and the update strategy (newest-build). When a new matching tag is found, the operator rewrites the newTag field in k8s/kustomization.yaml and commits the change to the main branch.

apiVersion: argocd-image-updater.argoproj.io/v1alpha1
kind: ImageUpdater
metadata:
  name: local-k8s-ai-agent
  namespace: argocd
spec:
  writeBackConfig:
    method: git
    gitConfig:
      branch: main
      writeBackTarget: "kustomization:."
  applicationRefs:
    - namePattern: "local-k8s-ai-agent"
      images:
        - alias: api
          imageName: marytvk/local-k8s-ai-agent
          commonUpdateSettings:
            updateStrategy: newest-build
            allowTags: "regexp:^[0-9a-f]{7}$"

Argo CD watches the repository and reconciles the cluster on each commit. Because the source manifests are managed by Kustomize, Argo CD applies the rendered output, which now includes the updated image tag.


Try It Yourself: It’s a Starting Point, Not a Destination

The repo is here: github.com/MaryamTavakkoli/local-k8s-ai-agent

Step-by-step setup with exact commands is in the README. Total time from git clone to working chat UI is about 30 minutes (most of that is the Mistral download).

To bring this article full circle: if you’re a DevOps or Platform Engineer who’s been hearing “AI agents are coming” and wondering what that actually means in practice, this is meant to be your starting point, not your finish line. Once you’ve seen the agent loop running, you’ll be in a much better position to continue.

The point of starting local isn’t that local is always the right answer. It’s to understand how the full circle works behind the scenes.