Building a Production-Grade MLOps Home Lab on Windows — K8s, LLM, RAG & GitLab CI
Slimane BOUH · 2026-05-24 · via DEV Community

TL;DR — I set up a complete MLOps stack on my Windows 11 PC using Multipass + k3s. This is the real guide — including every error I hit and how I fixed it. No fluff, no perfect screenshots. Just what actually worked.


Why Build a Home Lab?

Cloud bills add up fast when you're learning. A local home lab gives you:

  • Real Kubernetes — not Minikube toy mode
  • Full MLOps stack — MLflow, Minio, Airflow, Ollama, Qdrant
  • CI/CD with GitLab — actual pipelines, not tutorials
  • Zero cost — runs on hardware you already own
  • Safe sandbox — break things without consequences

The goal wasn't just to have services running. The goal was to practice the full DevOps + MLOps workflow end to end: push code → pipeline triggers → Terraform provisions → services deploy → metrics appear in Grafana.


My Setup

Resource      Value
------------  ----------------------------
OS            Windows 11 Pro
RAM           32 GB
CPU           8 cores
Disk          500 GB SSD
Hypervisor    Hyper-V (native Windows Pro)
VM Manager    Multipass
Kubernetes    k3s

Architecture decision: I kept Windows as my daily driver and ran everything inside a single Ubuntu VM via Multipass. Clean separation, easy to pause/resume, no dual boot headaches.


The Stack

Windows 11 (daily driver)
│
├── 🌐 GitLab.com (SaaS — free tier)
│    └── Pipelines + Container Registry
│
└── Multipass → vm-k3s (10 GB RAM / 4 CPU / 80 GB)
     │
     ├── ☸️  k3s (Kubernetes)
     │
     ├── ⚙️  MLOps
     │    ├── MLflow      — experiment tracking
     │    ├── Minio       — S3-compatible artifact storage
     │    └── Airflow     — pipeline orchestration
     │
     ├── 🤖 LLM Stack
     │    ├── Ollama      — run LLMs locally (CPU)
     │    └── LiteLLM    — unified OpenAI-compatible API
     │
     ├── 🔍 RAG Stack
     │    ├── Qdrant      — vector database
     │    └── LangChain   — RAG orchestration
     │
     ├── 📊 Observability
     │    ├── Prometheus  — metrics
     │    ├── Grafana     — dashboards
     │    └── Loki        — centralized logs
     │
     └── 🔐 HashiCorp Vault — secrets management



Step 1 — Enable Hyper-V and Install Tools

First, enable Hyper-V on Windows Pro (required for Multipass):

# Run as Administrator
Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Hyper-V-All
# Reboot when prompted


After reboot, install all tools via winget:

winget install Canonical.Multipass
winget install Git.Git
winget install Microsoft.VisualStudioCode
winget install Hashicorp.Terraform
winget install Helm.Helm
winget install Kubernetes.kubectl


Why winget over Chocolatey? I originally used choco install multipass and hit this error:
Exception calling "Start": "The specified executable is not a valid application for this OS platform."
Winget installs the proper signed installer directly. Use winget.


Step 2 — Create the VM

multipass launch `
  --name vm-k3s `
  --cpus 4 `
  --memory 10G `
  --disk 80G `
  22.04


RAM tip: I originally tried 16 GB and got:
Failed to allocate 16384 MB of RAM: Insufficient system resources
Windows was already consuming ~20 GB. 10 GB for the VM is the sweet spot on a 32 GB machine — leaves your OS comfortable and gives k3s plenty of room.

Check your actual free RAM before creating the VM:

Get-CimInstance Win32_OperatingSystem | Select-Object FreePhysicalMemory, TotalVisibleMemorySize
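
The RAM arithmetic above (32 GB total, roughly 20 GB in use, 10 GB for the VM) can be sketched as a small helper. This is a minimal sketch, not part of the original setup: `recommend_vm_memory_gb` and the 2 GB host reserve are assumptions, picked so that 12 GB free maps to the 10 GB VM used here. It takes `FreePhysicalMemory` as reported by `Win32_OperatingSystem`, which is in kilobytes.

```python
KB_PER_GB = 1024 * 1024  # Win32_OperatingSystem reports memory in kilobytes


def recommend_vm_memory_gb(free_kb: int, host_reserve_gb: float = 2.0) -> int:
    """Whole GB to hand to the VM while leaving host_reserve_gb for Windows.

    host_reserve_gb is an assumed safety margin, not a Multipass setting.
    """
    free_gb = free_kb / KB_PER_GB
    # Round down so the VM never gets more than what is actually free.
    return max(0, int(free_gb - host_reserve_gb))
```

With about 12 GB free, `recommend_vm_memory_gb(12 * 1024 * 1024)` returns 10, which matches the `--memory 10G` used above.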


Enter the VM:

multipass shell vm-k3s
# Prompt becomes: ubuntu@vm-k3s:~$



Step 3 — Install Docker + k3s

Inside the VM:

# Docker
curl -fsSL https://get.docker.com | sudo sh
sudo usermod -aG docker ubuntu
sudo systemctl enable docker && sudo systemctl start docker

# k3s — lightweight Kubernetes
curl -sfL https://get.k3s.io | sh -s - \
  --write-kubeconfig-mode 644 \
  --disable traefik \
  --docker

sleep 20
sudo k3s kubectl get nodes
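
The fixed `sleep 20` is a guess at how long k3s takes to come up. A hedged alternative is to poll until every node reports Ready. The sketch below assumes `kubectl` can already reach the cluster; `all_nodes_ready` and `wait_for_nodes` are hypothetical helper names, and the fields read are the standard `status.conditions` entries of a Node object.

```python
import json
import subprocess
import time


def all_nodes_ready(node_list: dict) -> bool:
    """True if the list is non-empty and every node has condition Ready=True."""
    nodes = node_list.get("items", [])
    if not nodes:
        return False
    for node in nodes:
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            return False
    return True


def wait_for_nodes(timeout_s: int = 120, interval_s: int = 5) -> bool:
    """Poll `kubectl get nodes -o json` until all nodes are Ready or time runs out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["kubectl", "get", "nodes", "-o", "json"],
            capture_output=True,
            text=True,
        )
        # A non-zero exit code usually means the API server is not up yet.
        if result.returncode == 0 and all_nodes_ready(json.loads(result.stdout)):
            return True
        time.sleep(interval_s)
    return False
```

The timeout and interval values are arbitrary defaults; adjust them to how slow your disk is.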


Configure kubectl:

mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown ubuntu:ubuntu ~/.kube/config
echo 'export KUBECONFIG=~/.kube/config' >> ~/.bashrc
source ~/.bashrc

kubectl get nodes
# NAME      STATUS   ROLES                  AGE   VERSION
# vm-k3s    Ready    control-plane,master   30s   v1.29.x



Step 4 — Helm + Namespaces

# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Add repos
helm repo add grafana         https://grafana.github.io/helm-charts
helm repo add prometheus      https://prometheus-community.github.io/helm-charts
helm repo add open-telemetry  https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo add hashicorp       https://helm.releases.hashicorp.com
helm repo update

# Create namespaces
for ns in mlops llm rag monitoring logging vault; do
  kubectl create namespace $ns
done



Step 5 — Connect GitLab CI

I used GitLab.com SaaS instead of self-hosting GitLab. This saved 6 GB of RAM — GitLab CE alone needs 6+ GB. Free tier is more than enough for a home lab.

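Create a project on gitlab.com and grab the registration token from Settings → CI/CD → Runners. GitLab has deprecated registration tokens in favor of runner authentication tokens, so if your project no longer shows one, create the runner in the UI and pass its token with --token (tags and description are then set in the UI rather than on the command line). Then: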

# Install GitLab Runner
curl -L https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.deb.sh | sudo bash
sudo apt-get install -y gitlab-runner
sudo usermod -aG docker gitlab-runner

# Register
sudo gitlab-runner register \
  --non-interactive \
  --url "https://gitlab.com" \
  --registration-token "YOUR_TOKEN_HERE" \
  --executor "docker" \
  --docker-image "alpine:latest" \
  --docker-volumes "/var/run/docker.sock:/var/run/docker.sock" \
  --docker-privileged \
  --description "homelab-runner" \
  --tag-list "homelab,k8s,mlops,terraform" \
  --run-untagged true

sudo gitlab-runner start


Your runner appears green in GitLab within seconds. Every push to a project with a .gitlab-ci.yml now runs its pipeline on your local machine.


Step 6 — Deploy Minio (The Right Way)

This is where I hit my first major blocker.

What I tried first:

helm install minio bitnami/minio \
  --namespace mlops \
  --set auth.rootUser=minioadmin \
  --set auth.rootPassword=minioadmin123


What happened:

Failed to pull image "docker.io/bitnami/minio:2025.7.23-debian-12-r3": not found
Error: ErrImagePull → ImagePullBackOff


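The chart pinned an image tag that could not be pulled from docker.io/bitnami. I put it down to a timing issue between chart and image releases, although Bitnami's 2025 move of versioned images out of its public Docker Hub catalog would produce the same error.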
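
To spot this failure quickly across namespaces, the waiting reason can be read from the pod status. A minimal sketch, assuming the output of `kubectl get pods -A -o json` is passed in as a parsed dict; `find_image_pull_failures` is a hypothetical helper name.

```python
IMAGE_PULL_REASONS = {"ErrImagePull", "ImagePullBackOff"}


def find_image_pull_failures(pod_list: dict) -> list:
    """Return (namespace, pod, image) for containers stuck pulling an image."""
    failures = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        for status in pod.get("status", {}).get("containerStatuses", []):
            # state holds exactly one of: waiting, running, terminated.
            waiting = status.get("state", {}).get("waiting") or {}
            if waiting.get("reason") in IMAGE_PULL_REASONS:
                failures.append(
                    (meta.get("namespace"), meta.get("name"), status.get("image"))
                )
    return failures
```

Each tuple names the exact image reference that failed, which is the string to check against the registry.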

The fix — use the official Minio image directly:

cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio
  namespace: mlops
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
      - name: minio
        image: quay.io/minio/minio:latest
        command: ["minio", "server", "/data", "--console-address", ":9001"]
        env:
        - name: MINIO_ROOT_USER
          value: "minioadmin"
        - name: MINIO_ROOT_PASSWORD
          value: "minioadmin123"
        ports:
        - containerPort: 9000
          name: api
        - containerPort: 9001
          name: console
---
apiVersion: v1
kind: Service
metadata:
  name: minio
  namespace: mlops
spec:
  type: NodePort
  ports:
  - name: api
    port: 9000
    nodePort: 30900
  - name: console
    port: 9001
    nodePort: 30901
  selector:
    app: minio
EOF


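quay.io/minio/minio is the image Minio publishes itself, so there is no chart-to-image tag mismatch. Note that latest moves with every release; pin a release tag if you need reproducible deploys.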


Step 7 — Deploy MLflow

MLflow needs Minio as its artifact backend. I used SQLite to keep it simple (no PostgreSQL dependency for a home lab):

cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflow
  namespace: mlops
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mlflow
  template:
    metadata:
      labels:
        app: mlflow
    spec:
      initContainers:
      - name: create-minio-bucket
        image: quay.io/minio/mc:latest
        command: ["/bin/sh", "-c"]
        args:
          - |
            mc alias set minio http://minio:9000 minioadmin minioadmin123
            mc mb minio/mlflow --ignore-existing
      containers:
      - name: mlflow
        image: ghcr.io/mlflow/mlflow:latest
        command:
          - mlflow
          - server
          - --host=0.0.0.0
          - --port=5000
          - --backend-store-uri=sqlite:///mlflow.db
          - --default-artifact-root=s3://mlflow/
          - --serve-artifacts
        env:
        - name: MLFLOW_S3_ENDPOINT_URL
          value: "http://minio:9000"
        - name: AWS_ACCESS_KEY_ID
          value: "minioadmin"
        - name: AWS_SECRET_ACCESS_KEY
          value: "minioadmin123"
        - name: AWS_DEFAULT_REGION
          value: "us-east-1"
        ports:
        - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: mlflow
  namespace: mlops
spec:
  type: NodePort
  ports:
  - port: 5000
    nodePort: 30500
  selector:
    app: mlflow
EOF


The initContainer automatically creates the mlflow bucket in Minio before the server starts — no manual setup needed.
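
From a client outside the cluster, a training script needs to know where both services live. With `--default-artifact-root=s3://mlflow/`, my understanding is that the MLflow client uploads artifacts straight to the S3 endpoint, so it needs the Minio URL and credentials as well as the tracking URI. A minimal sketch under those assumptions: `mlflow_client_env` and `apply_env` are hypothetical helpers, the NodePorts are the ones defined in these manifests, and the client machine is assumed to have `mlflow` and `boto3` installed.

```python
import os


def mlflow_client_env(vm_ip: str,
                      access_key: str = "minioadmin",
                      secret_key: str = "minioadmin123") -> dict:
    """Environment variables for an MLflow client running outside the cluster."""
    return {
        "MLFLOW_TRACKING_URI": f"http://{vm_ip}:30500",     # MLflow NodePort
        "MLFLOW_S3_ENDPOINT_URL": f"http://{vm_ip}:30900",  # Minio API NodePort
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
        "AWS_DEFAULT_REGION": "us-east-1",
    }


def apply_env(env: dict) -> None:
    """Export the variables into this process before mlflow is imported."""
    os.environ.update(env)
```

After `apply_env(mlflow_client_env(VM_IP))`, calls such as `mlflow.log_metric` and `mlflow.log_artifact` should reach the home lab instead of a local `mlruns` folder.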


Step 8 — Run a Local LLM with Ollama

This is where it gets interesting. Running a real LLM on your local machine, inside Kubernetes, on CPU only.

cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: llm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - containerPort: 11434
        env:
        - name: OLLAMA_NUM_PARALLEL
          value: "1"
        - name: OLLAMA_MAX_LOADED_MODELS
          value: "1"
        resources:
          requests:
            memory: "2Gi"
            cpu: "1"
          limits:
            memory: "4Gi"
            cpu: "3"
        volumeMounts:
        - name: ollama-data
          mountPath: /root/.ollama
      volumes:
      - name: ollama-data
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: llm
spec:
  type: NodePort
  ports:
  - port: 11434
    nodePort: 31434
  selector:
    app: ollama
EOF


Critical: always set resource limits on a shared VM. Without a memory limit, Ollama can grow until the node runs out of RAM and the kernel starts OOM-killing other pods. I learned this the hard way.
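
A quick way to audit this is to list containers that have no memory limit. A minimal sketch, assuming the parsed output of `kubectl get deployments -A -o json`; `containers_without_memory_limit` is a hypothetical helper name.

```python
def containers_without_memory_limit(deployment_list: dict) -> list:
    """Return 'namespace/deployment/container' for containers lacking limits.memory."""
    missing = []
    for deploy in deployment_list.get("items", []):
        meta = deploy.get("metadata", {})
        pod_spec = deploy.get("spec", {}).get("template", {}).get("spec", {})
        for container in pod_spec.get("containers", []):
            # resources or limits may be absent or null in the API output.
            limits = (container.get("resources") or {}).get("limits") or {}
            if "memory" not in limits:
                missing.append(
                    f"{meta.get('namespace')}/{meta.get('name')}/{container.get('name')}"
                )
    return missing
```

Run against this guide's manifests as written, it would flag the Minio and MLflow containers, since only the Ollama Deployment sets limits.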

Choosing the Right Model for CPU

Model            RAM needed   Verdict
---------------  -----------  ---------------------------
Mistral 7B Q4    4.3 GB       Too heavy for 4 GB limit
Phi-3 Mini       3.5 GB       Still too heavy
llama3.2:1b      1.3 GB       ✅ Perfect for CPU home lab
gemma2:2b        1.6 GB       ✅ Good alternative

OLLAMA_POD=$(kubectl get pod -n llm -l app=ollama -o jsonpath='{.items[0].metadata.name}')

# Pull the model
kubectl exec -n llm $OLLAMA_POD -- ollama pull llama3.2:1b

# Test it
kubectl exec -n llm $OLLAMA_POD -- ollama run llama3.2:1b "Explain RAG in 2 sentences"
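
The verdicts in the model table follow a simple rule: model size plus some working headroom has to fit under the pod's memory limit. A minimal sketch of that rule; the 1 GB headroom for context and runtime overhead is an assumption that happens to reproduce the table, and `models_that_fit` is a hypothetical helper name.

```python
# RAM figures copied from the model table, in GB.
MODELS_GB = {
    "Mistral 7B Q4": 4.3,
    "Phi-3 Mini": 3.5,
    "llama3.2:1b": 1.3,
    "gemma2:2b": 1.6,
}


def models_that_fit(models: dict, limit_gb: float, headroom_gb: float = 1.0) -> list:
    """Names of models whose RAM need plus headroom stays within limit_gb."""
    return sorted(
        name for name, ram_gb in models.items()
        if ram_gb + headroom_gb <= limit_gb
    )
```

`models_that_fit(MODELS_GB, limit_gb=4.0)` returns `['gemma2:2b', 'llama3.2:1b']`, the two rows marked ✅.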


Test via API:

VM_IP=$(hostname -I | awk '{print $1}')
curl -s http://$VM_IP:31434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:1b",
    "prompt": "What is MLOps?",
    "stream": false
  }' | python3 -c "import sys,json; print(json.load(sys.stdin)['response'])"
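
The same call from Python, using only the standard library. A minimal sketch: `build_generate_request` and `generate` are hypothetical helper names, the 300-second timeout is an assumption to allow for slow CPU inference, and the endpoint and fields are the ones used in the curl command above.

```python
import json
import urllib.request


def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, model: str, prompt: str, timeout_s: float = 300.0) -> str:
    """Send the request and return the 'response' field of the JSON reply."""
    request = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout_s) as reply:
        return json.load(reply)["response"]
```

Calling `generate(f"http://{vm_ip}:31434", "llama3.2:1b", "What is MLOps?")` mirrors the curl test, with `vm_ip` being the address printed by `hostname -I` inside the VM.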



Step 9 — Observability Stack

# Prometheus + Grafana
helm install kube-prometheus prometheus/kube-prometheus-stack \
  --namespace monitoring \
  --set grafana.service.type=NodePort \
  --set grafana.service.nodePort=30300 \
  --set grafana.adminPassword=admin123 \
  --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false

# Loki (log aggregation) + Promtail (log shipper)
helm install loki grafana/loki-stack \
  --namespace logging \
  --set grafana.enabled=false \
  --set promtail.enabled=true


Access Grafana:

VM_IP=$(hostname -I | awk '{print $1}')
echo "Grafana: http://$VM_IP:30300"
# Login: admin / admin123


Add Loki as a data source in Grafana:

  • Settings → Data Sources → Add → Loki
  • URL: http://loki.logging:3100

Now you have unified logs + metrics in one dashboard.


Step 10 — Vault for Secrets Management

helm install vault hashicorp/vault \
  --namespace vault \
  --set server.dev.enabled=true \
  --set server.dev.devRootToken=root \
  --set ui.enabled=true \
  --set ui.serviceType=NodePort \
  --set ui.serviceNodePort=30820

# Store your first secret
# (dev mode already mounts a KV v2 engine at secret/, so no "vault secrets enable" step is needed)
VAULT_POD=$(kubectl get pod -n vault -l app.kubernetes.io/name=vault -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n vault "$VAULT_POD" -- vault kv put secret/homelab \
  minio_key=minioadmin \
  minio_secret=minioadmin123

VM_IP=$(hostname -I | awk '{print $1}')
echo "Vault UI: http://$VM_IP:30820  (token: root)"


In your GitLab CI pipeline, reference Vault secrets instead of hardcoding them in variables.
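
How a pipeline job reads those values isn't shown here. One option is Vault's HTTP API, which a job script can call with the standard library. A minimal sketch, assuming the job receives the Vault address and token as masked CI/CD variables; `kv2_read_url`, `extract_kv2_secret`, and `read_secret` are hypothetical helper names, and the root token from dev mode is only acceptable in a throwaway lab.

```python
import json
import urllib.request


def kv2_read_url(vault_addr: str, path: str, mount: str = "secret") -> str:
    """KV v2 reads go through /v1/<mount>/data/<path>."""
    return f"{vault_addr.rstrip('/')}/v1/{mount}/data/{path.strip('/')}"


def extract_kv2_secret(body: dict) -> dict:
    """KV v2 nests the key/value pairs under data.data; data.metadata holds versions."""
    return body["data"]["data"]


def read_secret(vault_addr: str, token: str, path: str) -> dict:
    """Fetch one secret, authenticating with the X-Vault-Token header."""
    request = urllib.request.Request(
        kv2_read_url(vault_addr, path),
        headers={"X-Vault-Token": token},
    )
    with urllib.request.urlopen(request, timeout=10) as reply:
        return extract_kv2_secret(json.load(reply))
```

For the secret stored above, `read_secret(addr, token, "homelab")` would return a dict with the `minio_key` and `minio_secret` entries.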


The Full Picture — All Services Running

VM_IP=$(hostname -I | awk '{print $1}')
echo "=== Your Home Lab ==="
echo "MLflow   : http://$VM_IP:30500"
echo "Minio    : http://$VM_IP:30901"
echo "Grafana  : http://$VM_IP:30300  (admin/admin123)"
echo "Vault    : http://$VM_IP:30820  (token: root)"
echo "Ollama   : http://$VM_IP:31434"
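
Printing the URLs doesn't tell you whether anything answers on them. A minimal sketch of a reachability check: `health_urls` and `is_up` are hypothetical helper names, and the health paths are the ones I believe each project exposes, so verify them against the versions you deployed.

```python
import urllib.request


def health_urls(vm_ip: str) -> dict:
    """Health endpoint per service, using the NodePorts from this guide."""
    return {
        "MLflow": f"http://{vm_ip}:30500/health",
        "Minio": f"http://{vm_ip}:30900/minio/health/live",  # API port, not console
        "Grafana": f"http://{vm_ip}:30300/api/health",
        "Vault": f"http://{vm_ip}:30820/v1/sys/health",
        "Ollama": f"http://{vm_ip}:31434/",
    }


def is_up(url: str, timeout_s: float = 3.0) -> bool:
    """True if the URL answers with a 2xx status within timeout_s."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as reply:
            return 200 <= reply.status < 300
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False
```

Looping over `health_urls(vm_ip).items()` and printing `is_up(url)` for each gives a one-screen status check after a VM resume.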



Lessons Learned

1. Bitnami Helm charts break on image tags
The bitnami/minio chart pinned an image tag that could not be pulled. Use quay.io/minio/minio:latest directly.

2. Always set Kubernetes resource limits on shared VMs
Without limits, one greedy pod (looking at you, Ollama) will OOMKill everything else. Set limits.memory always.

3. RAM planning matters more than you think
On my 32 GB machine, Windows and everything running on it were already using ~20 GB. That leaves 12 GB at most for the VM. Budget carefully: 10 GB for the VM is the realistic sweet spot.

4. GitLab SaaS > self-hosted for a home lab
Self-hosting GitLab CE needs 6+ GB of RAM just to idle. GitLab.com free tier gives you unlimited private repos, 400 CI/CD minutes/month on GitLab-hosted runners (jobs on your own runner don't count against them), and a container registry. Use it.

5. Start small with LLMs on CPU
Forget Mistral 7B on a CPU-only VM this size. llama3.2:1b is surprisingly capable for RAG experiments and uses only 1.3 GB. Add GPU passthrough later if you need more power.

6. Use winget not choco for Multipass on Windows
Chocolatey's Multipass package failed on my Windows 11 build with the "not a valid application for this OS platform" error. winget install Canonical.Multipass worked first time.


What's Next

This setup is a solid foundation. Here's what I'm building on top of it:

  • Kubeflow Pipelines — proper ML pipeline orchestration on K8s
  • OpenTelemetry Collector — unified traces/metrics/logs routing
  • Datadog integration — ship everything to cloud observability
  • Terraform IaC — replace all kubectl apply with proper infrastructure as code
  • RAG pipeline — Qdrant + LangChain + Ollama end-to-end

Quick Reference

# VM management (Windows PowerShell)
multipass list                    # list VMs
multipass shell vm-k3s            # enter VM
multipass suspend vm-k3s          # pause (saves RAM)
multipass start vm-k3s            # resume

# Inside the VM
kubectl get pods -A               # all pods
kubectl top pods -A               # RAM/CPU usage
free -h                           # available RAM
watch kubectl get pods -A         # live monitoring



Built this through caffeine and kubectl describe pod debugging. If you hit issues I didn't cover, drop a comment — happy to help.

If this saved you time, leave a ❤️ — it helps others find it.