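TL;DR: I built a full MLOps stack on a Windows 11 machine with Multipass and k3s. This is a real guide: it records every error I hit and how I fixed each one. No fluff, no perfect screenshots. Only what actually works.
Why build a home lab?
Cloud bills for learning pile up fast. A home lab gives you:
- Real Kubernetes: not a toy Minikube setup
- A full MLOps stack: MLflow, Minio, Airflow, Ollama, Qdrant
- GitLab CI/CD: real pipelines, not tutorial demos
- Zero cost: it runs on hardware you already own
- A safe sandbox: break things without consequences
The goal is not just to get services running. The goal is to practice the full DevOps and MLOps workflow end to end: push code → pipeline triggers → Terraform reconciles → services deploy → metrics show up in Grafana.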
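My setup
| Resource | Value |
|---|---|
| OS | Windows 11 Pro |
| RAM | 32 GB |
| CPU | 8 cores |
| Disk | 500 GB SSD |
| Hypervisor | Hyper-V (native on Windows Pro) |
| VM manager | Multipass |
| Kubernetes | k3s |
Architecture decision: I keep Windows as my daily driver and run everything inside an Ubuntu VM through Multipass. The separation is clean, the VM starts and stops on demand, and there are no dual-boot headaches.
The stack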
Windows 11 (daily driver)
│
├── 🌐 GitLab.com (SaaS, free tier)
│   └── Pipelines + Container Registry
│
└── Multipass → vm-k3s (10 GB RAM / 4 CPU / 80 GB)
    │
    ├── ☸️ k3s (Kubernetes)
    │
    ├── ⚙️ MLOps
    │   ├── MLflow: experiment tracking
    │   ├── Minio: S3-compatible artifact storage
    │   └── Airflow: pipeline orchestration
    │
    ├── 🤖 LLM Stack
    │   ├── Ollama: run LLMs locally (CPU)
    │   └── LiteLLM: unified OpenAI-compatible API
    │
    ├── 🔍 RAG Stack
    │   ├── Qdrant: vector database
    │   └── LangChain: RAG orchestration
    │
    ├── 📊 Observability
    │   ├── Prometheus: metrics
    │   ├── Grafana: dashboards
    │   └── Loki: centralized logs
    │
    └── 🔐 HashiCorp Vault: secrets management
Step 1: Enable Hyper-V and install the tools
First, enable Hyper-V on Windows Pro (Multipass requires it):
# Run as Administrator
Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Hyper-V-All
# Reboot when prompted
After the reboot, install the tools with winget:
winget install Canonical.Multipass
winget install Git.Git
winget install Microsoft.VisualStudioCode
winget install Hashicorp.Terraform
winget install Helm.Helm
winget install Kubernetes.kubectl
Why winget instead of Chocolatey? I first tried choco install multipass and hit this error:
Exception calling "Start": "The specified executable is not a valid application for this OS platform."
winget installs the vendor's signed installer directly. Use winget.
Step 2: Create the VM
multipass launch `
--name vm-k3s `
--cpus 4 `
--memory 10G `
--disk 80G `
22.04
Memory tip: I first tried 16 GB and got:
Failed to allocate 16384 MB of RAM: Insufficient system resources
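Windows was already using about 20 GB. On a 32 GB machine, 10 GB is the sweet spot: it leaves the OS room to breathe and still gives k3s plenty of space.
Check how much RAM is actually free before you create the VM (both values are reported in kilobytes):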
Get-CimInstance Win32_OperatingSystem | Select-Object FreePhysicalMemory, TotalVisibleMemorySize
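The sizing rule here is plain arithmetic: total RAM, minus what the host already uses, minus some headroom. A minimal sketch of that calculation, runnable in any POSIX-style shell (Git Bash on Windows, or later inside the VM). The function name `vm_memory_budget` and the 2 GB default headroom are illustrative assumptions, not part of Multipass:

```shell
# Hypothetical helper: suggest a VM memory size in whole GB.
# Usage: vm_memory_budget TOTAL_GB HOST_USED_GB [HEADROOM_GB]
vm_memory_budget() {
  local total_gb=$1 host_used_gb=$2 headroom_gb=${3:-2}
  local budget=$(( total_gb - host_used_gb - headroom_gb ))
  # Never suggest a negative size; 0 means "no room for a VM".
  if [ "$budget" -lt 0 ]; then budget=0; fi
  echo "$budget"
}

# Numbers from this machine: 32 GB total, about 20 GB used by Windows.
echo "Suggested --memory: $(vm_memory_budget 32 20)G"
```

With the figures from this guide the sketch lands on 10 GB, which matches the value passed to `multipass launch`.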
Enter the VM:
multipass shell vm-k3s
# Prompt becomes: ubuntu@vm-k3s:~$
Step 3: Install Docker + k3s
Inside the VM:
# Docker
curl -fsSL https://get.docker.com | sudo sh
sudo usermod -aG docker ubuntu
sudo systemctl enable docker && sudo systemctl start docker
# k3s — lightweight Kubernetes
curl -sfL https://get.k3s.io | sh -s - \
--write-kubeconfig-mode 644 \
--disable traefik \
--docker
sleep 20
sudo k3s kubectl get nodes
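A fixed `sleep 20` can be too short on a slow disk and wasted time on a fast one. One alternative is to poll until the node answers. The sketch below uses a small hypothetical helper, `retry_until` (my name, not a k3s or kubectl command), that reruns a command until it succeeds or the attempts run out:

```shell
# Hypothetical helper: retry_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt fails.
retry_until() {
  local max_attempts=$1 delay=$2 attempt=1
  shift 2
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Only wait when another attempt is still coming.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Intended use in place of `sleep 20` (commented out so the sketch runs anywhere):
# retry_until 30 2 sudo k3s kubectl get nodes && sudo k3s kubectl get nodes
```

Once the API server responds, `kubectl wait --for=condition=Ready node --all --timeout=120s` is a built-in way to block until the node itself is Ready.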
Configure kubectl:
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown ubuntu:ubuntu ~/.kube/config
echo 'export KUBECONFIG=~/.kube/config' >> ~/.bashrc
source ~/.bashrc
kubectl get nodes
# NAME STATUS ROLES AGE VERSION
# vm-k3s Ready control-plane,master 30s v1.29.x
Step 4: Helm + namespaces
# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# Add repos
helm repo add grafana https://grafana.github.io/helm-charts
helm repo add prometheus https://prometheus-community.github.io/helm-charts
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update
# Create namespaces
for ns in mlops llm rag monitoring logging vault; do
  kubectl create namespace "$ns"
done
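`kubectl create namespace` returns an error if the namespace already exists, so rerunning the loop after a partial failure is noisy. A common idiom is to render the manifest client-side and pipe it to `kubectl apply`, which is safe to repeat. A sketch, where `ensure_namespaces` is a hypothetical helper name:

```shell
# Hypothetical helper: create each namespace only if it is missing.
# `--dry-run=client -o yaml` prints the manifest without creating anything;
# `kubectl apply -f -` then creates it or leaves it unchanged.
ensure_namespaces() {
  local ns
  for ns in "$@"; do
    kubectl create namespace "$ns" --dry-run=client -o yaml | kubectl apply -f -
  done
}

# Intended use (commented out so the sketch can be sourced without a cluster):
# ensure_namespaces mlops llm rag monitoring logging vault
```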
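Step 5: Connect GitLab CI
I use GitLab.com SaaS rather than self-hosted GitLab. That saves about 6 GB of RAM, since GitLab CE alone needs 6 GB or more. The free tier is enough for a home lab.
Create a project on gitlab.com and copy the runner registration token from Settings → CI/CD → Runners, then: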
# Install GitLab Runner
curl -L https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.deb.sh | sudo bash
sudo apt-get install -y gitlab-runner
sudo usermod -aG docker gitlab-runner
# Register
sudo gitlab-runner register \
--non-interactive \
--url "https://gitlab.com" \
--registration-token "YOUR_TOKEN_HERE" \
--executor "docker" \
--docker-image "alpine:latest" \
--docker-volumes "/var/run/docker.sock:/var/run/docker.sock" \
--docker-privileged \
--description "homelab-runner" \
--tag-list "homelab,k8s,mlops,terraform" \
--run-untagged true
sudo gitlab-runner start
Your runner shows up green in GitLab within a few seconds. From now on, every push triggers real CI/CD on your own machine.
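To confirm that jobs really land on the home lab runner, a tiny pipeline is enough. The sketch below writes a minimal `.gitlab-ci.yml` whose single job is pinned to the `homelab` tag from the registration command. The job name `smoke-test` and the helper `write_smoke_pipeline` are hypothetical:

```shell
# Hypothetical helper: write a one-job pipeline into the given repo directory.
write_smoke_pipeline() {
  local repo_dir=${1:-.}
  cat > "$repo_dir/.gitlab-ci.yml" <<'EOF'
stages:
  - test

smoke-test:
  stage: test
  image: alpine:latest
  tags:
    - homelab
  script:
    - echo "Running on runner $CI_RUNNER_DESCRIPTION"
    - uname -a
EOF
}

# Example: write the file into a scratch directory and show it.
demo_dir=$(mktemp -d)
write_smoke_pipeline "$demo_dir"
cat "$demo_dir/.gitlab-ci.yml"
```

Committing that file to the project should produce a job whose log names `homelab-runner`, assuming the runner registered with the description used above.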
Step 6: Deploy Minio (the right way)
This is where I hit my first major roadblock.
What I tried:
helm install minio bitnami/minio \
--namespace mlops \
--set auth.rootUser=minioadmin \
--set auth.rootPassword=minioadmin123
What happened:
Failed to pull image "docker.io/bitnami/minio:2025.7.23-debian-12-r3": not found
Error: ErrImagePull → ImagePullBackOff
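The Bitnami Helm chart referenced a Docker image tag that did not exist on Docker Hub yet. It was a timing issue between the chart release and the image publish.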
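This kind of failure can be caught before `helm install` by asking the registry whether the tag exists. A minimal sketch using `docker manifest inspect`, which queries the registry without pulling the image. The helper names `image_exists` and `check_image` are hypothetical:

```shell
# Hypothetical helper: succeed only if the registry knows this image:tag.
# `docker manifest inspect` exits non-zero when the manifest is not found.
image_exists() {
  local image_ref=$1
  docker manifest inspect "$image_ref" >/dev/null 2>&1
}

# Hypothetical wrapper: report the result in a readable form.
check_image() {
  local image_ref=$1
  if image_exists "$image_ref"; then
    echo "OK       $image_ref"
  else
    echo "MISSING  $image_ref"
    return 1
  fi
}

# Example (needs Docker and network access, so left commented out):
# check_image docker.io/bitnami/minio:2025.7.23-debian-12-r3
# check_image quay.io/minio/minio:latest
```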
The fix: use the official Minio image directly:
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio
  namespace: mlops
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
        - name: minio
          image: quay.io/minio/minio:latest
          command: ["minio", "server", "/data", "--console-address", ":9001"]
          env:
            - name: MINIO_ROOT_USER
              value: "minioadmin"
            - name: MINIO_ROOT_PASSWORD
              value: "minioadmin123"
          ports:
            - containerPort: 9000
              name: api
            - containerPort: 9001
              name: console
---
apiVersion: v1
kind: Service
metadata:
  name: minio
  namespace: mlops
spec:
  type: NodePort
  ports:
    - name: api
      port: 9000
      nodePort: 30900
    - name: console
      port: 9001
      nodePort: 30901
  selector:
    app: minio
EOF
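quay.io/minio/minio is Minio's own registry: always current, with no tag mismatch.
Step 7: Deploy MLflow
MLflow needs Minio as its artifact store. I use SQLite to keep things simple (a home lab does not need PostgreSQL):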
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflow
  namespace: mlops
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mlflow
  template:
    metadata:
      labels:
        app: mlflow
    spec:
      initContainers:
        - name: create-minio-bucket
          image: quay.io/minio/mc:latest
          command: ["/bin/sh", "-c"]
          args:
            - |
              mc alias set minio http://minio:9000 minioadmin minioadmin123
              mc mb minio/mlflow --ignore-existing
      containers:
        - name: mlflow
          image: ghcr.io/mlflow/mlflow:latest
          command:
            - mlflow
            - server
            - --host=0.0.0.0
            - --port=5000
            - --backend-store-uri=sqlite:///mlflow.db
            - --default-artifact-root=s3://mlflow/
            - --serve-artifacts
          env:
            - name: MLFLOW_S3_ENDPOINT_URL
              value: "http://minio:9000"
            - name: AWS_ACCESS_KEY_ID
              value: "minioadmin"
            - name: AWS_SECRET_ACCESS_KEY
              value: "minioadmin123"
            - name: AWS_DEFAULT_REGION
              value: "us-east-1"
          ports:
            - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: mlflow
  namespace: mlops
spec:
  type: NodePort
  ports:
    - port: 5000
      nodePort: 30500
  selector:
    app: mlflow
EOF
The initContainer creates the mlflow bucket in Minio automatically before the server starts, so no manual setup is needed.
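The Minio and MLflow manifests repeat the Minio credentials in plain text. One way to centralize them is a Kubernetes Secret that the Deployments read through `secretKeyRef`. The sketch below only renders the Secret manifest. The Secret name `minio-credentials`, its key names, and the helper `render_minio_secret` are assumptions, and the Deployments would need their `env` entries changed to match:

```shell
# Hypothetical helper: print a Secret manifest for the Minio credentials.
# stringData lets Kubernetes handle the base64 encoding on apply.
render_minio_secret() {
  local user=$1 password=$2 namespace=${3:-mlops}
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: minio-credentials
  namespace: ${namespace}
type: Opaque
stringData:
  root-user: "${user}"
  root-password: "${password}"
EOF
}

# Preview the manifest; pipe it to `kubectl apply -f -` to create it.
render_minio_secret minioadmin minioadmin123
```

In a container spec, an entry such as `value: "minioadmin"` would then become a `valueFrom.secretKeyRef` block with `name: minio-credentials` and `key: root-user`.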
Step 8: Run LLMs locally with Ollama
This is the fun part: running a real LLM on your own machine, inside Kubernetes, on CPU alone.
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: llm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          ports:
            - containerPort: 11434
          env:
            - name: OLLAMA_NUM_PARALLEL
              value: "1"
            - name: OLLAMA_MAX_LOADED_MODELS
              value: "1"
          resources:
            requests:
              memory: "2Gi"
              cpu: "1"
            limits:
              memory: "4Gi"
              cpu: "3"
          volumeMounts:
            - name: ollama-data
              mountPath: /root/.ollama
      volumes:
        - name: ollama-data
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: llm
spec:
  type: NodePort
  ports:
    - port: 11434
      nodePort: 31434
  selector:
    app: ollama
EOF
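Critical: always set resource limits on a shared VM. Without limits, Ollama will consume all available memory and get other containers OOM-killed. I learned this the hard way.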
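A quick way to find offenders is to list pods that declare no memory limit at all. A sketch using kubectl's `custom-columns` output, which prints `<none>` when a field is absent. The helper name `pods_without_memory_limit` is hypothetical:

```shell
# Hypothetical helper: list pods where no container declares a memory limit.
# Output format: NAMESPACE/POD, one per line.
pods_without_memory_limit() {
  kubectl get pods -A --no-headers \
    -o custom-columns='NS:.metadata.namespace,POD:.metadata.name,MEM:.spec.containers[*].resources.limits.memory' \
    | awk '$3 == "<none>" { print $1 "/" $2 }'
}

# Intended use inside the VM (commented out so the sketch runs without a cluster):
# pods_without_memory_limit
```

Note that the Minio and MLflow Deployments in this guide set no limits, so they would appear in this list until a `resources` block is added to them.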
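Choosing a model for CPU
| Model | RAM required | Verdict |
|---|---|---|
| Mistral 7B Q4 | 4.3 GB | Too heavy for the 4 GB limit |
| Phi-3 Mini | 3.5 GB | Still too heavy |
| llama3.2:1b | 1.3 GB | ✅ Good fit for a CPU home lab |
| gemma2:2b | 1.6 GB | ✅ Good alternative |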
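The verdicts in the table follow from comparing each model's footprint with the container's 4 GB memory limit while leaving some slack. A small sketch of that check. The helper `model_fits` and its 1 GB default headroom are my assumptions, chosen so the results line up with the table, not a rule from Ollama:

```shell
# Hypothetical helper: does a model fit under a container memory limit?
# Usage: model_fits MODEL_GB LIMIT_GB [HEADROOM_GB]
# HEADROOM_GB (assumed default: 1) reserves room beyond the model weights.
model_fits() {
  local model_gb=$1 limit_gb=$2 headroom_gb=${3:-1}
  # awk handles the decimal comparison; exit status 0 means "fits".
  awk -v m="$model_gb" -v l="$limit_gb" -v h="$headroom_gb" \
    'BEGIN { exit !(m + h <= l) }'
}

# Check the models from the table against the 4 GB limit.
for entry in "Mistral 7B Q4|4.3" "Phi-3 Mini|3.5" "llama3.2:1b|1.3" "gemma2:2b|1.6"; do
  name=${entry%|*}
  size=${entry##*|}
  if model_fits "$size" 4; then
    echo "fits      $name ($size GB)"
  else
    echo "too heavy $name ($size GB)"
  fi
done
```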
OLLAMA_POD=$(kubectl get pod -n llm -l app=ollama -o jsonpath='{.items[0].metadata.name}')
# Pull the model
kubectl exec -n llm $OLLAMA_POD -- ollama pull llama3.2:1b
# Test it
kubectl exec -n llm $OLLAMA_POD -- ollama run llama3.2:1b "Explain RAG in 2 sentences"
Test it through the API:
VM_IP=$(hostname -I | awk '{print $1}')
curl -s http://$VM_IP:31434/api/generate \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.2:1b",
"prompt": "What is MLOps?",
"stream": false
}' | python3 -c "import sys,json; print(json.load(sys.stdin)['response'])"
Step 9: Observability stack
# Prometheus + Grafana
helm install kube-prometheus prometheus/kube-prometheus-stack \
--namespace monitoring \
--set grafana.service.type=NodePort \
--set grafana.service.nodePort=30300 \
--set grafana.adminPassword=admin123 \
--set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false
# Loki (log aggregation) + Promtail (log shipper)
helm install loki grafana/loki-stack \
--namespace logging \
--set grafana.enabled=false \
--set promtail.enabled=true
Open Grafana:
VM_IP=$(hostname -I | awk '{print $1}')
echo "Grafana: http://$VM_IP:30300"
# Login: admin / admin123
Add Loki as a data source in Grafana:
- Settings → Data sources → Add → Loki
- URL: http://loki.logging:3100
Logs and metrics are now unified in one dashboard.
Step 10: Vault for secrets management
helm install vault hashicorp/vault \
--namespace vault \
--set server.dev.enabled=true \
--set server.dev.devRootToken=root \
--set ui.enabled=true \
--set ui.serviceType=NodePort \
--set ui.serviceNodePort=30820
# Store your first secret
VAULT_POD=$(kubectl get pod -n vault -l app.kubernetes.io/name=vault -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n vault $VAULT_POD -- vault secrets enable kv-v2
kubectl exec -n vault $VAULT_POD -- vault kv put secret/homelab \
minio_key=minioadmin \
minio_secret=minioadmin123
VM_IP=$(hostname -I | awk '{print $1}')
echo "Vault UI: http://$VM_IP:30820 (token: root)"
In your GitLab CI pipelines, read secrets from Vault instead of hardcoding them in variables.
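As a rough illustration, a job script could pull the two values stored above with `vault kv get -field=...` and export them under the names MLflow expects. This is a sketch under assumptions: the `vault` CLI is available in the job image, `VAULT_ADDR` and `VAULT_TOKEN` are already set in the job environment, and `load_minio_credentials` is a hypothetical helper name:

```shell
# Hypothetical helper: export Minio credentials read from Vault.
# Assumes the secret was written with `vault kv put secret/homelab ...`.
load_minio_credentials() {
  local secret_path=${1:-secret/homelab}
  local key secret
  key=$(vault kv get -field=minio_key "$secret_path") || return 1
  secret=$(vault kv get -field=minio_secret "$secret_path") || return 1
  export AWS_ACCESS_KEY_ID="$key"
  export AWS_SECRET_ACCESS_KEY="$secret"
}

# Intended use in a job script (commented out; needs a reachable Vault):
# load_minio_credentials && echo "Minio credentials loaded from Vault"
```

The function returns non-zero if either lookup fails, so a job can stop before running with missing credentials.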
The big picture: all services running
VM_IP=$(hostname -I | awk '{print $1}')
echo "=== Your Home Lab ==="
echo "MLflow : http://$VM_IP:30500"
echo "Minio : http://$VM_IP:30901"
echo "Grafana : http://$VM_IP:30300 (admin/admin123)"
echo "Vault : http://$VM_IP:30820 (token: root)"
echo "Ollama : http://$VM_IP:31434"
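Printing the URLs does not prove the services answer. A small probe loop can check each one with curl. This is a sketch: `check_endpoints` is a hypothetical helper, and the health paths in the usage example are the ones I believe each project exposes, so verify them against the versions you run:

```shell
# Hypothetical helper: probe each NAME=URL pair and report UP or DOWN.
# curl -f fails on HTTP errors (>= 400); --max-time bounds each probe.
check_endpoints() {
  local pair name url failures=0
  for pair in "$@"; do
    name=${pair%%=*}
    url=${pair#*=}
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      echo "UP    $name  $url"
    else
      echo "DOWN  $name  $url"
      failures=$((failures + 1))
    fi
  done
  # Succeed only when every endpoint answered.
  [ "$failures" -eq 0 ]
}

# Intended use inside the VM (commented out so the sketch runs anywhere):
# VM_IP=$(hostname -I | awk '{print $1}')
# check_endpoints \
#   "MLflow=http://$VM_IP:30500/health" \
#   "Minio=http://$VM_IP:30900/minio/health/live" \
#   "Grafana=http://$VM_IP:30300/api/health" \
#   "Ollama=http://$VM_IP:31434/"
```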
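Lessons learned
1. Bitnami Helm charts break on image tags
Don't use bitnami/minio: it referenced an image that had not been published yet. Use quay.io/minio/minio:latest directly.
2. Always set Kubernetes resource limits on a shared VM
Without limits, one greedy pod (looking at you, Ollama) gets everything else OOMKilled. Always set limits.memory.
3. RAM planning matters more than you think
On a 32 GB machine, Windows itself uses about 20 GB, which leaves about 12 GB for your VM. Measure carefully: 10 GB for the VM is the realistic sweet spot.
4. GitLab SaaS > self-hosted for a home lab
Self-hosted GitLab CE needs 6 GB+ of RAM just to sit idle. The GitLab.com free tier gives you unlimited private repos, 400 CI/CD minutes per month, and a container registry. Use it.
5. Start small with LLMs on CPU
Don't run Mistral 7B on CPU without a GPU. llama3.2:1b is surprisingly capable, works well for RAG experiments, and uses only 1.3 GB. If you need more power, add GPU passthrough later.
6. Use winget, not choco, for Multipass on Windows
The Chocolatey Multipass package uses an installer that fails on recent Windows builds. winget install Canonical.Multipass has worked every time I tried it.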
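What's next
- Kubeflow Pipelines: proper ML pipeline orchestration on K8s
- OpenTelemetry Collector: unified export of traces, metrics, and logs
- Datadog integration: ship observability data to the cloud
- Terraform IaC: replace every kubectl apply with proper infrastructure as code
- RAG pipeline: Qdrant + LangChain + Ollama end to end
Quick reference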
# VM management (Windows PowerShell)
multipass list # list VMs
multipass shell vm-k3s # enter VM
multipass suspend vm-k3s # pause (saves RAM)
multipass start vm-k3s # resume
# Inside the VM
kubectl get pods -A # all pods
kubectl top pods -A # RAM/CPU usage
free -mh # available RAM
watch kubectl get pods -A # live monitoring
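This was built on coffee and kubectl describe pod debugging. If you hit a problem I did not cover, leave a comment and I will be glad to help.
If this saved you time, leave a ❤️. It also helps others find it.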