Kubernetes workshop: Troubleshooting
2021-05-07 · via Ogenki

Events

The first source of information when something goes wrong is the event stream. You may want to sort the events by creation time:

kubectl get events -n foo --sort-by=.metadata.creationTimestamp
...
20m         Normal    Created             pod/web-85575f4476-5pbqv    Created container nginx
20m         Normal    Started             pod/web-85575f4476-5pbqv    Started container nginx
20m         Normal    SuccessfulDelete    replicaset/web-987f6cf9     Deleted pod: web-987f6cf9-mzsxd
20m         Normal    ScalingReplicaSet   deployment/web              Scaled down replica set web-987f6cf9 to 0
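
When the event stream is noisy, narrowing it down can help. The sketch below wraps two filters in shell functions. The function names are hypothetical helpers chosen for this example, not part of kubectl; they rely on the `type` and `involvedObject.name` field selectors that events support.

```shell
# Hypothetical helpers (the names are illustrative, not part of kubectl).

# List only Warning events in a namespace, oldest first.
warning_events() {
  local namespace="$1"
  kubectl get events -n "$namespace" \
    --field-selector type=Warning \
    --sort-by=.metadata.creationTimestamp
}

# List the events attached to a single object, for example one pod.
events_for() {
  local namespace="$1" name="$2"
  kubectl get events -n "$namespace" \
    --field-selector "involvedObject.name=$name" \
    --sort-by=.metadata.creationTimestamp
}
```

For instance, `warning_events foo` would show only the warnings of the namespace foo, and `events_for foo web-85575f4476-5pbqv` only the events of that pod.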

Logs

Viewing a pod's logs is just a matter of running:

kubectl logs -f --tail=7 -c mysql wordpress-mysql-6c597b98bd-4mbbd
2021-06-24 08:27:38 1 [Note]   - '::' resolves to '::';
2021-06-24 08:27:38 1 [Note] Server socket created on IP: '::'.
2021-06-24 08:27:38 1 [Warning] Insecure configuration for --pid-file: Location '/var/run/mysqld' in the path is accessible to all OS users. Consider choosing a different directory.
2021-06-24 08:27:38 1 [Warning] 'proxies_priv' entry '@ root@wordpress-mysql-6c597b98bd-4mbbd' ignored in --skip-name-resolve mode.
2021-06-24 08:27:38 1 [Note] Event Scheduler: Loaded 0 events
2021-06-24 08:27:38 1 [Note] mysqld: ready for connections.
Version: '5.6.51'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  MySQL Community Server (GPL)

Alternatively, you can use stern, a tool made to display logs from multiple pods. A better way to explore logs is to send them to a central location using a tool such as Loki or the well-known EFK stack.
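
Plain kubectl also covers two common cases before reaching for extra tooling. The following is a minimal sketch: the function names are hypothetical, and the `--tail=20` value is an arbitrary choice for the example.

```shell
# Hypothetical helpers (the names are illustrative, not part of kubectl).

# Show the logs of the previous instance of a container,
# which is useful once the container has been restarted.
previous_logs() {
  local pod="$1" container="$2"
  kubectl logs --previous -c "$container" "$pod"
}

# Follow the logs of every pod matching a label selector,
# prefixing each line with the pod and container it came from.
logs_by_label() {
  local selector="$1"
  kubectl logs -f -l "$selector" --all-containers --prefix --tail=20
}
```

For example, `previous_logs wordpress-mysql-6c597b98bd-4mbbd mysql` or `logs_by_label app=web`.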

Health checks

Kubernetes' self-healing system is mostly based on health checks. There are different types of health checks (please have a look at the official documentation).
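
As a rough illustration of the probe types, the snippet below writes a container spec fragment that declares a liveness, a readiness, and a startup probe. It is not a complete manifest; the file name, paths, ports, and timings are assumptions chosen for the example.

```shell
# Illustrative fragment of a container spec (not a complete manifest).
# The file name, paths, ports, and timings are assumptions for this example.
cat > /tmp/probes-example.yaml <<'EOF'
containers:
- name: nginx
  image: nginx
  # Restart the container when this check keeps failing.
  livenessProbe:
    httpGet:
      path: /
      port: 80
    periodSeconds: 3
  # Stop sending Service traffic to the pod while this check fails.
  readinessProbe:
    httpGet:
      path: /
      port: 80
    periodSeconds: 3
  # Hold back the other probes until the application has started.
  startupProbe:
    httpGet:
      path: /
      port: 80
    failureThreshold: 30
    periodSeconds: 2
EOF
```

The rest of this section only uses a liveness probe, which is the one that triggers container restarts.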

We'll add a new kubectl plugin, neat, which is really useful for exporting a resource while stripping useless metadata:

kubectl krew install neat
Updated the local copy of plugin index.
Installing plugin: neat
Installed plugin: neat
...

Let's create a new deployment using the nginx image:

kubectl create deploy web --image=nginx --dry-run=client -o yaml | kubectl neat > /tmp/web.yaml

Edit its content and add an HTTP health check on port 80. The endpoint must return a status code greater than or equal to 200 and lower than 400, and it has to be a relevant test that reflects the actual availability of the service.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: web
  name: web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: web
    spec:
      containers:
      - image: nginx
        name: nginx
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 3
          periodSeconds: 3

kubectl apply -f /tmp/web.yaml
deployment.apps/web created

kubectl describe deploy web | grep Liveness:
    Liveness:     http-get http://:80/ delay=3s timeout=1s period=3s #success=1 #failure=3

The pod should be up without any error:

kubectl get po -l app=web
NAME                   READY   STATUS    RESTARTS   AGE
web-85575f4476-6qvd5   1/1     Running   0          92s

We're going to simulate a service being unavailable by changing the path being checked. Here we'll use another method to modify a resource: creating a patch and applying it.

Create a YAML file, /tmp/patch.yaml:

cat > /tmp/patch.yaml <<EOF
spec:
  template:
    spec:
      containers:
      - name: nginx
        livenessProbe:
          httpGet:
            path: /foobar
EOF

Then apply the change as follows:

kubectl patch deployment web --patch "$(cat /tmp/patch.yaml)" --record
deployment.apps/web patched

kubectl describe deployment web | grep Liveness:
    Liveness:     http-get http://:80/foobar delay=3s timeout=1s period=3s #success=1 #failure=3
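
The same change can also be expressed as a JSON patch, which targets one field by its path instead of merging a YAML fragment. This is a sketch under assumptions: the function name is hypothetical, and it presumes that the first container (index 0) already has a livenessProbe with an httpGet section, as in the deployment above.

```shell
# Hypothetical helper (the name is illustrative, not part of kubectl).
# Assumes the container at index 0 already defines livenessProbe.httpGet,
# because the JSON patch "replace" operation needs the path to exist.
set_liveness_path() {
  local deployment="$1" path="$2"
  kubectl patch deployment "$deployment" --type=json \
    -p "[{\"op\": \"replace\", \"path\": \"/spec/template/spec/containers/0/livenessProbe/httpGet/path\", \"value\": \"$path\"}]"
}
```

Calling `set_liveness_path web /foobar` would be an alternative to the patch file used above. The YAML patch matches the container by name, whereas this variant depends on its position in the list.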

Now our pod should start to fail, and the number of restarts increases:

kubectl get po -l app=web
web-987f6cf9-n4rnb                 1/1     Running   4          83s

Eventually the pod enters the CrashLoopBackOff state, meaning that it keeps restarting, with a growing back-off delay between attempts.

kubectl get po -l app=web
NAME                 READY   STATUS             RESTARTS   AGE
web-987f6cf9-n4rnb   0/1     CrashLoopBackOff   5          3m23s

Describing the pod will give you a hint about why it restarts:

kubectl describe po web-987f6cf9-n4rnb | tail -n 5
Normal   Created    4m7s (x3 over 4m30s)   kubelet            Created container nginx
Normal   Started    4m7s (x3 over 4m30s)   kubelet            Started container nginx
Warning  Unhealthy  3m56s (x9 over 4m26s)  kubelet            Liveness probe failed: HTTP probe failed with statuscode: 404
Normal   Killing    3m56s (x3 over 4m20s)  kubelet            Container nginx failed liveness probe, will be restarted
Normal   Pulling    3m56s (x4 over 4m35s)  kubelet            Pulling image "nginx"
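
The pod status carries the same information in a form that is easier to script. The sketch below reads it with jsonpath; the function names are hypothetical, and it assumes the container of interest is the first one in the pod.

```shell
# Hypothetical helpers (the names are illustrative, not part of kubectl).
# Both assume the container of interest is the first one in the pod.

# Print how many times the container has been restarted.
restart_count() {
  local pod="$1"
  kubectl get pod "$pod" -o jsonpath='{.status.containerStatuses[0].restartCount}'
}

# Print the reason recorded for the last termination of the container.
last_termination_reason() {
  local pod="$1"
  kubectl get pod "$pod" -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
}
```

For example, `restart_count web-987f6cf9-n4rnb` returns the value shown in the RESTARTS column.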

Roll back the latest change to return to a working state. Note that we used the --record option when we applied the patch, which saves the command in the change history.

kubectl rollout history deployment web
deployment.apps/web
REVISION  CHANGE-CAUSE
1         <none>
2         kubectl patch deployment web --patch=spec:
  template:
    spec:
      containers:
      - name: nginx
        livenessProbe:
          httpGet:
            path: /foobar --record=true

kubectl rollout undo deployment web
deployment.apps/web rolled back
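
Recent kubectl releases mark --record as deprecated. The CHANGE-CAUSE column is read from the kubernetes.io/change-cause annotation, so it can also be set explicitly. The following is a minimal sketch with hypothetical function names and an arbitrary 120-second timeout.

```shell
# Hypothetical helpers (the names are illustrative, not part of kubectl).

# Set the text shown in the CHANGE-CAUSE column for the current revision.
annotate_change_cause() {
  local deployment="$1" cause="$2"
  kubectl annotate deployment "$deployment" \
    "kubernetes.io/change-cause=$cause" --overwrite
}

# Roll back to a given revision, then wait for the rollout to finish.
rollback_to() {
  local deployment="$1" revision="$2"
  kubectl rollout undo deployment "$deployment" --to-revision="$revision" &&
    kubectl rollout status deployment "$deployment" --timeout=120s
}
```

For example, `annotate_change_cause web "liveness path set to /foobar"` right after the patch, and `rollback_to web 1` to return to the first revision.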

Cleanup

kubectl delete deploy web
deployment.apps "web" deleted

learnk8s documentation

The learnk8s guide covers all the steps that help with debugging a deployment: https://learnk8s.io/troubleshooting-deployments

➡️ Next: RBAC