Kubeasz in Practice: A Collection of Gripes
尹正杰 · 2026-06-03 · via 博客园 - 尹正杰

Author: 尹正杰

Copyright notice: original work. Reposting is not permitted; violations will be pursued legally.

1. An unexplained extra NIC appeared after deploying K8s, and BGP peering broke!

1 Symptoms

  1. Check the BGP peer status on each node
[root@ansible99 ~]# dk ansible -i /etc/kubeasz/clusters/yinzhengjie-k8s/hosts all -m shell -a 'calicoctl node status'
10.0.0.233 | CHANGED | rc=0 >>
Calico process is running.

IPv4 BGP status
+--------------+---------------+-------+----------+-------------+
| PEER ADDRESS |   PEER TYPE   | STATE |  SINCE   |    INFO     |
+--------------+---------------+-------+----------+-------------+
| 10.0.0.231   | node specific | up    | 09:52:22 | Established |
| 10.0.0.232   | node specific | up    | 09:52:22 | Established |
| 10.0.0.77    | node specific | up    | 09:52:22 | Established |
| 172.20.0.1   | node specific | start | 09:52:20 | Connect     |
+--------------+---------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
10.0.0.231 | CHANGED | rc=0 >>
Calico process is running.

IPv4 BGP status
+--------------+---------------+-------+----------+-------------+
| PEER ADDRESS |   PEER TYPE   | STATE |  SINCE   |    INFO     |
+--------------+---------------+-------+----------+-------------+
| 10.0.0.232   | node specific | up    | 09:52:21 | Established |
| 10.0.0.233   | node specific | up    | 09:52:21 | Established |
+--------------+---------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
10.0.0.232 | CHANGED | rc=0 >>
Calico process is running.

IPv4 BGP status
+--------------+---------------+-------+----------+-------------+
| PEER ADDRESS |   PEER TYPE   | STATE |  SINCE   |    INFO     |
+--------------+---------------+-------+----------+-------------+
| 10.0.0.231   | node specific | up    | 09:52:22 | Established |
| 10.0.0.233   | node specific | up    | 09:52:22 | Established |
| 10.0.0.77    | node specific | up    | 09:52:22 | Established |
| 172.20.0.1   | node specific | start | 09:52:20 | Connect     |
+--------------+---------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
10.0.0.66 | CHANGED | rc=0 >>
Calico process is running.

IPv4 BGP status
+--------------+---------------+-------+----------+---------+
| PEER ADDRESS |   PEER TYPE   | STATE |  SINCE   |  INFO   |
+--------------+---------------+-------+----------+---------+
| 10.0.0.232   | node specific | start | 09:52:19 | Connect |
| 10.0.0.233   | node specific | start | 09:52:19 | Connect |
+--------------+---------------+-------+----------+---------+

IPv6 BGP status
No IPv6 peers found.
10.0.0.77 | CHANGED | rc=0 >>
Calico process is running.

IPv4 BGP status
+--------------+---------------+-------+----------+-------------+
| PEER ADDRESS |   PEER TYPE   | STATE |  SINCE   |    INFO     |
+--------------+---------------+-------+----------+-------------+
| 10.0.0.232   | node specific | up    | 09:52:22 | Established |
| 10.0.0.233   | node specific | up    | 09:52:22 | Established |
+--------------+---------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
[root@ansible99 ~]#  

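  2. Description of the fault
On worker232 and worker233, the peer list contains 172.20.0.1, the IP address of a NIC named 'mynet0', and that session never leaves the Connect state.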

2 Root cause analysis

  1. Look at the local BGP peer table (excerpt from 10.0.0.232)
+--------------+---------------+-------+----------+-------------+
| PEER ADDRESS |   PEER TYPE   | STATE |  SINCE   |    INFO     |
+--------------+---------------+-------+----------+-------------+
| 10.0.0.231   | node specific | up    | 09:52:22 | Established |
| 10.0.0.233   | node specific | up    | 09:52:22 | Established |
| 10.0.0.77    | node specific | up    | 09:52:22 | Established |
| 172.20.0.1   | node specific | start | 09:52:20 | Connect     |
+--------------+---------------+-------+----------+-------------+

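  2. Analysis
The peer 172.20.0.1 is stuck in the TCP handshake phase. Its STATE is start and its INFO is Connect, and it never reaches Established, so it is an abnormal neighbor.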
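With five nodes a stuck peer is easy to spot by eye, but the check gets tedious at scale. The following is a minimal sketch that assumes the table layout printed by `calicoctl node status` above and the `<host> | CHANGED | rc=0 >>` headers that Ansible ad-hoc output adds. The function name `stuck_bgp_peers` is hypothetical. It prints the host, peer, state, and info for every peer whose INFO column is not Established.

```shell
# Hypothetical helper: list BGP peers that are not "Established".
# Input: text printed by `calicoctl node status`, optionally wrapped in
# Ansible ad-hoc headers such as "10.0.0.232 | CHANGED | rc=0 >>".
# Output: one line per abnormal peer: HOST PEER STATE INFO ("-" if no host header).
stuck_bgp_peers() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # Ansible header line: remember which node the following table belongs to.
    NF == 3 && $3 ~ /rc=/ { host = trim($1); next }
    # Peer rows split into 7 fields on "|" (empty first and last field).
    NF == 7 {
      peer = trim($2); info = trim($6)
      if (peer == "PEER ADDRESS") next      # skip the table header
      if (info != "Established") {
        h = (host == "") ? "-" : host
        print h, peer, trim($4), info
      }
    }'
}
```

Usage: `dk ansible -i /etc/kubeasz/clusters/yinzhengjie-k8s/hosts all -m shell -a 'calicoctl node status' | stuck_bgp_peers`. Against the output above it should flag the 172.20.0.1 peers reported by 10.0.0.232 and 10.0.0.233, plus the two Connect peers reported by 10.0.0.66. Empty output means every session is Established.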

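This address belongs to a virtual NIC that was created automatically when I deployed the cluster. The following output from worker66 shows it: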
[root@worker66 ~]# ip a
...
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:0c:29:fb:13:4a brd ff:ff:ff:ff:ff:ff
    altname enp2s1
    altname ens33
    inet 10.0.0.66/24 brd 10.0.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fefb:134a/64 scope link 
       valid_lft forever preferred_lft forever
...
4: mynet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether a2:43:73:de:11:4b brd ff:ff:ff:ff:ff:ff
    inet 172.20.0.1/16 brd 172.20.255.255 scope global mynet0
       valid_lft forever preferred_lft forever
    inet6 fe80::a043:73ff:fede:114b/64 scope link 
       valid_lft forever preferred_lft forever
...
[root@worker66 ~]#
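To confirm which interface owns a suspicious peer address on a node, you can search the one-line output of `ip -o -4 addr show`. The following is a minimal sketch. The function name `iface_for_ip` is hypothetical, and it only handles IPv4 and exact address matches.

```shell
# Hypothetical helper: print the interface(s) carrying a given IPv4 address.
# Reads `ip -o -4 addr show` output on stdin, where each line looks like:
#   4: mynet0    inet 172.20.0.1/16 brd 172.20.255.255 scope global mynet0 ...
# Field 2 is the interface name, field 4 is ADDRESS/PREFIX.
iface_for_ip() {
  awk -v want="$1" '{
    split($4, addr, "/")          # drop the /prefix length
    if ($3 == "inet" && addr[1] == want) print $2
  }'
}
```

On worker66, `ip -o -4 addr show | iface_for_ip 172.20.0.1` should print `mynet0`.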

3 Solution

  1. Delete the bridge device (on worker66)
[root@worker66 ~]# ifconfig mynet0
mynet0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.20.0.1  netmask 255.255.0.0  broadcast 172.20.255.255
        inet6 fe80::a043:73ff:fede:114b  prefixlen 64  scopeid 0x20<link>
        ether a2:43:73:de:11:4b  txqueuelen 1000  (Ethernet)
        RX packets 12010  bytes 1401177 (1.4 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 12231  bytes 9498334 (9.4 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@worker66 ~]# 
[root@worker66 ~]# ip link set dev mynet0 down  # bring the interface down first
[root@worker66 ~]# 
[root@worker66 ~]# ip link delete mynet0  # then delete the bridge device
[root@worker66 ~]# 
[root@worker66 ~]# ifconfig mynet0
mynet0: error fetching interface information: Device not found
[root@worker66 ~]# 


  2. Check the BGP status again (back to normal)
[root@ansible99 ~]# dk ansible -i /etc/kubeasz/clusters/yinzhengjie-k8s/hosts all -m shell -a 'calicoctl node status'
10.0.0.232 | CHANGED | rc=0 >>
Calico process is running.

IPv4 BGP status
+--------------+---------------+-------+----------+-------------+
| PEER ADDRESS |   PEER TYPE   | STATE |  SINCE   |    INFO     |
+--------------+---------------+-------+----------+-------------+
| 10.0.0.231   | node specific | up    | 09:52:22 | Established |
| 10.0.0.233   | node specific | up    | 09:52:22 | Established |
| 10.0.0.77    | node specific | up    | 09:52:22 | Established |
| 10.0.0.66    | node specific | up    | 11:24:09 | Established |
+--------------+---------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
10.0.0.231 | CHANGED | rc=0 >>
Calico process is running.

IPv4 BGP status
+--------------+---------------+-------+----------+-------------+
| PEER ADDRESS |   PEER TYPE   | STATE |  SINCE   |    INFO     |
+--------------+---------------+-------+----------+-------------+
| 10.0.0.232   | node specific | up    | 09:52:21 | Established |
| 10.0.0.233   | node specific | up    | 09:52:21 | Established |
+--------------+---------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
10.0.0.77 | CHANGED | rc=0 >>
Calico process is running.

IPv4 BGP status
+--------------+---------------+-------+----------+-------------+
| PEER ADDRESS |   PEER TYPE   | STATE |  SINCE   |    INFO     |
+--------------+---------------+-------+----------+-------------+
| 10.0.0.232   | node specific | up    | 09:52:22 | Established |
| 10.0.0.233   | node specific | up    | 09:52:22 | Established |
+--------------+---------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
10.0.0.233 | CHANGED | rc=0 >>
Calico process is running.

IPv4 BGP status
+--------------+---------------+-------+----------+-------------+
| PEER ADDRESS |   PEER TYPE   | STATE |  SINCE   |    INFO     |
+--------------+---------------+-------+----------+-------------+
| 10.0.0.231   | node specific | up    | 09:52:22 | Established |
| 10.0.0.232   | node specific | up    | 09:52:22 | Established |
| 10.0.0.77    | node specific | up    | 09:52:22 | Established |
| 10.0.0.66    | node specific | up    | 11:24:09 | Established |
+--------------+---------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
10.0.0.66 | CHANGED | rc=0 >>
Calico process is running.

IPv4 BGP status
+--------------+---------------+-------+----------+-------------+
| PEER ADDRESS |   PEER TYPE   | STATE |  SINCE   |    INFO     |
+--------------+---------------+-------+----------+-------------+
| 10.0.0.232   | node specific | up    | 11:24:09 | Established |
| 10.0.0.233   | node specific | up    | 11:24:09 | Established |
+--------------+---------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
[root@ansible99 ~]#  

2.HTTP request sent, awaiting response... 403 Forbidden

1. Symptom


[root@ansible99 ~]# ./ezdown -D
2026-06-01 01:19:53 [ezdown:717] INFO Action begin: download_all
2026-06-01 01:19:53 [ezdown:162] INFO downloading docker binaries, arch:x86_64, version:20.10.24
--2026-06-01 01:19:53--  https://mirrors.tuna.tsinghua.edu.cn/docker-ce/linux/static/stable/x86_64/docker-20.10.24.tgz
Resolving mirrors.tuna.tsinghua.edu.cn (mirrors.tuna.tsinghua.edu.cn)... 101.6.15.130, 2402:f000:1:400::2
Connecting to mirrors.tuna.tsinghua.edu.cn (mirrors.tuna.tsinghua.edu.cn)|101.6.15.130|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2026-06-01 01:19:53 ERROR 403: Forbidden.

2026-06-01 01:19:53 [ezdown:164] ERROR downloading docker failed
[root@ansible99 ~]# 

2. Cause

ezdown failed to download the docker-20.10.24.tgz package because the TUNA mirror answered the request with 403 Forbidden.

3. Solution

Download the package manually and save it under /etc/kubeasz/down/:
[root@ansible99 ~]# wget https://mirrors.tuna.tsinghua.edu.cn/docker-ce/linux/static/stable/x86_64/docker-20.10.24.tgz -O /etc/kubeasz/down/docker-20.10.24.tgz
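Before relying on the manually downloaded file, check that it really is a gzip tarball. `wget -O` opens its output file before the transfer starts, so a failed request (such as another 403) can leave an empty file in /etc/kubeasz/down/. The following is a minimal sketch. The helper name `valid_tgz` is hypothetical.

```shell
# Hypothetical helper: succeed only if $1 is a non-empty, readable .tgz archive.
# Catches the zero-byte or HTML error-page files a failed download can leave behind.
valid_tgz() {
  [ -s "$1" ] && tar -tzf "$1" >/dev/null 2>&1
}
```

Usage: `valid_tgz /etc/kubeasz/down/docker-20.10.24.tgz || echo 'docker tarball is broken, download it again'`.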

3.Could not find or access 'calico-v3.31.yaml.j2'\nSearched in ...

1. Error message


TASK [calico : 配置 calico DaemonSet yaml文件] **************************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: If you are using a module and expect the file to exist on the remote, see the remote_src option
fatal: [10.0.0.231]: FAILED! => {"changed": false, "msg": "Could not find or access 'calico-v3.31.yaml.j2'\nSearched in:\n\t/etc/kubeasz/roles/calico/templates/calico-v3.31.yaml.j2\n\t/etc/kubeasz/roles/calico/calico-v3.31.yaml.j2\n\t/etc/kubeasz/roles/calico/tasks/templates/calico-v3.31.yaml.j2\n\t/etc/kubeasz/roles/calico/tasks/calico-v3.31.yaml.j2\n\t/etc/kubeasz/playbooks/templates/calico-v3.31.yaml.j2\n\t/etc/kubeasz/playbooks/calico-v3.31.yaml.j2 on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}

NO MORE HOSTS LEFT ******************************************************************************************************************************

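2. Analysis

The error shows that the template 'calico-v3.31.yaml.j2' is missing. Ansible looked for it in every template path of the calico role and the playbooks on the controller and found nothing.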
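The searched file name suggests that the role picks its template from the major.minor part of the configured Calico version (v3.31.x maps to calico-v3.31.yaml.j2). This naming rule is inferred from the error message and not from Kubeasz's source. Assuming it holds, a quick pre-flight check could look like the sketch below. Both function names are hypothetical.

```shell
# Hypothetical helpers; the naming rule is inferred from the error message above.
# calico_template_name v3.31.5  ->  calico-v3.31.yaml.j2
calico_template_name() {
  local ver_main
  ver_main="$(printf '%s\n' "$1" | cut -d. -f1,2)"   # keep "vMAJOR.MINOR"
  printf 'calico-%s.yaml.j2\n' "$ver_main"
}

# check_calico_template TEMPLATE_DIR VERSION: report whether the template exists.
check_calico_template() {
  local tpl
  tpl="$1/$(calico_template_name "$2")"
  if [ -f "$tpl" ]; then
    echo "found: $tpl"
  else
    echo "missing: $tpl" >&2
    return 1
  fi
}
```

Usage: `check_calico_template /etc/kubeasz/roles/calico/templates v3.31.5`.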

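3. Solution

    1. Download the template (the upstream Calico v3.31.5 manifest, saved under the file name the role looks for)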
[root@ansible99 ~]# cd /etc/kubeasz/roles/calico/templates
[root@ansible99 templates]# 
[root@ansible99 templates]# 
[root@ansible99 templates]#  wget https://raw.githubusercontent.com/projectcalico/calico/v3.31.5/manifests/calico.yaml -O calico-v3.31.yaml.j2


    2. Re-run the network installation step (ezctl step 06).
[root@ansible99 ~]# dk ezctl setup yinzhengjie-k8s  06


4.https://registry-1.docker.io/v2/": dial tcp 128.242.245.93:443: i/o timeout

1. Symptom


...
3.10: Pulling from easzlab/pause
61d9e957431b: Pulling fs layer 
Get "https://registry-1.docker.io/v2/": dial tcp 128.242.245.93:443: i/o timeout
2026-06-02 12:33:57 [ezdown:556] ERROR download easzlab/pause:3.10 failed!
2026-06-02 12:33:57 [ezdown:718] ERROR Action failed: download_all
[root123@vm1 ~]# 

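2. Analysis

The image pull timed out while connecting to registry-1.docker.io. This can be caused by network jitter or by the registry being unreachable from this network. Retrying a few times may be enough.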
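Retrying by hand works, but a small wrapper saves some typing. The following is a minimal sketch of a generic retry loop. `retry` is a hypothetical helper and not part of Kubeasz, and the attempt count and delay in the usage line are arbitrary.

```shell
# Hypothetical helper: run a command until it succeeds, at most ATTEMPTS times.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "retry: '$*' failed after $n attempts" >&2
      return 1
    fi
    echo "retry: attempt $n failed, retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
  return 0
}
```

For example, `retry 5 30 ./ezdown -D` re-runs the whole download action up to five times, waiting 30 seconds between attempts.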

3. Solution

    - 1. Retry a few times.
    - 2. If the image still cannot be pulled, configure Docker to use your VPN proxy. The result looks like this:
[root@k8s-cluster241 ~]# systemctl cat docker
...
[Service]
...
# Add the following 3 lines; only the proxy address needs to be changed to your own.
Environment="HTTP_PROXY=http://10.0.0.1:7890"
Environment="HTTPS_PROXY=http://10.0.0.1:7890"
Environment="NO_PROXY=localhost,127.0.0.1,easzlab.io.local,.docker.internal,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
ExecStart=/opt/kube/bin/dockerd
...
[root@k8s-cluster241 ~]# 
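Instead of editing the unit file itself, you can put the same three Environment lines into a systemd drop-in. That file is kept even if the main unit file is reinstalled. The following is a minimal sketch that assumes the drop-in directory /etc/systemd/system/docker.service.d (systemd's standard override location for docker.service) and a proxy at http://10.0.0.1:7890, as in the example above. `write_docker_proxy_dropin` is a hypothetical helper.

```shell
# Hypothetical helper: write a systemd drop-in that sets Docker's proxy variables.
# $1: drop-in directory, normally /etc/systemd/system/docker.service.d
# $2: proxy URL, e.g. http://10.0.0.1:7890
write_docker_proxy_dropin() {
  local dir="$1" proxy="$2"
  mkdir -p "$dir" || return 1
  # Same three lines as in the unit above; NO_PROXY keeps local and private traffic direct.
  cat > "$dir/http-proxy.conf" <<EOF
[Service]
Environment="HTTP_PROXY=$proxy"
Environment="HTTPS_PROXY=$proxy"
Environment="NO_PROXY=localhost,127.0.0.1,easzlab.io.local,.docker.internal,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
EOF
}
```

Apply it with `write_docker_proxy_dropin /etc/systemd/system/docker.service.d http://10.0.0.1:7890`, then run `systemctl daemon-reload && systemctl restart docker`. Afterwards, `systemctl cat docker` lists the drop-in after the main unit.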

5. Recommendations on using Kubeasz in production

1. Usable in learning environments, not recommended for direct use in production

image

As the figure above shows, 80 K8s practitioners could not get rolling updates of a K8s cluster working, which forces me to vent a little.

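Praise:
  First, credit where it's due: Kubeasz really does have cluster-management capabilities. I tested K8s cluster deployment, scale-out, scale-in, upgrade, backup, and restore, and all of them are indeed supported.

Shortcomings:
   While nodes are being added to or removed from the cluster (including but not limited to master, worker, and etcd nodes), the K8s cluster becomes unavailable. There is no rolling update at all; services are simply stopped in batches!!!
   Admittedly, the version I tested was K8s 1.33.11. Upstream has since released K8s 1.33.12, and as of today Kubeasz still has not been updated to support it.

Verdict:
   Kubeasz is a genuinely good idea built on a sound approach, but the logic of its individual features still needs careful refinement.

Production recommendation:
   Not recommended! It remains a good tool for learning environments. If you already use it in production, I suggest reworking the relevant playbooks for upgrade, backup, restore, scale-out, and scale-in.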

2. Future plans for K8s management


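For my production environment of 2000+ GPU K8s nodes, I am not considering Kubeasz for now. In the near future I plan to evaluate kubespray, an open-source tool from the K8s community.

Kubespray (formerly Kargo) is an Ansible-based tool for one-step K8s deployment, maintained by the Kubernetes community under the CNCF. It builds production-grade HA clusters out of the box and works on bare metal, virtualized, and public-cloud infrastructure. It is a first choice for mainstream enterprises running self-built clusters.

Reference: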
  https://github.com/kubernetes-sigs/kubespray