Cilium Native HostGateway Mode Usage Guide
怎么还在写代码 · 2026-05-05 · via 博客园_首页

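Mode Overview

Project documentation: https://docs.cilium.io/en/stable/network/concepts/routing/#native-routing

Native Routing can be loosely understood as a Host Gateway mode. In Cilium it is not the same thing as Host-Routing.

In native routing mode, Cilium hands every packet addressed to a Pod that is not on the local node to the Linux kernel routing subsystem, so the packet is forwarded as if a local process had emitted it. As a result, the underlying network between the cluster nodes must be able to route the Pod IP ranges. Native mode therefore places the following requirements on the network: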

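  • The network connecting the Cilium nodes must be able to forward Pod IP traffic;
  • The Linux kernel on each Cilium node must know how to forward packets to Pods on other nodes. This can be achieved in two ways:
    1. The node itself does not know how to route Pod IPs, but a router on the network does. In this case, simply send all traffic to that router. See Google Cloud, AWS ENI, and Azure IPAM.
    2. Every node learns the Pod IPs of all other nodes and adds the corresponding routes to its kernel routing table:
      • If all nodes share a single L2 network, enabling auto-direct-node-routes: true takes care of this;
      • Otherwise, an additional system component (for example a BGP daemon) must be run to distribute the routes:
        • The kube-router and BIRD approaches have been deprecated since version 1.16;
        • The BGP Control Plane approach became stable in recent versions. It is built into Cilium, so no third-party software needs to be installed.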

image

image

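In short, although Cilium does not replace the cluster's kube-proxy in this setup, ClusterIP traffic between Pods is still handled by Cilium at the tc layer (ingress travels up the stack from the bottom, egress travels down from the top).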

Image source:

Deployment

Use Kind to quickly create a cluster and deploy Cilium in native mode

For what the different ipam values in the Cilium Helm chart do, see the official documentation

#!/bin/bash
set -v

# 1. Prepare NoCNI kubernetes environment
cat <<EOF | HTTP_PROXY= HTTPS_PROXY= http_proxy= https_proxy= kind create cluster --name=cilium-kubeproxy --image=kindest/node:v1.27.3 --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true
nodes:
  - role: control-plane
  - role: worker
EOF

# 2. Remove kubernetes node taints
controller_node_ip=`kubectl get node -o wide --no-headers | grep -E "control-plane|bpf1" | awk -F " " '{print $6}'`
kubectl taint nodes $(kubectl get nodes -o name | grep control-plane) node-role.kubernetes.io/control-plane:NoSchedule-

# 3. Install CNI[Cilium 1.17.15]
cilium_version=v1.17.15
docker pull quay.io/cilium/cilium:$cilium_version && docker pull quay.io/cilium/operator-generic:$cilium_version
kind load docker-image quay.io/cilium/cilium:$cilium_version quay.io/cilium/operator-generic:$cilium_version --name cilium-kubeproxy
{ helm repo add cilium https://helm.cilium.io ; helm repo update; } > /dev/null 2>&1

# Direct Routing Options(--set routingMode=native --set autoDirectNodeRoutes=true --set ipv4NativeRoutingCIDR="10.0.0.0/8")
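# See the official documentation for what each ipam value does:
# https://docs.cilium.io/en/stable/network/concepts/ipam/
# ipv4NativeRoutingCIDR must cover the --pod-network-cidr used when the k8s cluster was deployed
# (in this cluster the Pod CIDR 10.244.0.0/16 falls inside 10.0.0.0/8)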
helm install cilium cilium/cilium \
  --set k8sServiceHost=$controller_node_ip \
  --set k8sServicePort=6443 \
  --version 1.17.15 \
  --namespace kube-system \
  --set image.pullPolicy=IfNotPresent \
  --set debug.enabled=true \
  --set debug.verbose="datapath flow kvstore envoy policy" \
  --set bpf.monitorAggregation=none \
  --set monitor.enabled=true \
  --set ipam.mode=kubernetes \
  --set cluster.name=cilium-kubeproxy \
  --set routingMode=native \
  --set kubeProxyReplacement=false \
  --set autoDirectNodeRoutes=true \
  --set ipv4NativeRoutingCIDR="10.0.0.0/8"

# 4. Separate namespace and cgroup v2 verify [https://github.com/cilium/cilium/pull/16259 && https://docs.cilium.io/en/stable/installation/kind/#install-cilium]
#for container in $(docker ps -a --format "table {{.Names}}" | grep cilium-kubeproxy);do docker exec $container ls -al /proc/self/ns/cgroup;done
#mount -l | grep cgroup && docker info | grep "Cgroup Version" | awk '$1=$1'
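One way to confirm that autoDirectNodeRoutes installed a route for the other node's Pod CIDR is to look the CIDR up in the node's kernel routing table. The sketch below is illustrative only: next_hop_for is a hypothetical helper, it assumes the usual `<cidr> via <nexthop> dev <ifname>` layout of `ip route` output, and 10.244.1.0/24 is assumed to be the worker's Pod CIDR.

```shell
#!/bin/bash
# Hypothetical helper: print the next hop that `ip route` output (read from
# stdin) lists for the CIDR given as $1.
# Assumes lines shaped like "<cidr> via <nexthop> dev <ifname> ...".
next_hop_for() {
  awk -v cidr="$1" '$1 == cidr && $2 == "via" { print $3 }'
}
```

For example, `docker exec cilium-kubeproxy-control-plane ip route show | next_hop_for 10.244.1.0/24` should print the worker's node IP when the direct route is present. An empty result suggests the route is missing.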

Create the Test Pods

Each Pod is essentially Nginx. They are used for the request, packet-capture, and iptables rule checks that follow

#!/bin/bash

controller_node_name=`kubectl get nodes -o wide | grep control-plane | awk -F " " '{print $1}'`
worker_node_name=`kubectl get nodes -o wide | awk -F " " '{print $1}' | grep 'worker$'`

# client pod and service
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: client
  name: client
spec:
  containers:
  - name: client
    image: burlyluo/nettool:9494
    imagePullPolicy: Always
  restartPolicy: Always
  nodeName: ${controller_node_name}

EOF

cat <<EOF | kubectl apply -f - 
apiVersion: v1
kind: Service
metadata:
  labels:
    run: client
  name: clientsvc
spec:
  type: NodePort
  clusterIP: 10.96.94.94
  ports:
  - port: 9494
    protocol: TCP
    targetPort: 9494
    nodePort: 30494
  selector:
    run: client
EOF

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: server
  name: server
spec:
  containers:
  - name: server
    image: burlyluo/nettool:9494
    imagePullPolicy: Always
  restartPolicy: Always
  nodeName: ${worker_node_name}

EOF

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  labels:
    run: server
  name: serversvc
spec:
  type: NodePort
  clusterIP: 10.96.94.95
  ports:
  - port: 9494
    protocol: TCP
    targetPort: 9494
    nodePort: 30495
  selector:
    run: server
EOF

Check the Deployment Result

root@network-demo:~# kubectl get pods -A -o wide
NAMESPACE            NAME                                       READY   STATUS    RESTARTS   AGE     IP             NODE
default              client                                     1/1     Running   0          3m20s   10.244.0.49    cilium-kubeproxy-control-plane
default              server                                     1/1     Running   0          3m20s   10.244.1.124   cilium-kubeproxy-worker
kube-system          cilium-2bkqk                               2/2     Running   0          9m29s   172.18.0.2     cilium-kubeproxy-worker
kube-system          cilium-envoy-8qz9j                         1/1     Running   0          9m29s   172.18.0.3     cilium-kubeproxy-control-plane
kube-system          cilium-envoy-vb8n4                         1/1     Running   0          9m29s   172.18.0.2     cilium-kubeproxy-worker
kube-system          cilium-operator-86bc7ff44-627mg            1/1     Running   0          9m29s   172.18.0.2     cilium-kubeproxy-worker
kube-system          cilium-operator-86bc7ff44-dg5c6            1/1     Running   0          9m29s   172.18.0.3     cilium-kubeproxy-control-plane
kube-system          cilium-tjsld                               2/2     Running   0          9m29s   172.18.0.3     cilium-kubeproxy-control-plane
kube-system          coredns-5d78c9869d-28zfr                   1/1     Running   0          12m     10.244.0.165   cilium-kubeproxy-control-plane
kube-system          coredns-5d78c9869d-zpsfh                   1/1     Running   0          12m     10.244.0.224   cilium-kubeproxy-control-plane
kube-system          etcd-cilium-kubeproxy-control-plane        1/1     Running   0          12m     172.18.0.3     cilium-kubeproxy-control-plane
kube-system          kube-apiserver-cilium-kubeproxy            1/1     Running   0          12m     172.18.0.3     cilium-kubeproxy-control-plane
kube-system          kube-controller-manager-cilium-kubeproxy   1/1     Running   0          12m     172.18.0.3     cilium-kubeproxy-control-plane
kube-system          kube-proxy-5fqdk                           1/1     Running   0          12m     172.18.0.3     cilium-kubeproxy-control-plane
kube-system          kube-proxy-d26xm                           1/1     Running   0          11m     172.18.0.2     cilium-kubeproxy-worker
kube-system          kube-scheduler-cilium-kubeproxy            1/1     Running   0          12m     172.18.0.3     cilium-kubeproxy-control-plane

Verification

Inspect Cilium Details

1. Query the Cilium running status

root@network-demo:~# kubectl exec -it -n kube-system cilium-tjsld -- cilium status
KVStore:                 Disabled   
Kubernetes:              Ok         1.27 (v1.27.3) [linux/amd64]
Kubernetes APIs:         ["EndpointSliceOrEndpoint", "cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "cilium/v2alpha1::CiliumCIDRGroup", "core/v1::Namespace", "core/v1::Pods", "core/v1::Service", "networking.k8s.io/v1::NetworkPolicy"]

## Cilium has not replaced kube-proxy
KubeProxyReplacement:    False   
Host firewall:           Disabled
SRv6:                    Disabled
CNI Chaining:            none
CNI Config file:         successfully wrote CNI configuration file to /host/etc/cni/net.d/05-cilium.conflist
Cilium:                  Ok   1.17.15 (v1.17.15-4206eaa5)
NodeMonitor:             Listening for events on 8 CPUs with 64x4096 of shared memory
Cilium health daemon:    Ok   
IPAM:                    IPv4: 6/254 allocated from 10.244.0.0/24, 
IPv4 BIG TCP:            Disabled
IPv6 BIG TCP:            Disabled
BandwidthManager:        Disabled

## Native routing mode is in use
Routing:                 Network: Native   Host: Legacy
Attach Mode:             TCX
Device Mode:             veth
Masquerading:            IPTables [IPv4: Enabled, IPv6: Disabled]
Controller Status:       41/41 healthy
Proxy Status:            OK, ip 10.244.0.26, 0 redirects active on ports 10000-20000, Envoy: external
Global Identity Range:   min 256, max 65535
Hubble:                  Ok              Current/Max Flows: 4095/4095 (100.00%), Flows/s: 58.42   Metrics: Disabled
Encryption:              Disabled        
Cluster health:          2/2 reachable   (2026-05-03T03:11:57Z)
Name                     IP              Node   Endpoints
Modules Health:          Stopped(0) Degraded(0) OK(60)

2. Query Cilium Endpoint information

root@network-demo:~# kubectl exec -it -n kube-system cilium-tjsld -- cilium endpoint list

ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])                                      IPv4           STATUS   
           ENFORCEMENT        ENFORCEMENT
44         Disabled           Disabled          1          k8s:node-role.kubernetes.io/control-plane                                       ready
                                                           k8s:node.kubernetes.io/exclude-from-external-load-balancers
                                                           reserved:host
224        Disabled           Disabled          4          reserved:health                                                  10.244.0.22    ready
786        Disabled           Disabled          59785      k8s:app=local-path-provisioner                                   10.244.0.213   ready
2112       Disabled           Disabled          6602       k8s:io.cilium.k8s.namespace/metadata.name=kube-system            10.244.0.224   ready
                                                           k8s:io.cilium.k8s.policy.cluster=cilium-kubeproxy
                                                           k8s:io.cilium.k8s.policy.serviceaccount=coredns
                                                           k8s:io.kubernetes.pod.namespace=kube-system
                                                           k8s:k8s-app=kube-dns
2119       Disabled           Disabled          6602       k8s:io.cilium.k8s.namespace/metadata.name=kube-system            10.244.0.165   ready
                                                           k8s:io.cilium.k8s.policy.cluster=cilium-kubeproxy
                                                           k8s:io.cilium.k8s.policy.serviceaccount=coredns
                                                           k8s:io.kubernetes.pod.namespace=kube-system
                                                           k8s:k8s-app=kube-dns
2234       Disabled           Disabled          17248      k8s:io.cilium.k8s.namespace/metadata.name=default                10.244.0.49    ready
                                                           k8s:io.cilium.k8s.policy.cluster=cilium-kubeproxy
                                                           k8s:io.cilium.k8s.policy.serviceaccount=default
                                                           k8s:io.kubernetes.pod.namespace=default
                                                           k8s:run=client

3. Query Cilium Service information

With kubeProxyReplacement=false, Cilium only enables in-cluster load balancing for ClusterIP services.

root@network-demo:~# kubectl exec -it -n kube-system cilium-tjsld -- cilium service list

ID   Frontend                Service Type   Backend                               
1    10.96.0.1:443/TCP       ClusterIP      1 => 172.18.0.3:6443/TCP (active)     
2    10.96.238.242:443/TCP   ClusterIP      1 => 172.18.0.3:4244/TCP (active)     
3    10.96.0.10:53/UDP       ClusterIP      1 => 10.244.0.224:53/UDP (active)     
                                            2 => 10.244.0.165:53/UDP (active)     
4    10.96.0.10:53/TCP       ClusterIP      1 => 10.244.0.224:53/TCP (active)     
                                            2 => 10.244.0.165:53/TCP (active)     
5    10.96.0.10:9153/TCP     ClusterIP      1 => 10.244.0.224:9153/TCP (active)   
                                            2 => 10.244.0.165:9153/TCP (active)   
6    10.96.94.94:9494/TCP    ClusterIP      1 => 10.244.0.49:9494/TCP (active)    
7    10.96.94.95:9495/TCP    ClusterIP      1 => 10.244.1.124:9495/TCP (active)
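To check which backends Cilium has programmed for a single ClusterIP frontend, the listing above can be filtered. This is a minimal sketch, assuming the column layout shown in this output (backend lines contain `=>`, and continuation lines carry only the backend columns). backends_for is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print the backends listed for frontend $1 in
# `cilium service list` output read from stdin.
backends_for() {
  awk -v fe="$1" '
    # Continuation line: "<n> => <backend> (state)" belongs to the last frontend seen.
    $2 == "=>" { if (cur == fe) print $3; next }
    # First line of a service: "<id> <frontend> <type> <n> => <backend> (state)".
    $5 == "=>" { cur = $2; if (cur == fe) print $6 }
  '
}
```

For example, `kubectl exec -n kube-system cilium-tjsld -- cilium service list | backends_for 10.96.0.10:53/UDP` would list the two CoreDNS backends shown above.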

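Inspect iptables Rules

1. Query the iptables rules in place after Cilium is deployed

No request has been sent to the test Pods yet, so the pkts hit counters of the corresponding svc and pod rules are all 0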

root@cilium-kubeproxy-control-plane:/# iptables -t nat -nvL PREROUTING
Chain PREROUTING (policy ACCEPT 259 packets, 14196 bytes)
 pkts bytes target          prot opt in    out     source        destination         
   58  3104 CILIUM_PRE_nat  all  --  *     *       0.0.0.0/0     0.0.0.0/0       /* cilium-feeder: CILIUM_PRE_nat */
  262 14426 KUBE-SERVICES   all  --  *     *       0.0.0.0/0     0.0.0.0/0       /* kubernetes service portals */
    2   170 DOCKER_OUTPUT   all  --  *     *       0.0.0.0/0     172.18.0.1          

root@cilium-kubeproxy-control-plane:/# iptables -t nat -nvL KUBE-SERVICES | grep -E "NODEPORTS|SWRT7AX63WBUEU6W|KCJY4KYQ6LVWE356"
Chain KUBE-SERVICES (2 references)
 pkts bytes target                     prot opt in   ou   source        destination         

    0     0 KUBE-SVC-KCJY4KYQ6LVWE356  tcp  --  *    *    0.0.0.0/0     10.96.94.94   /* default/clientsvc cluster IP */ tcp dpt:9494
    0     0 KUBE-SVC-SWRT7AX63WBUEU6W  tcp  --  *    *    0.0.0.0/0     10.96.94.95   /* default/serversvc cluster IP */ tcp dpt:9494
 1760  105K KUBE-NODEPORTS             all  --  *    *    0.0.0.0/0     0.0.0.0/0     /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL

root@cilium-kubeproxy-control-plane:/# iptables -t nat -nvL KUBE-SVC-SWRT7AX63WBUEU6W
Chain KUBE-SVC-SWRT7AX63WBUEU6W (2 references)
 pkts bytes target                     prot opt in     out     source            destination         
    0     0 KUBE-MARK-MASQ             tcp  --  *      *      !10.244.0.0/16     10.96.94.95     /* default/serversvc cluster IP */ tcp dpt:9494
    0     0 KUBE-SEP-4EXVZ54ZVZAC4Q5C  all  --  *      *       0.0.0.0/0         0.0.0.0/0       /* default/serversvc -> 10.244.1.124:9494 */
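Since the checks that follow all compare pkts counters before and after a request, a small filter can make the comparison less error-prone. This is a sketch only: rule_pkts is a hypothetical helper, and it assumes the `iptables -nvL` column order shown above (pkts first, target third).

```shell
#!/bin/bash
# Hypothetical helper: print the pkts counter of every rule whose target
# equals $1, reading `iptables -nvL` output from stdin.
# Header lines never match because their third field is not a rule target.
rule_pkts() {
  awk -v target="$1" '$3 == target { print $1 }'
}
```

For example, `iptables -t nat -nvxL KUBE-SERVICES | rule_pkts KUBE-SVC-SWRT7AX63WBUEU6W`. The `-x` flag asks iptables for exact counters rather than abbreviated values such as 105K.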

2. Query iptables rules after accessing the svc from a k8s node

2.1. Access via ClusterIP + port from the node

After two requests, the hit counters of the server svc rules have increased by 2.

root@cilium-kubeproxy-control-plane:/# curl -s 10.96.94.95:9494
PodName: server | PodIP: eth0 10.244.1.124/32 eth0 fe80::d8d4:71ff:fe3d:9e3a/64
root@cilium-kubeproxy-control-plane:/# curl -s 10.96.94.95:9494
PodName: server | PodIP: eth0 10.244.1.124/32 eth0 fe80::d8d4:71ff:fe3d:9e3a/64
root@cilium-kubeproxy-control-plane:/# iptables -t nat -nvL KUBE-SERVICES | grep 'KUBE-SVC-SWRT7AX63WBUEU6W'
Chain KUBE-SERVICES (2 references)
 pkts bytes target                     prot opt in     out     source       destination
    2   120 KUBE-SVC-SWRT7AX63WBUEU6W  tcp  --  *      *       0.0.0.0/0    10.96.94.95    /* default/serversvc cluster IP */ tcp dpt:9494

root@cilium-kubeproxy-control-plane:/# iptables -t nat -nvL KUBE-SVC-SWRT7AX63WBUEU6W
Chain KUBE-SVC-SWRT7AX63WBUEU6W (2 references)
 pkts bytes target                     prot opt in     out     source           destination
    2   120 KUBE-MARK-MASQ             tcp  --  *      *      !10.244.0.0/16    10.96.94.95    /* default/serversvc cluster IP */ tcp dpt:9494
    2   120 KUBE-SEP-4EXVZ54ZVZAC4Q5C  all  --  *      *       0.0.0.0/0        0.0.0.0/0      /* default/serversvc -> 10.244.1.124:9494 */

root@cilium-kubeproxy-control-plane:/# iptables -t nat -nvL KUBE-SEP-4EXVZ54ZVZAC4Q5C
Chain KUBE-SEP-4EXVZ54ZVZAC4Q5C (1 references)
 pkts bytes target          prot opt in     out     source          destination         
    0     0 KUBE-MARK-MASQ  all  --  *      *       10.244.1.124    0.0.0.0/0    /* default/serversvc */
    2   120 DNAT            tcp  --  *      *       0.0.0.0/0       0.0.0.0/0    /* default/serversvc */ tcp to:10.244.1.124:9494

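2.2. Access via Node IP + svc NodePort from the node

Only the svc --> pod rules gained 2 more hits (KUBE-SEP and DNAT go from 2 to 4). The ClusterIP match in KUBE-SERVICES stays at 2, because the destination of these requests is the Node IP rather than the ClusterIP.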

root@cilium-kubeproxy-control-plane:/# curl -s 172.18.0.3:30495
PodName: server | PodIP: eth0 10.244.1.124/32 eth0 fe80::d8d4:71ff:fe3d:9e3a/64
root@cilium-kubeproxy-control-plane:/# curl -s 172.18.0.3:30495
PodName: server | PodIP: eth0 10.244.1.124/32 eth0 fe80::d8d4:71ff:fe3d:9e3a/64
root@cilium-kubeproxy-control-plane:/# iptables -t nat -nvL KUBE-SERVICES | grep 'KUBE-SVC-SWRT7AX63WBUEU6W'
Chain KUBE-SERVICES (2 references)
 pkts bytes target                     prot opt in     out     source       destination         
    2   120 KUBE-SVC-SWRT7AX63WBUEU6W  tcp  --  *      *       0.0.0.0/0    10.96.94.95    /* default/serversvc cluster IP */ tcp dpt:9494

root@cilium-kubeproxy-control-plane:/# iptables -t nat -nvL KUBE-SVC-SWRT7AX63WBUEU6W
Chain KUBE-SVC-SWRT7AX63WBUEU6W (2 references)
 pkts bytes target                     prot opt in     out     source           destination         
    2   120 KUBE-MARK-MASQ             tcp  --  *      *      !10.244.0.0/16    10.96.94.95    /* default/serversvc cluster IP */ tcp dpt:9494
    4   240 KUBE-SEP-4EXVZ54ZVZAC4Q5C  all  --  *      *       0.0.0.0/0        0.0.0.0/0      /* default/serversvc -> 10.244.1.124:9494 */

root@cilium-kubeproxy-control-plane:/# iptables -t nat -nvL KUBE-SEP-4EXVZ54ZVZAC4Q5C
Chain KUBE-SEP-4EXVZ54ZVZAC4Q5C (1 references)
 pkts bytes target          prot opt in     out     source          destination         
    0     0 KUBE-MARK-MASQ  all  --  *      *       10.244.1.124    0.0.0.0/0    /* default/serversvc */
    4   240 DNAT            tcp  --  *      *       0.0.0.0/0       0.0.0.0/0    /* default/serversvc */ tcp to:10.244.1.124:9494

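3. Query iptables rules after accessing the svc from a k8s Pod

3.1. From the client Pod via Node IP + NodePort

The iptables hit counters on the local node still increase

💡 When requesting a different Node IP, look at the iptables rules on the node that owns that IP, otherwise the hit counters will certainly not increase 😂. This can be seen from the KUBE-NODEPORTS target in the KUBE-SERVICES chain, whose rule reads NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL: a packet enters the NodePort chain only when its destination is an IP local to this node.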

root@cilium-kubeproxy-control-plane:/# kubectl exec -it client -- curl -s 172.18.0.3:30495
PodName: server | PodIP: eth0 10.244.1.124/32 eth0 fe80::d8d4:71ff:fe3d:9e3a/64
root@cilium-kubeproxy-control-plane:/# kubectl exec -it client -- curl -s 172.18.0.3:30495
PodName: server | PodIP: eth0 10.244.1.124/32 eth0 fe80::d8d4:71ff:fe3d:9e3a/64
root@cilium-kubeproxy-control-plane:/# iptables -t nat -nvL KUBE-SERVICES
Chain KUBE-SERVICES (2 references)
 pkts bytes target                     prot opt in  out    source       destination         
    2   120 KUBE-SVC-SWRT7AX63WBUEU6W  tcp  --   *    *    0.0.0.0/0    10.96.94.95    /* default/serversvc cluster IP */ tcp dpt:9494

root@cilium-kubeproxy-control-plane:/# iptables -t nat -nvL KUBE-SVC-SWRT7AX63WBUEU6W
Chain KUBE-SVC-SWRT7AX63WBUEU6W (2 references)
 pkts bytes target                     prot opt in     out     source           destination         
    2   120 KUBE-MARK-MASQ             tcp  --  *      *      !10.244.0.0/16    10.96.94.95    /* default/serversvc cluster IP */ tcp dpt:9494
    6   360 KUBE-SEP-4EXVZ54ZVZAC4Q5C  all  --  *      *       0.0.0.0/0        0.0.0.0/0      /* default/serversvc -> 10.244.1.124:9494 */

root@cilium-kubeproxy-control-plane:/# iptables -t nat -nvL KUBE-SEP-4EXVZ54ZVZAC4Q5C
Chain KUBE-SEP-4EXVZ54ZVZAC4Q5C (1 references)
 pkts bytes target          prot opt in     out     source          destination         
    0     0 KUBE-MARK-MASQ  all  --  *      *       10.244.1.124    0.0.0.0/0    /* default/serversvc */
    6   360 DNAT            tcp  --  *      *       0.0.0.0/0       0.0.0.0/0    /* default/serversvc */ tcp to:10.244.1.124:9494
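The dst-type LOCAL match mentioned in the note above depends on which addresses the node treats as local. A rough way to check a given address is to look for it among the `local` entries of the kernel's local routing table. The helper below is a hypothetical sketch that matches exact addresses only, not prefixes such as 127.0.0.0/8.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit status 0) when address $1 appears as a
# "local <addr> ..." entry in `ip route show table local` output read from stdin.
is_local_dst() {
  awk -v ip="$1" '$1 == "local" && $2 == ip { found = 1 } END { exit !found }'
}
```

For example, `docker exec cilium-kubeproxy-control-plane ip route show table local | is_local_dst 172.18.0.3 && echo local` on the control-plane node. The same check with 172.18.0.2 should print nothing there, assuming that address belongs only to the worker.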

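3.2. From the client Pod via ClusterIP + port

This time, no matter how many requests are sent, the iptables hit counters do not increase. This is what the end of the official documentation refers to: when Cilium does not replace kube-proxy, only load balancing of ClusterIP services is enabled: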

By default, Helm sets kubeProxyReplacement=false, which only enables per-packet in-cluster load-balancing of ClusterIP services.

root@network-demo:~# kubectl get svc serversvc
NAME        TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)          AGE
serversvc   NodePort   10.96.94.95   <none>        9494:30495/TCP   139m

root@network-demo:~# kubectl exec -it client -- curl -s 10.96.94.95:9494
PodName: server | PodIP: eth0 10.244.1.124/32 eth0 fe80::d8d4:71ff:fe3d:9e3a/64
root@network-demo:~# kubectl exec -it client -- curl -s 10.96.94.95:9494
PodName: server | PodIP: eth0 10.244.1.124/32 eth0 fe80::d8d4:71ff:fe3d:9e3a/64
root@cilium-kubeproxy-control-plane:/# iptables -t nat -nvL KUBE-SERVICES
Chain KUBE-SERVICES (2 references)
 pkts bytes target                     prot opt in  out    source       destination         
    2   120 KUBE-SVC-SWRT7AX63WBUEU6W  tcp  --   *    *    0.0.0.0/0    10.96.94.95    /* default/serversvc cluster IP */ tcp dpt:9494

root@cilium-kubeproxy-control-plane:/# iptables -t nat -nvL KUBE-SVC-SWRT7AX63WBUEU6W
Chain KUBE-SVC-SWRT7AX63WBUEU6W (2 references)
 pkts bytes target                     prot opt in     out     source           destination         
    2   120 KUBE-MARK-MASQ             tcp  --  *      *      !10.244.0.0/16    10.96.94.95    /* default/serversvc cluster IP */ tcp dpt:9494
    6   360 KUBE-SEP-4EXVZ54ZVZAC4Q5C  all  --  *      *       0.0.0.0/0        0.0.0.0/0      /* default/serversvc -> 10.244.1.124:9494 */

root@cilium-kubeproxy-control-plane:/# iptables -t nat -nvL KUBE-SEP-4EXVZ54ZVZAC4Q5C
Chain KUBE-SEP-4EXVZ54ZVZAC4Q5C (1 references)
 pkts bytes target          prot opt in     out     source          destination         
    0     0 KUBE-MARK-MASQ  all  --  *      *       10.244.1.124    0.0.0.0/0    /* default/serversvc */
    6   360 DNAT            tcp  --  *      *       0.0.0.0/0       0.0.0.0/0    /* default/serversvc */ tcp to:10.244.1.124:9494

Packet Capture at the Pod Interface

root@network-demo:~# kubectl get pods -o wide
NAME     READY   STATUS    RESTARTS   AGE    IP             NODE
client   1/1     Running   0          145m   10.244.0.49    cilium-kubeproxy-control-plane
server   1/1     Running   0          145m   10.244.1.124   cilium-kubeproxy-worker

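1. Request the NodePort and observe

The capture shows that the server address (172.18.0.3:30495) is not rewritten by Cilium, just as with a conventional CNI network.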

root@network-demo:~# kubectl exec -it client -- curl -s 172.18.0.3:30495
PodName: server | PodIP: eth0 10.244.1.124/32 eth0 fe80::d8d4:71ff:fe3d:9e3a/64
root@network-demo:~# kubectl exec -it client -- tcpdump -pnei eth0

05:34:25.576890 4a:de:3b:33:f8:4b > ea:b4:11:10:09:5b, ethertype IPv4 (0x0800), length 74: 10.244.0.49.43868 > 172.18.0.3.30495: Flags [S], seq 4266727014, win 64240, options [mss 1460,sackOK,TS val 1585845161 ecr 0,nop,wscale 7], length 0
05:34:25.577154 ea:b4:11:10:09:5b > 4a:de:3b:33:f8:4b, ethertype IPv4 (0x0800), length 74: 172.18.0.3.30495 > 10.244.0.49.43868: Flags [S.], seq 2053570008, ack 4266727015, win 65160, options [mss 1460,sackOK,TS val 289666935 ecr 1585845161,nop,wscale 7], length 0
05:34:25.577166 4a:de:3b:33:f8:4b > ea:b4:11:10:09:5b, ethertype IPv4 (0x0800), length 66: 10.244.0.49.43868 > 172.18.0.3.30495: Flags [.], ack 1, win 502, options [nop,nop,TS val 1585845162 ecr 289666935], length 0
05:34:25.577256 4a:de:3b:33:f8:4b > ea:b4:11:10:09:5b, ethertype IPv4 (0x0800), length 146: 10.244.0.49.43868 > 172.18.0.3.30495: Flags [P.], seq 1:81, ack 1, win 502, options [nop,nop,TS val 1585845162 ecr 289666935], length 80
05:34:25.577331 ea:b4:11:10:09:5b > 4a:de:3b:33:f8:4b, ethertype IPv4 (0x0800), length 66: 172.18.0.3.30495 > 10.244.0.49.43868: Flags [.], ack 81, win 509, options [nop,nop,TS val 289666935 ecr 1585845162], length 0
05:34:25.577498 ea:b4:11:10:09:5b > 4a:de:3b:33:f8:4b, ethertype IPv4 (0x0800), length 302: 172.18.0.3.30495 > 10.244.0.49.43868: Flags [P.], seq 1:237, ack 81, win 509, options [nop,nop,TS val 289666935 ecr 1585845162], length 236
05:34:25.577519 4a:de:3b:33:f8:4b > ea:b4:11:10:09:5b, ethertype IPv4 (0x0800), length 66: 10.244.0.49.43868 > 172.18.0.3.30495: Flags [.], ack 237, win 501, options [nop,nop,TS val 1585845162 ecr 289666935], length 0
05:34:25.577750 ea:b4:11:10:09:5b > 4a:de:3b:33:f8:4b, ethertype IPv4 (0x0800), length 146: 172.18.0.3.30495 > 10.244.0.49.43868: Flags [P.], seq 237:317, ack 81, win 509, options [nop,nop,TS val 289666935 ecr 1585845162], length 80
05:34:25.577763 4a:de:3b:33:f8:4b > ea:b4:11:10:09:5b, ethertype IPv4 (0x0800), length 66: 10.244.0.49.43868 > 172.18.0.3.30495: Flags [.], ack 317, win 501, options [nop,nop,TS val 1585845162 ecr 289666935], length 0
05:34:25.577905 4a:de:3b:33:f8:4b > ea:b4:11:10:09:5b, ethertype IPv4 (0x0800), length 66: 10.244.0.49.43868 > 172.18.0.3.30495: Flags [F.], seq 81, ack 317, win 501, options [nop,nop,TS val 1585845162 ecr 289666935], length 0
05:34:25.578122 ea:b4:11:10:09:5b > 4a:de:3b:33:f8:4b, ethertype IPv4 (0x0800), length 66: 172.18.0.3.30495 > 10.244.0.49.43868: Flags [F.], seq 317, ack 82, win 509, options [nop,nop,TS val 289666935 ecr 1585845162], length 0
05:34:25.578131 4a:de:3b:33:f8:4b > ea:b4:11:10:09:5b, ethertype IPv4 (0x0800), length 66: 10.244.0.49.43868 > 172.18.0.3.30495: Flags [.], ack 318, win 501, options [nop,nop,TS val 1585845163 ecr 289666935], length 0

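2. Request the ClusterIP and observe

One point first: a given packet does not pass through both ingress and egress on the same interface (the lo loopback interface aside). For any one packet, each interface handles only one direction

In practice, the lxc device and the host's eth0 are connected directly through the kernel IP layer. Once the kernel has made its routing decision, it places the packet straight onto eth0's egress path. There is no ingress step on eth0

Capturing on either the Pod's eth0 or its host-side veth peer lxceaefdfbf8f50 shows that the dst IP has not been rewritten to the Pod IP by Cilium. It is still the svc ClusterIP.

Compare this with the network layering diagram at the start of the article. DNAT is performed on the veth peer of the Pod's eth0, that is, the lxceaefdfbf8f50 device on the node, which the cilium monitor output below demonstrates. From the point of view of lxceaefdfbf8f50 this is an ingress (inbound) request, and tcpdump taps in before tc, so it sees the original dst IP:

  • --> Netdevice/Drivers (tcpdump) --> Traffic Shaping (tc) --> IP layer looks up the kernel routing table

By the time the packet reaches the IP layer, its dst IP is already the Pod IP written by tc. lxceaefdfbf8f50 and eth0 live in the same kernel, so forwarding based on the kernel routing table is an in-kernel operation that moves the packet from the IP layer directly onto eth0's egress path. Think of the kernel as a switch: lxceaefdfbf8f50 receives the packet and hands it to the kernel switch, which looks up the routing table and sends the packet out through the egress of the host's eth0:

  • --> Netdevice/Drivers (tcpdump) --> Traffic Shaping (tc) --> IP layer looks up the kernel routing table --> host eth0

The packet never leaves through the Netdevice/Drivers (tcpdump) hook of lxceaefdfbf8f50, so a capture there cannot show the Pod IP written by tc. The tc rewrite can, however, be verified by capturing on the host's eth0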

root@network-demo:~# docker exec -it cilium-kubeproxy-control-plane tcpdump -pnei lxceaefdfbf8f50

05:50:26.215635 4a:de:3b:33:f8:4b > ea:b4:11:10:09:5b, ethertype IPv4 (0x0800), length 74: 10.244.0.49.57444 > 10.96.94.95.9494: Flags [S], seq 1259547284, win 64240, options [mss 1460,sackOK,TS val 1736199449 ecr 0,nop,wscale 7], length 0
05:50:26.215856 ea:b4:11:10:09:5b > 4a:de:3b:33:f8:4b, ethertype IPv4 (0x0800), length 74: 10.96.94.95.9494 > 10.244.0.49.57444: Flags [S.], seq 2548819030, ack 1259547285, win 65160, options [mss 1460,sackOK,TS val 3988848119 ecr 1736199449,nop,wscale 7], length 0
05:50:26.215872 4a:de:3b:33:f8:4b > ea:b4:11:10:09:5b, ethertype IPv4 (0x0800), length 66: 10.244.0.49.57444 > 10.96.94.95.9494: Flags [.], ack 1, win 502, options [nop,nop,TS val 1736199449 ecr 3988848119], length 0
05:50:26.215955 4a:de:3b:33:f8:4b > ea:b4:11:10:09:5b, ethertype IPv4 (0x0800), length 146: 10.244.0.49.57444 > 10.96.94.95.9494: Flags [P.], seq 1:81, ack 1, win 502, options [nop,nop,TS val 1736199449 ecr 3988848119], length 80
05:50:26.216071 ea:b4:11:10:09:5b > 4a:de:3b:33:f8:4b, ethertype IPv4 (0x0800), length 66: 10.96.94.95.9494 > 10.244.0.49.57444: Flags [.], ack 81, win 509, options [nop,nop,TS val 3988848119 ecr 1736199449], length 0
05:50:26.216212 ea:b4:11:10:09:5b > 4a:de:3b:33:f8:4b, ethertype IPv4 (0x0800), length 302: 10.96.94.95.9494 > 10.244.0.49.57444: Flags [P.], seq 1:237, ack 81, win 509, options [nop,nop,TS val 3988848120 ecr 1736199449], length 236
05:50:26.216234 4a:de:3b:33:f8:4b > ea:b4:11:10:09:5b, ethertype IPv4 (0x0800), length 66: 10.244.0.49.57444 > 10.96.94.95.9494: Flags [.], ack 237, win 501, options [nop,nop,TS val 1736199450 ecr 3988848120], length 0
05:50:26.216361 ea:b4:11:10:09:5b > 4a:de:3b:33:f8:4b, ethertype IPv4 (0x0800), length 146: 10.96.94.95.9494 > 10.244.0.49.57444: Flags [P.], seq 237:317, ack 81, win 509, options [nop,nop,TS val 3988848120 ecr 1736199450], length 80
05:50:26.216374 4a:de:3b:33:f8:4b > ea:b4:11:10:09:5b, ethertype IPv4 (0x0800), length 66: 10.244.0.49.57444 > 10.96.94.95.9494: Flags [.], ack 317, win 501, options [nop,nop,TS val 1736199450 ecr 3988848120], length 0
05:50:26.216660 4a:de:3b:33:f8:4b > ea:b4:11:10:09:5b, ethertype IPv4 (0x0800), length 66: 10.244.0.49.57444 > 10.96.94.95.9494: Flags [F.], seq 81, ack 317, win 501, options [nop,nop,TS val 1736199450 ecr 3988848120], length 0
05:50:26.216928 ea:b4:11:10:09:5b > 4a:de:3b:33:f8:4b, ethertype IPv4 (0x0800), length 66: 10.96.94.95.9494 > 10.244.0.49.57444: Flags [F.], seq 317, ack 82, win 509, options [nop,nop,TS val 3988848120 ecr 1736199450], length 0
05:50:26.216945 4a:de:3b:33:f8:4b > ea:b4:11:10:09:5b, ethertype IPv4 (0x0800), length 66: 10.244.0.49.57444 > 10.96.94.95.9494: Flags [.], ack 318, win 501, options [nop,nop,TS val 1736199450 ecr 3988848120], length 0
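The ordering argument above can be made concrete with a toy model. This is only a sketch, not kernel code: `Packet`, `tc_ingress_dnat`, and `traverse` are hypothetical names, and the model assumes nothing beyond what the text states (the tcpdump tap on the lxc device runs before tc on ingress, and the packet then leaves through the host's eth0). The IPs are taken from this article's captures.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


# Hypothetical service table: ClusterIP -> backend Pod IP (values from this article).
SERVICE_BACKENDS = {"10.96.94.95": "10.244.1.124"}


def tc_ingress_dnat(pkt: Packet) -> Packet:
    """Stand-in for the tc BPF program on the lxc device: rewrite ClusterIP to Pod IP."""
    backend = SERVICE_BACKENDS.get(pkt.dst)
    return replace(pkt, dst=backend) if backend else pkt


def traverse(pkt: Packet) -> dict:
    """Walk the packet through the hooks in order and record what each tap sees."""
    seen = {}
    seen["lxc ingress tap"] = pkt.dst   # tcpdump on lxc runs before tc
    pkt = tc_ingress_dnat(pkt)          # tc rewrites the destination
    # The IP layer then routes the packet onto the host's eth0 egress path.
    seen["eth0 egress tap"] = pkt.dst   # tcpdump on host eth0 sees the rewritten packet
    return seen


taps = traverse(Packet(src="10.244.0.49", dst="10.96.94.95"))
for point, dst in taps.items():
    print(f"{point}: dst={dst}")
```

The model has no tap after tc on the lxc device, which is exactly why the rewritten address never shows up in a capture there.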

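Verify the effect of tc BPF with cilium monitor in the Cilium Pod that runs on the Client Pod's node:

tc rewrites the svc ip to the pod ip, so by the time the packet reaches the iptables rules it no longer matches them.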

root@network-demo:~# kubectl exec -it -n kube-system cilium-tjsld -- cilium monitor --type trace --from 2234 -v

time="2026-05-03T06:18:55.085420215Z" level=info msg="Initializing dissection cache..." subsys=monitor
<- endpoint 2234 flow 0x2add59e2 , identity 17248->unknown state unknown ifindex 0 orig-ip 0.0.0.0: 10.244.0.49:56798 -> 10.96.94.95:9494 tcp SYN

## Here 17248->unknown becomes 17248->64959; by the time the packet reaches iptables the destination is no longer the ClusterIP
-> stack flow 0x2add59e2 , identity 17248->64959 state new ifindex 0 orig-ip 0.0.0.0: 10.244.0.49:56798 -> 10.244.1.124:9494 tcp SYN
-> endpoint 2234 flow 0x90110859 , identity 64959->17248 state reply ifindex lxceaefdfbf8f50 orig-ip 10.244.1.124: 10.96.94.95:9494 -> 10.244.0.49:56798 tcp SYN, ACK
<- endpoint 2234 flow 0x2add59e2 , identity 17248->unknown state unknown ifindex 0 orig-ip 0.0.0.0: 10.244.0.49:56798 -> 10.96.94.95:9494 tcp ACK
-> stack flow 0x2add59e2 , identity 17248->64959 state established ifindex 0 orig-ip 0.0.0.0: 10.244.0.49:56798 -> 10.244.1.124:9494 tcp ACK
<- endpoint 2234 flow 0x2add59e2 , identity 17248->unknown state unknown ifindex 0 orig-ip 0.0.0.0: 10.244.0.49:56798 -> 10.96.94.95:9494 tcp ACK
-> stack flow 0x2add59e2 , identity 17248->64959 state established ifindex 0 orig-ip 0.0.0.0: 10.244.0.49:56798 -> 10.244.1.124:9494 tcp ACK
-> endpoint 2234 flow 0x90110859 , identity 64959->17248 state reply ifindex lxceaefdfbf8f50 orig-ip 10.244.1.124: 10.96.94.95:9494 -> 10.244.0.49:56798 tcp ACK
-> endpoint 2234 flow 0x90110859 , identity 64959->17248 state reply ifindex lxceaefdfbf8f50 orig-ip 10.244.1.124: 10.96.94.95:9494 -> 10.244.0.49:56798 tcp ACK
<- endpoint 2234 flow 0x2add59e2 , identity 17248->unknown state unknown ifindex 0 orig-ip 0.0.0.0: 10.244.0.49:56798 -> 10.96.94.95:9494 tcp ACK
-> stack flow 0x2add59e2 , identity 17248->64959 state established ifindex 0 orig-ip 0.0.0.0: 10.244.0.49:56798 -> 10.244.1.124:9494 tcp ACK
-> endpoint 2234 flow 0x90110859 , identity 64959->17248 state reply ifindex lxceaefdfbf8f50 orig-ip 10.244.1.124: 10.96.94.95:9494 -> 10.244.0.49:56798 tcp ACK
<- endpoint 2234 flow 0x2add59e2 , identity 17248->unknown state unknown ifindex 0 orig-ip 0.0.0.0: 10.244.0.49:56798 -> 10.96.94.95:9494 tcp ACK
-> stack flow 0x2add59e2 , identity 17248->64959 state established ifindex 0 orig-ip 0.0.0.0: 10.244.0.49:56798 -> 10.244.1.124:9494 tcp ACK
<- endpoint 2234 flow 0x2add59e2 , identity 17248->unknown state unknown ifindex 0 orig-ip 0.0.0.0: 10.244.0.49:56798 -> 10.96.94.95:9494 tcp ACK, FIN
-> stack flow 0x2add59e2 , identity 17248->64959 state established ifindex 0 orig-ip 0.0.0.0: 10.244.0.49:56798 -> 10.244.1.124:9494 tcp ACK, FIN
-> endpoint 2234 flow 0x90110859 , identity 64959->17248 state reply ifindex lxceaefdfbf8f50 orig-ip 10.244.1.124: 10.96.94.95:9494 -> 10.244.0.49:56798 tcp ACK, FIN
<- endpoint 2234 flow 0x2add59e2 , identity 17248->unknown state unknown ifindex 0 orig-ip 0.0.0.0: 10.244.0.49:56798 -> 10.96.94.95:9494 tcp ACK
-> stack flow 0x2add59e2 , identity 17248->64959 state established ifindex 0 orig-ip 0.0.0.0: 10.244.0.49:56798 -> 10.244.1.124:9494 tcp ACK
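To read the rewrite straight out of the trace, the two lines of one flow can be compared programmatically. The sketch below is a minimal parser written for the line layout shown above; the regular expression and the helper name `parse` are assumptions made for illustration and are not part of Cilium.

```python
import re

# Two trace lines copied from the cilium monitor output above (same flow 0x2add59e2).
TRACE = """\
<- endpoint 2234 flow 0x2add59e2 , identity 17248->unknown state unknown ifindex 0 orig-ip 0.0.0.0: 10.244.0.49:56798 -> 10.96.94.95:9494 tcp SYN
-> stack flow 0x2add59e2 , identity 17248->64959 state new ifindex 0 orig-ip 0.0.0.0: 10.244.0.49:56798 -> 10.244.1.124:9494 tcp SYN
"""

LINE_RE = re.compile(
    r"^(?P<dir><-|->) (?P<point>endpoint \d+|stack) flow (?P<flow>0x[0-9a-f]+) ,"
    r" identity (?P<identity>\S+) .*? (?P<src>[\d.]+:\d+) -> (?P<dst>[\d.]+:\d+) tcp"
)


def parse(line):
    """Return the named fields of one trace line, or None if it does not match."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None


events = [e for e in map(parse, TRACE.splitlines()) if e]

# Group by flow ID so the destination before and after the tc program can be compared.
by_flow = {}
for e in events:
    by_flow.setdefault(e["flow"], []).append(e)

for flow, evs in by_flow.items():
    print(flow, " => ".join(f'{e["point"]}: {e["dst"]}' for e in evs))
```

For flow 0x2add59e2 the destination changes from the ClusterIP on the `<- endpoint` line to the Pod IP on the `-> stack` line, which is the DNAT performed by the tc program.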

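Capturing on the Node NIC

1. Capture on Node eth0 while the Client Pod requests the ClusterIP

In the previous flow the Client Pod requested the Server Pod's ClusterIP, and captures on both the Client Pod's eth0 and the veth pair device showed Client Pod IP --> ClusterIP. Capturing on eth0 of the Client Pod's host now shows that the dst ip is already the Pod IP rewritten by tc: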

root@network-demo:~# docker exec -it cilium-kubeproxy-control-plane tcpdump -pnei eth0 host 10.244.1.124

08:36:15.371232 32:29:77:be:87:c6 > ae:d7:63:54:c6:7d, ethertype IPv4 (0x0800), length 74: 10.244.0.49.40014 > 10.244.1.124.9494: Flags [S], seq 949473642, win 64240, options [mss 1460,sackOK,TS val 1746148605 ecr 0,nop,wscale 7], length 0
08:36:15.371346 ae:d7:63:54:c6:7d > 32:29:77:be:87:c6, ethertype IPv4 (0x0800), length 74: 10.244.1.124.9494 > 10.244.0.49.40014: Flags [S.], seq 1377165028, ack 949473643, win 65160, options [mss 1460,sackOK,TS val 3998797275 ecr 1746148605,nop,wscale 7], length 0
08:36:15.371400 32:29:77:be:87:c6 > ae:d7:63:54:c6:7d, ethertype IPv4 (0x0800), length 66: 10.244.0.49.40014 > 10.244.1.124.9494: Flags [.], ack 1, win 502, options [nop,nop,TS val 1746148605 ecr 3998797275], length 0
08:36:15.371559 32:29:77:be:87:c6 > ae:d7:63:54:c6:7d, ethertype IPv4 (0x0800), length 146: 10.244.0.49.40014 > 10.244.1.124.9494: Flags [P.], seq 1:81, ack 1, win 502, options [nop,nop,TS val 1746148605 ecr 3998797275], length 80
08:36:15.371611 ae:d7:63:54:c6:7d > 32:29:77:be:87:c6, ethertype IPv4 (0x0800), length 66: 10.244.1.124.9494 > 10.244.0.49.40014: Flags [.], ack 81, win 509, options [nop,nop,TS val 3998797275 ecr 1746148605], length 0
08:36:15.371793 ae:d7:63:54:c6:7d > 32:29:77:be:87:c6, ethertype IPv4 (0x0800), length 302: 10.244.1.124.9494 > 10.244.0.49.40014: Flags [P.], seq 1:237, ack 81, win 509, options [nop,nop,TS val 3998797275 ecr 1746148605], length 236
08:36:15.371885 32:29:77:be:87:c6 > ae:d7:63:54:c6:7d, ethertype IPv4 (0x0800), length 66: 10.244.0.49.40014 > 10.244.1.124.9494: Flags [.], ack 237, win 501, options [nop,nop,TS val 1746148605 ecr 3998797275], length 0
08:36:15.371983 ae:d7:63:54:c6:7d > 32:29:77:be:87:c6, ethertype IPv4 (0x0800), length 146: 10.244.1.124.9494 > 10.244.0.49.40014: Flags [P.], seq 237:317, ack 81, win 509, options [nop,nop,TS val 3998797275 ecr 1746148605], length 80
08:36:15.372035 32:29:77:be:87:c6 > ae:d7:63:54:c6:7d, ethertype IPv4 (0x0800), length 66: 10.244.0.49.40014 > 10.244.1.124.9494: Flags [.], ack 317, win 501, options [nop,nop,TS val 1746148605 ecr 3998797275], length 0
08:36:15.372367 32:29:77:be:87:c6 > ae:d7:63:54:c6:7d, ethertype IPv4 (0x0800), length 66: 10.244.0.49.40014 > 10.244.1.124.9494: Flags [F.], seq 81, ack 317, win 501, options [nop,nop,TS val 1746148606 ecr 3998797275], length 0
08:36:15.372539 ae:d7:63:54:c6:7d > 32:29:77:be:87:c6, ethertype IPv4 (0x0800), length 66: 10.244.1.124.9494 > 10.244.0.49.40014: Flags [F.], seq 317, ack 82, win 509, options [nop,nop,TS val 3998797276 ecr 1746148606], length 0
08:36:15.372618 32:29:77:be:87:c6 > ae:d7:63:54:c6:7d, ethertype IPv4 (0x0800), length 66: 10.244.0.49.40014 > 10.244.1.124.9494: Flags [.], ack 318, win 501, options [nop,nop,TS val 1746148606 ecr 3998797276], length 0
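The difference between the two captures can be checked mechanically. The sketch below parses the first SYN line of the lxc capture and of the eth0 capture (both copied from this article and truncated after the flags field) and compares the destination. The regular expression and the helper `parse_tcpdump` are assumptions tailored to the `tcpdump -e -n` layout shown here. The two lines come from two separate requests, which is why the source ports differ.

```python
import re

LXC_LINE = (
    "05:50:26.215635 4a:de:3b:33:f8:4b > ea:b4:11:10:09:5b, ethertype IPv4 (0x0800), "
    "length 74: 10.244.0.49.57444 > 10.96.94.95.9494: Flags [S]"
)
ETH0_LINE = (
    "08:36:15.371232 32:29:77:be:87:c6 > ae:d7:63:54:c6:7d, ethertype IPv4 (0x0800), "
    "length 74: 10.244.0.49.40014 > 10.244.1.124.9494: Flags [S]"
)

TCPDUMP_RE = re.compile(
    r"^\S+ (?P<smac>[0-9a-f:]{17}) > (?P<dmac>[0-9a-f:]{17}), ethertype IPv4 \(0x0800\), "
    r"length \d+: (?P<sip>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dip>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+):"
)


def parse_tcpdump(line):
    """Extract MAC addresses, IPs, and ports from one tcpdump -e -n line."""
    m = TCPDUMP_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognized tcpdump line: {line!r}")
    return m.groupdict()


lxc = parse_tcpdump(LXC_LINE)
eth0 = parse_tcpdump(ETH0_LINE)
print("lxc  capture dst:", lxc["dip"], "port", lxc["dport"])
print("eth0 capture dst:", eth0["dip"], "port", eth0["dport"])
```

The source IP and destination port are the same in both captures; only the destination IP differs, ClusterIP on lxc and Pod IP on eth0.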

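2. Capture on cilium_host while the Node requests the ClusterIP

Request the ClusterIP with curl from the k8s node. The node's kernel routing table shows that if the iptables rules load-balance the request to a local Pod, the packet is forwarded through the local cilium_host device.

Capture on the local Server Pod's eth0 and on cilium_host at the same time: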

root@cilium-kubeproxy-control-plane:/# curl -s 10.96.170.11
PodName: pod-1 | PodIP: eth0 10.244.0.215/32

A capture on the Node's cilium_host device shows no data at all:

root@cilium-kubeproxy-control-plane:/# ip address show cilium_host
5: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 12:19:e0:82:77:96 brd ff:ff:ff:ff:ff:ff
    inet 10.244.0.26/32 scope global cilium_host
       valid_lft forever preferred_lft forever

root@cilium-kubeproxy-control-plane:/# tcpdump -pnei cilium_host

## empty

A capture on the Server Pod's eth0 shows that the Client IP is the IP of the cilium_host device:

root@cilium-kubeproxy-control-plane:/# kubectl exec -it pod-1 -- tcpdump -pnei eth0

02:59:47.275584 82:b7:7f:40:b0:59 > f2:ee:5b:48:5c:d5, ethertype IPv4 (0x0800), length 74: 10.244.0.26.28907 > 10.244.0.215.80: Flags [S], seq 3859345334, win 64240, options [mss 1460,sackOK,TS val 4144267874 ecr 0,nop,wscale 7], length 0
02:59:47.275600 f2:ee:5b:48:5c:d5 > 82:b7:7f:40:b0:59, ethertype IPv4 (0x0800), length 74: 10.244.0.215.80 > 10.244.0.26.28907: Flags [S.], seq 936865112, ack 3859345335, win 65160, options [mss 1460,sackOK,TS val 704717770 ecr 4144267874,nop,wscale 7], length 0
02:59:47.275657 82:b7:7f:40:b0:59 > f2:ee:5b:48:5c:d5, ethertype IPv4 (0x0800), length 66: 10.244.0.26.28907 > 10.244.0.215.80: Flags [.], ack 1, win 502, options [nop,nop,TS val 4144267874 ecr 704717770], length 0
02:59:47.275732 82:b7:7f:40:b0:59 > f2:ee:5b:48:5c:d5, ethertype IPv4 (0x0800), length 142: 10.244.0.26.28907 > 10.244.0.215.80: Flags [P.], seq 1:77, ack 1, win 502, options [nop,nop,TS val 4144267874 ecr 704717770], length 76: HTTP: GET / HTTP/1.1
02:59:47.275751 f2:ee:5b:48:5c:d5 > 82:b7:7f:40:b0:59, ethertype IPv4 (0x0800), length 66: 10.244.0.215.80 > 10.244.0.26.28907: Flags [.], ack 77, win 509, options [nop,nop,TS val 704717770 ecr 4144267874], length 0
02:59:47.275859 f2:ee:5b:48:5c:d5 > 82:b7:7f:40:b0:59, ethertype IPv4 (0x0800), length 302: 10.244.0.215.80 > 10.244.0.26.28907: Flags [P.], seq 1:237, ack 77, win 509, options [nop,nop,TS val 704717770 ecr 4144267874], length 236: HTTP: HTTP/1.1 200 OK
02:59:47.275912 82:b7:7f:40:b0:59 > f2:ee:5b:48:5c:d5, ethertype IPv4 (0x0800), length 66: 10.244.0.26.28907 > 10.244.0.215.80: Flags [.], ack 237, win 501, options [nop,nop,TS val 4144267874 ecr 704717770], length 0
02:59:47.275928 f2:ee:5b:48:5c:d5 > 82:b7:7f:40:b0:59, ethertype IPv4 (0x0800), length 111: 10.244.0.215.80 > 10.244.0.26.28907: Flags [P.], seq 237:282, ack 77, win 509, options [nop,nop,TS val 704717770 ecr 4144267874], length 45: HTTP
02:59:47.275966 82:b7:7f:40:b0:59 > f2:ee:5b:48:5c:d5, ethertype IPv4 (0x0800), length 66: 10.244.0.26.28907 > 10.244.0.215.80: Flags [.], ack 282, win 501, options [nop,nop,TS val 4144267874 ecr 704717770], length 0
02:59:47.276280 82:b7:7f:40:b0:59 > f2:ee:5b:48:5c:d5, ethertype IPv4 (0x0800), length 66: 10.244.0.26.28907 > 10.244.0.215.80: Flags [F.], seq 77, ack 282, win 501, options [nop,nop,TS val 4144267875 ecr 704717770], length 0
02:59:47.276375 f2:ee:5b:48:5c:d5 > 82:b7:7f:40:b0:59, ethertype IPv4 (0x0800), length 66: 10.244.0.215.80 > 10.244.0.26.28907: Flags [F.], seq 282, ack 78, win 509, options [nop,nop,TS val 704717771 ecr 4144267875], length 0
02:59:47.276440 82:b7:7f:40:b0:59 > f2:ee:5b:48:5c:d5, ethertype IPv4 (0x0800), length 66: 10.244.0.26.28907 > 10.244.0.215.80: Flags [.], ack 283, win 501, options [nop,nop,TS val 4144267875 ecr 704717771], length 0

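The reason is much the same as in the Client Pod case above. The routing shown in the code block below tells the story:

  1. Judging by the route lookup alone, a request to the ClusterIP would leave through eth0 and be handed to gateway 172.18.0.1, with 172.18.0.3 as the source IP;

    1.1 In fact it never goes through eth0: as the layering diagram at the beginning of the article shows, iptables has already translated ClusterIP 10.96.170.11 to Pod IP 10.244.0.215.

  2. If the ClusterIP request is assigned to the local Pod IP 10.244.0.215, the packet is sent from the cilium_host device and handed to gateway 10.244.0.26 with 10.244.0.26 as the source IP; both the gateway and the source IP are in fact cilium_host itself;

  3. ip route show lists the route to Pod 10.244.0.215 as a layer-3 route. Because 10.244.0.26 is cilium_host's own IP, the route lookup done by ip route get finds that the next hop is a local interface and simplifies the result to "send via cilium_host".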
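The three lookups in the list above can be reproduced with a small longest-prefix-match sketch. It is a model of the reasoning, not of the kernel's actual code path. The two 10.244.x.x routes are copied from the `ip route show` outputs in this article; the default route is an assumption inferred from the `ip route get 10.96.170.11` result, and `route_get` and `LOCAL_ADDRS` are hypothetical names.

```python
import ipaddress

# Routes as (prefix, via, dev).
ROUTES = [
    ("10.244.0.0/24", "10.244.0.26", "cilium_host"),
    ("10.244.1.0/24", "172.18.0.2", "eth0"),
    ("0.0.0.0/0", "172.18.0.1", "eth0"),  # assumed default route
]
LOCAL_ADDRS = {"10.244.0.26", "172.18.0.3"}  # addresses owned by this node


def route_get(dst):
    """Pick the most specific matching route (longest prefix match)."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(prefix), via, dev)
        for prefix, via, dev in ROUTES
        if addr in ipaddress.ip_network(prefix)
    ]
    _net, via, dev = max(matches, key=lambda r: r[0].prefixlen)
    # Mirror the article's observation: a next hop that is one of the node's
    # own addresses is dropped from the result.
    if via in LOCAL_ADDRS:
        via = None
    return {"dst": dst, "via": via, "dev": dev}


for dst in ("10.96.170.11", "10.244.0.215", "10.244.1.124"):
    print(route_get(dst))
```

The ClusterIP only matches the default route, while the DNATed Pod IP 10.244.0.215 matches the cilium_host route and loses its `via`, which corresponds to items 1 and 3.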

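📝 Key takeaways

  • When the Client Pod accesses the ClusterIP, the veth pair lxc device can capture the traffic because what it captures is the ingress traffic coming from the Client Pod, so the dst ip is still the ClusterIP;
  • When the k8s Node accesses the ClusterIP, a capture on cilium_host shows nothing because, in the route via 10.244.0.26 dev cilium_host, the via ip is the node itself. The forwarding step is therefore skipped and the packet goes straight to cilium_host egress. At that step the eBPF program on cilium_host has already rewritten the MAC and uses bpf_redirect() to deliver the packet to the Server Pod's lxc device, bypassing tcpdump's AF_PACKET capture point, so no traffic is captured.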
## 1.
## This environment is deployed with kind + docker, so the via here is the address of the bridge device created by docker
## Normally it would be a device on the local machine
root@cilium-kubeproxy-control-plane:/# ip route get 10.96.170.11
10.96.170.11 via 172.18.0.1 dev eth0 src 172.18.0.3 uid 0
    cache

## 1.1
root@cilium-kubeproxy-control-plane:/# iptables -t nat -nvL KUBE-SERVICES | grep 'KUBE-SVC-PQCIGBIECMLVBHFY'
pkts bytes target                     prot opt in     out    source       destination
   5   300 KUBE-SVC-PQCIGBIECMLVBHFY  tcp  --  *      *      0.0.0.0/0    10.96.170.11    /* default/pod:http cluster IP */ tcp dpt:80

root@cilium-kubeproxy-control-plane:/# iptables -t nat -nvL KUBE-SVC-PQCIGBIECMLVBHFY
pkts bytes target                     prot  opt  in  out    source            destination
   18  1080 KUBE-MARK-MASQ             tcp  --   *   *      !10.244.0.0/16    10.96.170.11    /* default/pod:http cluster IP */ tcp dpt:80
    7   420 KUBE-SEP-2ZKMHUCYGIQ55D4S  all  --   *   *      0.0.0.0/0         0.0.0.0/0       /* default/pod:http -> 10.244.0.215:80 */ statistic mode random probability 0.50000000000
   14   840 KUBE-SEP-7D54XOF4R7QCT7KQ  all  --   *   *      0.0.0.0/0         0.0.0.0/0       /* default/pod:http -> 10.244.1.51:80 */

root@cilium-kubeproxy-control-plane:/# iptables -t nat -nvL KUBE-SEP-2ZKMHUCYGIQ55D4S
Chain KUBE-SEP-2ZKMHUCYGIQ55D4S (1 references)
 pkts bytes target          prot opt in  out   source         destination         
    0     0 KUBE-MARK-MASQ  all  --  *   *     10.244.0.215   0.0.0.0/0      /* default/pod:http */
    7   420 DNAT            tcp  --  *   *     0.0.0.0/0      0.0.0.0/0      /* default/pod:http */ tcp to:10.244.0.215:80

## 2.
root@cilium-kubeproxy-control-plane:/# ip route show | grep '10.244.0.0'
10.244.0.0/24 via 10.244.0.26 dev cilium_host proto kernel src 10.244.0.26

## 3.
root@cilium-kubeproxy-control-plane:/# ip route get 10.244.0.215
10.244.0.215 dev cilium_host src 10.244.0.26 uid 0
    cache
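The KUBE-SVC chain above sends traffic to the first endpoint with `statistic mode random probability 0.5` and lets everything else fall through to the second, unconditional rule. The sketch below works out the resulting traffic shares. The generalization to n endpoints (rule i matches 1/(n-i) of the packets that reach it) is stated here as an assumption about how such chains are laid out; the output above only shows the two-endpoint case. Function names are illustrative.

```python
from fractions import Fraction


def rule_probabilities(n):
    """Match probability of each rule when rule i (0-based) takes 1/(n-i) of what reaches it."""
    return [Fraction(1, n - i) for i in range(n)]


def endpoint_shares(n):
    """Overall share of traffic each endpoint receives when the rules are walked in order."""
    shares, remaining = [], Fraction(1)
    for p in rule_probabilities(n):
        shares.append(remaining * p)   # packets that match this rule
        remaining *= 1 - p             # packets that fall through to the next rule
    return shares


for n in (2, 3):
    probs = ", ".join(str(p) for p in rule_probabilities(n))
    shares = ", ".join(str(s) for s in endpoint_shares(n))
    print(f"{n} endpoints: rule probabilities [{probs}] -> shares [{shares}]")
```

With two endpoints the first rule's 0.5 and the final catch-all rule give each backend half of the new connections, matching the probability printed in the iptables listing.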

The monitor output from the cilium pod on the same node shows the actual translation:

root@cilium-kubeproxy-control-plane:/# bpftool net show | grep 'cilium_host'
xdp:

tc:
cilium_host(5) tcx/ingress cil_to_host prog_id 5346 link_id 130 
cilium_host(5) tcx/egress cil_from_host prog_id 5349 link_id 131 

flow_dissector:

netfilter:
root@cilium-kubeproxy-control-plane:/# kubectl exec -it -n kube-system cilium-tjsld -- cilium monitor --type trace -v --from 1693

## identity host already shows that the traffic comes from the host
## orig-ip 10.244.0.26
## cilium_host redirects to the Server Pod's lxc490b726473cb through the BPF program cil_from_host
## The forwarding logic is verified in the next code block
-> endpoint 1693 flow 0xaa48e037 , identity host->30501 state new ifindex lxc490b726473cb orig-ip 10.244.0.26: 10.244.0.26:37056 -> 10.244.0.215:80 tcp SYN
<- endpoint 1693 flow 0x5d0bfe02 , identity 30501->unknown state unknown ifindex 0 orig-ip 0.0.0.0: 10.244.0.215:80 -> 10.244.0.26:37056 tcp SYN, ACK
-> stack flow 0x5d0bfe02 , identity 30501->host state reply ifindex 0 orig-ip 0.0.0.0: 10.244.0.215:80 -> 10.244.0.26:37056 tcp SYN, ACK
-> endpoint 1693 flow 0xaa48e037 , identity host->30501 state established ifindex lxc490b726473cb orig-ip 10.244.0.26: 10.244.0.26:37056 -> 10.244.0.215:80 tcp ACK
-> endpoint 1693 flow 0xaa48e037 , identity host->30501 state established ifindex lxc490b726473cb orig-ip 10.244.0.26: 10.244.0.26:37056 -> 10.244.0.215:80 tcp ACK
<- endpoint 1693 flow 0x5d0bfe02 , identity 30501->unknown state unknown ifindex 0 orig-ip 0.0.0.0: 10.244.0.215:80 -> 10.244.0.26:37056 tcp ACK
-> stack flow 0x5d0bfe02 , identity 30501->host state reply ifindex 0 orig-ip 0.0.0.0: 10.244.0.215:80 -> 10.244.0.26:37056 tcp ACK
<- endpoint 1693 flow 0x5d0bfe02 , identity 30501->unknown state unknown ifindex 0 orig-ip 0.0.0.0: 10.244.0.215:80 -> 10.244.0.26:37056 tcp ACK
-> stack flow 0x5d0bfe02 , identity 30501->host state reply ifindex 0 orig-ip 0.0.0.0: 10.244.0.215:80 -> 10.244.0.26:37056 tcp ACK
-> endpoint 1693 flow 0xaa48e037 , identity host->30501 state established ifindex lxc490b726473cb orig-ip 10.244.0.26: 10.244.0.26:37056 -> 10.244.0.215:80 tcp ACK
<- endpoint 1693 flow 0x5d0bfe02 , identity 30501->unknown state unknown ifindex 0 orig-ip 0.0.0.0: 10.244.0.215:80 -> 10.244.0.26:37056 tcp ACK
-> stack flow 0x5d0bfe02 , identity 30501->host state reply ifindex 0 orig-ip 0.0.0.0: 10.244.0.215:80 -> 10.244.0.26:37056 tcp ACK
-> endpoint 1693 flow 0xaa48e037 , identity host->30501 state established ifindex lxc490b726473cb orig-ip 10.244.0.26: 10.244.0.26:37056 -> 10.244.0.215:80 tcp ACK
-> endpoint 1693 flow 0xaa48e037 , identity host->30501 state established ifindex lxc490b726473cb orig-ip 10.244.0.26: 10.244.0.26:37056 -> 10.244.0.215:80 tcp ACK, FIN
<- endpoint 1693 flow 0x5d0bfe02 , identity 30501->unknown state unknown ifindex 0 orig-ip 0.0.0.0: 10.244.0.215:80 -> 10.244.0.26:37056 tcp ACK, FIN
-> stack flow 0x5d0bfe02 , identity 30501->host state reply ifindex 0 orig-ip 0.0.0.0: 10.244.0.215:80 -> 10.244.0.26:37056 tcp ACK, FIN
-> endpoint 1693 flow 0xaa48e037 , identity host->30501 state established ifindex lxc490b726473cb orig-ip 10.244.0.26: 10.244.0.26:37056 -> 10.244.0.215:80 tcp ACK

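The redirect mentioned in the comments of the previous code block is bpf_redirect(), not bpf_redirect_peer():

cilium_host redirects to the Server Pod's lxc490b726473cb through the BPF program cil_from_host

The bpftool net show output above shows that the cilium_host egress eBPF program cil_from_host has kernel ID 5349. Dump its eBPF virtual machine instructions in human-readable disassembly form.

The output shows that cil_from_host does not call redirect itself; it jumps to another BPF program through bpf_tail_call:

## bpftool net show output: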
cilium_host(5) tcx/egress cil_from_host prog_id 5349 link_id 131


root@cilium-kubeproxy-control-plane:/# bpftool prog dump xlated id 5349

; tail_call_static(ctx, CALLS_MAP, index);
 368: (bf) r1 = r6
 369: (18) r2 = map[id:1169]       # ← CALLS_MAP: 1169
 371: (b7) r3 = 1                  # ← tail call to the program at index 1
 372: (85) call bpf_tail_call#12
 373: (b4) w7 = 2
 374: (05) goto pc+44

; tail_call_static(ctx, CALLS_MAP, index);
 468: (bf) r1 = r6
 469: (18) r2 = map[id:1169]       # CALLS_MAP: 1169
 471: (b7) r3 = 22                 # tail call to the program at index 22
 472: (85) call bpf_tail_call#12
 473: (b4) w1 = 79560960

List the programs in CALLS_MAP 1169:

 root@cilium-kubeproxy-control-plane:/# bpftool map dump id 1169
  key: 01 00 00 00  value: df 14 00 00
  key: 07 00 00 00  value: e0 14 00 00
  key: 16 00 00 00  value: e4 14 00 00

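The dump prints raw bytes in little-endian order, so the value is read back to front and then converted to decimal: e4 14 00 00 = 0x000014e4 = 1×4096 + 4×256 + 14×16 + 4 = 5348

The other two values were checked the same way and neither turned out to be the program we are after, so only the correct result is kept here.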
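The manual conversion can be double-checked in a few lines. This is a minimal sketch that assumes the node is little-endian, as x86-64 and most arm64 systems are; `le_bytes_to_int` is a helper name made up for illustration. The byte strings are copied from the `bpftool map dump id 1169` output above.

```python
def le_bytes_to_int(dump_field):
    """Convert a space-separated hex byte string such as 'e4 14 00 00' to an integer."""
    return int.from_bytes(bytes.fromhex(dump_field), byteorder="little")


ENTRIES = [
    ("01 00 00 00", "df 14 00 00"),
    ("07 00 00 00", "e0 14 00 00"),
    ("16 00 00 00", "e4 14 00 00"),
]

for key, value in ENTRIES:
    print(f"index {le_bytes_to_int(key):>2} -> prog id {le_bytes_to_int(value)}")
```

Key 16 00 00 00 decodes to index 22, the slot used by the second tail call in the disassembly, and its value decodes to program ID 5348.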

The dump shows that 5348 performs another tail call, this time through the policy program map 1039:

root@cilium-kubeproxy-control-plane:/# bpftool prog dump xlated id 5348

; tail_call(ctx, map, slot);
 245: (bf) r1 = r6
 246: (18) r2 = map[id:1039]         # ← policy program map
 248: (85) call bpf_tail_call#12
 249: (b4) w0 = -203

List the programs in map 1039. The key 9d 06 00 00 = 0x0000069d = 6×256 + 9×16 + 13 = 1693 is the endpoint ID of pod-1:

root@cilium-kubeproxy-control-plane:/# bpftool map dump id 1039
  key: 2c 00 00 00  value: e3 14 00 00
  key: e0 00 00 00  value: 04 15 00 00
  key: 12 03 00 00  value: 25 15 00 00
  key: 9d 06 00 00  value: e2 15 00 00
  key: 40 08 00 00  value: 32 15 00 00
  key: 47 08 00 00  value: 1a 15 00 00
  key: ba 08 00 00  value: 59 15 00 00


root@network-demo:~# kubectl exec -it -n kube-system cilium-tjsld -- cilium endpoint list | grep 1693
ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])   IPv4           STATUS   
           ENFORCEMENT        ENFORCEMENT
1693       Disabled           Disabled          30501      k8s:app=nginx                 10.244.0.215   ready

The value for key 9d 06 00 00 is e2 15 00 00 = 0x000015e2 = 1×4096 + 5×256 + 14×16 + 2 = 5602, a BPF program of type sched_cls that handles the policy check for endpoint 1693 (pod-1) and the final bpf_redirect forwarding:

root@cilium-kubeproxy-control-plane:/# bpftool prog dump xlated id 5602

; return redirect(ifindex, flags);
2005: (b4) w2 = 0
2006: (85) call bpf_redirect#12800944

root@cilium-kubeproxy-control-plane:/# bpftool prog show id 5602
5602: sched_cls  name handle_policy  tag 1a9b43b8794e80f1  gpl
        loaded_at 2026-05-03T12:11:28+0000  uid 0
        xlated 17840B  jited 10602B  memlock 20480B  map_ids 1229,1033,1227,1041,1042,1030,1228,1038,1032,1007,1027,1028,1029,1226,1043
        btf_id 1858
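The whole chain traced above can be summarized as a pair of table lookups. The sketch below only replays the steps this article followed and is not a general tool: the map contents are the decoded values from the bpftool dumps, only the policy map entry for endpoint 1693 is included, and `TAIL_CALLS` records just the one tail-call slot per program that the article pursued (cil_from_host also tail-calls index 1, which is ignored here). All names are hypothetical.

```python
# Decoded map contents (keys and values as integers).
CALLS_MAP_1169 = {1: 5343, 7: 5344, 22: 5348}
POLICY_MAP_1039 = {1693: 5602}  # endpoint ID -> policy program ID

# Which map and slot each program tail-calls into, as read from the disassembly.
# A slot of None means "indexed by the endpoint ID at run time".
TAIL_CALLS = {
    5349: ("calls", 22),     # cil_from_host -> CALLS_MAP[22]
    5348: ("policy", None),  # -> policy map[endpoint ID]
}
MAPS = {"calls": CALLS_MAP_1169, "policy": POLICY_MAP_1039}


def resolve_chain(start_prog, endpoint_id):
    """Follow recorded tail calls from start_prog until a program with none is reached."""
    chain = [start_prog]
    prog = start_prog
    while prog in TAIL_CALLS:
        map_name, slot = TAIL_CALLS[prog]
        prog = MAPS[map_name][endpoint_id if slot is None else slot]
        chain.append(prog)
    return chain


print(" -> ".join(str(p) for p in resolve_chain(5349, endpoint_id=1693)))
```

The last program in the chain, 5602 (handle_policy), is the one whose disassembly contains the bpf_redirect call.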

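Analyzing the Pod/Node routing tables and ARP tables

1. Check the Pod routing table

The Pod routing table suggests that all outgoing traffic goes through the cilium_host gateway, but that is not what actually happens: the Pod's eth0 and the host's lxc device are the two directly connected ends of a veth pair, so packets leaving the Pod can only take that one path. The ARP table lookup in the steps below confirms this: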

root@network-demo:~# kubectl exec -it client -- ip route show
default via 10.244.0.26 dev eth0 mtu 1500 
10.244.0.26 dev eth0 scope link 

root@network-demo:~# docker exec -it cilium-kubeproxy-control-plane ip address show cilium_host
5: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 12:19:e0:82:77:96 brd ff:ff:ff:ff:ff:ff
    inet 10.244.0.26/32 scope global cilium_host
       valid_lft forever preferred_lft forever

root@network-demo:~# docker exec -it cilium-kubeproxy-control-plane ip -d link show cilium_host
5: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 12:19:e0:82:77:96 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 
    veth addrgenmode eui64 numtxqueues 8 numrxqueues 8 gso_max_size 65536 gso_max_segs 65535

2. Check the Pod ARP table

The MAC address recorded for gateway IP 10.244.0.26 turns out to be the MAC address of the veth pair lxc device:

root@network-demo:~# docker exec -it cilium-kubeproxy-control-plane ip address show lxceaefdfbf8f50
15: lxceaefdfbf8f50@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ea:b4:11:10:09:5b brd ff:ff:ff:ff:ff:ff link-netns cni-6afc5666-963b-3b06-7209-3a1ef6c28288
root@network-demo:~# kubectl exec -it client -- ip neighbor show
10.244.0.26 dev eth0 lladdr ea:b4:11:10:09:5b STALE

root@network-demo:~# kubectl exec -it client -- arp -n
Address                  HWtype  HWaddress           Flags Mask            Iface
10.244.0.26              ether   ea:b4:11:10:09:5b   C                     eth0

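Request an external IP again to trigger an ARP broadcast, then capture. The MAC in the reply is the lxc device's: the Pod believes it is talking to the gateway cilium_host, but the actual peer is lxc.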

root@network-demo:~# kubectl exec -it client -- tcpdump -pnei eth0 arp

09:22:20.518104 4a:de:3b:33:f8:4b > ea:b4:11:10:09:5b, ethertype ARP (0x0806), length 42: Request who-has 10.244.0.26 tell 10.244.0.49, length 28
09:22:20.518161 ea:b4:11:10:09:5b > 4a:de:3b:33:f8:4b, ethertype ARP (0x0806), length 42: Reply 10.244.0.26 is-at ea:b4:11:10:09:5b, length 28
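The mismatch between the gateway IP and the device that answers for it can be checked by comparing MAC addresses. The sketch below uses the MACs printed by `ip address show` in this article and the Pod's `ip neighbor show` line; `parse_neigh` and `owner_of` are helper names invented for illustration.

```python
# MAC addresses copied from the `ip address show` outputs in this article.
DEVICE_MACS = {
    "cilium_host": "12:19:e0:82:77:96",
    "lxceaefdfbf8f50": "ea:b4:11:10:09:5b",
}

# Neighbor entry as seen inside the client Pod.
NEIGH_LINE = "10.244.0.26 dev eth0 lladdr ea:b4:11:10:09:5b STALE"


def parse_neigh(line):
    """Parse one `ip neighbor show` line into (ip, dev, mac, state)."""
    fields = line.split()
    ip = fields[0]
    dev = fields[fields.index("dev") + 1]
    mac = fields[fields.index("lladdr") + 1]
    return ip, dev, mac, fields[-1]


def owner_of(mac):
    """Return the host device that owns this MAC, or None if it is unknown."""
    return next((dev for dev, m in DEVICE_MACS.items() if m == mac), None)


ip, dev, mac, state = parse_neigh(NEIGH_LINE)
print(f"{ip} resolves to {mac}, which belongs to {owner_of(mac)}")
```

Although 10.244.0.26 is configured on cilium_host, the MAC the Pod has learned for it belongs to lxceaefdfbf8f50.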

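3. Check the Node routing table

Tying this back to the Client Pod's ClusterIP request: after tc on the lxc device rewrites 10.96.94.95 to 10.244.1.124, the kernel routing table sends the packet out of the host's eth0 to 172.18.0.2.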

root@network-demo:~# docker exec -it cilium-kubeproxy-control-plane ip route show | grep '10.244.1.0/24'

10.244.1.0/24 via 172.18.0.2 dev eth0 proto kernel

4. Check the Node ARP table

Confirm the MAC address of the peer node 172.18.0.2 in the ARP table:

root@network-demo:~# docker exec -it cilium-kubeproxy-control-plane ip neighbor show | grep '172.18.0.2'

172.18.0.2 dev eth0 lladdr ae:d7:63:54:c6:7d REACHABLE
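Steps 3 and 4 together determine the destination MAC of the frame that leaves the node. The sketch below chains the two lookups and compares the result with the destination MAC seen in the earlier capture on the host's eth0. The route and neighbor entries are copied from the outputs above; `next_hop_mac` is a hypothetical helper, and the single-route table is a deliberate simplification.

```python
import ipaddress

ROUTES = {"10.244.1.0/24": ("172.18.0.2", "eth0")}          # from `ip route show`
NEIGHBORS = {("172.18.0.2", "eth0"): "ae:d7:63:54:c6:7d"}   # from `ip neighbor show`
CAPTURED_DST_MAC = "ae:d7:63:54:c6:7d"                      # dst MAC in the eth0 capture


def next_hop_mac(dst_ip):
    """Resolve destination IP -> (next hop, device) -> next-hop MAC."""
    addr = ipaddress.ip_address(dst_ip)
    for prefix, (via, dev) in ROUTES.items():
        if addr in ipaddress.ip_network(prefix):
            return via, dev, NEIGHBORS[(via, dev)]
    raise LookupError(f"no route to {dst_ip}")


via, dev, mac = next_hop_mac("10.244.1.124")
print(f"10.244.1.124 via {via} dev {dev} -> dst MAC {mac}")
print("matches capture:", mac == CAPTURED_DST_MAC)
```

The frame is addressed to the peer node's MAC while the IP header carries the Server Pod's address, which is consistent with the eth0 capture.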