Kubernetes Networking from Packets to Pods
Luca Cavallin · 2025-07-01 · via Luca Cavallin

Kubernetes networking often feels like a complex contraption. To understand how it works, we first need to look at its most basic components, the core principles of TCP/IP and the Linux networking stack. In the early days of computing, networks were largely proprietary, meaning hardware and software from one vendor couldn't communicate with another. This "wild west" of networking led to the development of standardized frameworks to ensure interoperability. The most famous of these is the OSI (Open Systems Interconnection) model, a seven-layer conceptual model that standardizes the functions of a network system. While a great theoretical tool, the model that won out in practice is the more streamlined TCP/IP model.

The TCP/IP model, which powers the modern internet, is composed of four primary layers:

  • Link Layer: This is the lowest layer, responsible for the physical transmission of data over the network medium, like Ethernet or Wi-Fi. It deals with MAC addresses and the physical hardware of network interface cards.
  • Internet Layer: This layer is responsible for logical addressing and routing. The Internet Protocol (IP) operates here, assigning unique IP addresses to hosts and figuring out the best path to send packets across networks.
  • Transport Layer: This layer ensures data is delivered between applications. The two most common protocols here are TCP (Transmission Control Protocol), which provides reliable, connection-oriented delivery (guaranteeing packets arrive in order and without errors), and UDP (User Datagram Protocol), which offers a faster, connectionless service without such guarantees.
  • Application Layer: At the top of the stack, this is where user-facing applications like web browsers (HTTP), email clients (SMTP), and DNS operate. These applications create and consume the data that is then passed down the stack to be sent across the network.
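To make the layering concrete, here is a minimal Python sketch of encapsulation: an application payload is wrapped in a UDP header (Transport Layer) and then in an IPv4 header (Internet Layer). The addresses, ports, and function names are made up for illustration, and both checksums are left at zero for brevity, so this is not a packet you could put on a real wire.

```python
import socket
import struct

# Fixed-size headers in network byte order.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")  # 20 bytes, no options
UDP_HEADER = struct.Struct("!HHHH")           # 8 bytes
PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}


def wrap_udp(payload, src_port, dst_port):
    """Transport Layer: prepend a UDP header (checksum left at zero)."""
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload


def wrap_ipv4(segment, src_ip, dst_ip, protocol=17, ttl=64):
    """Internet Layer: prepend an IPv4 header (checksum left at zero)."""
    version_ihl = (4 << 4) | 5  # version 4, header length of five 32-bit words
    total_length = IPV4_HEADER.size + len(segment)
    header = IPV4_HEADER.pack(
        version_ihl, 0, total_length, 0, 0, ttl, protocol, 0,
        socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
    )
    return header + segment


def describe(packet):
    """Peel the layers off again, assuming the payload is carried in UDP."""
    fields = IPV4_HEADER.unpack(packet[:IPV4_HEADER.size])
    version_ihl, _, total_length, _, _, ttl, protocol, _, src, dst = fields
    header_len = (version_ihl & 0x0F) * 4
    udp_end = header_len + UDP_HEADER.size
    src_port, dst_port, _, _ = UDP_HEADER.unpack(packet[header_len:udp_end])
    return {
        "version": version_ihl >> 4,
        "ttl": ttl,
        "total_length": total_length,
        "protocol": PROTOCOL_NAMES.get(protocol, str(protocol)),
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        "src_port": src_port,
        "dst_port": dst_port,
        "payload": packet[udp_end:],
    }


packet = wrap_ipv4(wrap_udp(b"hello", 40000, 53), "10.244.1.5", "10.96.0.10")
info = describe(packet)
```

The Link Layer would add one more wrapper (an Ethernet frame with source and destination MAC addresses) around the result before it leaves the interface.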

Understanding this layered approach is fundamental, as every network packet in a Kubernetes cluster adheres to this model. We'll explore this entire ecosystem in three parts: the foundational technologies that make it all possible, the core Kubernetes model itself, and finally, advanced topics and practical guides.

Part I: Understanding the Foundations

Linux Networking

Before a single container is launched, its entire networking "reality" is defined within the Linux kernel. Understanding how Linux handles packets, interfaces, and rules is key to diagnosing issues at any level of the stack. These fundamentals are the building blocks for both container runtimes and Kubernetes.

The most basic networking construct in Linux is the network interface. This is a software representation of a point of connection to a network, which can be a physical device like an Ethernet card (eth0) or a purely virtual one, like the loopback interface (lo). A special and critically important virtual interface is the bridge interface. A Linux bridge functions as a virtual Layer 2 switch, capable of connecting multiple network interfaces together. When a packet from a connected interface arrives at the bridge, the bridge inspects the packet's destination MAC address and forwards it to the correct interface on the same host. This is the fundamental mechanism that allows containers on the same host to communicate with each other.
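The forwarding decision described above can be sketched in a few lines of Python. This is a toy model of a learning switch, not the kernel's bridge code; the class name, port names, and MAC addresses are invented for the example.

```python
class LearningBridge:
    """Toy model of a Layer 2 bridge that learns which MAC lives on which port."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Return the set of ports the frame is forwarded to."""
        # Learn (or refresh) where the sender lives.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress one.
            return self.ports - {in_port}
        if out_port == in_port:
            # Destination is on the segment the frame came from: nothing to do.
            return set()
        return {out_port}


bridge = LearningBridge(["veth-a", "veth-b", "veth-c"])
first = bridge.receive("veth-a", "02:00:00:00:00:0a", "02:00:00:00:00:0b")
reply = bridge.receive("veth-b", "02:00:00:00:00:0b", "02:00:00:00:00:0a")
second = bridge.receive("veth-a", "02:00:00:00:00:0a", "02:00:00:00:00:0b")
```

The first frame is flooded because the bridge has not seen the destination yet; once the reply has been observed, later frames go only to the port where the destination was learned.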

When a packet arrives at an interface, it is passed to the kernel for a journey governed by the Netfilter framework. Netfilter provides a series of "hooks" in the kernel's network processing path where other programs can register to inspect and manipulate the packet. The most well-known tool for managing these hooks is iptables, the classic userspace firewall utility. Using iptables, you can create rules that are checked against each packet, deciding whether to ACCEPT, DROP, or modify it (for example, using Network Address Translation - NAT). Working alongside Netfilter is conntrack, a system that tracks all network connections. This allows the kernel to recognize packets that are part of an existing, established connection, which is the basis for stateful firewalls.
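As a rough mental model, a filter chain is an ordered list of rules where the first match decides the verdict, and connection tracking lets packets of an already accepted flow skip the list. The Python sketch below is a simplification under several assumptions: the Packet, Rule, and Chain names are invented, flows are keyed without source ports or reply direction, and the conntrack shortcut is hard-wired rather than expressed as an explicit rule as it would be in a real ruleset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dport: int
    proto: str = "tcp"


@dataclass(frozen=True)
class Rule:
    verdict: str                 # "ACCEPT" or "DROP"
    dport: Optional[int] = None  # None means "any port"
    src: Optional[str] = None    # None means "any source"

    def matches(self, pkt):
        return (self.dport is None or self.dport == pkt.dport) and (
            self.src is None or self.src == pkt.src
        )


class Chain:
    def __init__(self, rules, policy="DROP"):
        self.rules = list(rules)
        self.policy = policy      # verdict when no rule matches
        self.established = set()  # simplified connection-tracking table

    def filter(self, pkt):
        flow = (pkt.src, pkt.dst, pkt.dport, pkt.proto)
        if flow in self.established:
            return "ACCEPT"       # known flow: no need to walk the rules
        for rule in self.rules:   # sequential, first match wins
            if rule.matches(pkt):
                if rule.verdict == "ACCEPT":
                    self.established.add(flow)
                return rule.verdict
        return self.policy


chain = Chain([Rule("DROP", src="10.0.0.66"), Rule("ACCEPT", dport=443)])
```

Because rules are evaluated in order, the DROP rule for 10.0.0.66 wins over the later ACCEPT for port 443, which is why rule ordering matters so much in practice.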

While the kernel has a core routing table, several technologies have evolved to handle more complex traffic flows. After iptables, the next step was IPVS (IP Virtual Server). Built for high-performance load balancing, IPVS uses more efficient in-kernel hash tables instead of the sequential rule lists of iptables, making it a superior choice for environments with a large number of services.

The latest evolution, eBPF (extended Berkeley Packet Filter), fundamentally changes this dynamic by making the Linux kernel itself programmable. While traditional tools like iptables are powerful, they have inherent limitations in large-scale, dynamic environments. Iptables relies on long, sequential chains of rules; as the number of services and policies grows, traversing these chains for every packet can introduce significant CPU overhead and increase latency. eBPF avoids this by allowing small, highly efficient, and sandboxed programs to be attached directly to specific hooks within the kernel - for instance, at the exact moment a network driver receives a packet. The eBPF architecture ensures safety through a strict verifier that analyzes any program before it's loaded, and a Just-In-Time (JIT) compiler converts the eBPF bytecode into native machine code for maximum execution speed. This programmability extends beyond networking; by attaching to tracepoints and system calls, eBPF can power advanced security and observability tools, making it a foundational technology for the next generation of cloud-native infrastructure.

To navigate and troubleshoot this complex environment, Linux provides a suite of indispensable command-line tools.

  • ping and traceroute are for checking basic host reachability and mapping the path packets take.
  • dig is used to query DNS servers.
  • telnet and netcat (nc) are used to check if a specific port is open and listening.
  • nmap is a powerful network scanner for discovering hosts and services.
  • netstat and the more modern ss display local sockets and network connections; for routing tables, use netstat -r or the more modern ip route.
  • curl is the swiss-army knife for making HTTP/S requests.
  • openssl can be used to manually perform a TLS handshake to debug complex SSL certificate issues.

Container Networking

Before we can understand how containers talk to each other, we need a solid grasp of what a container is and the kernel magic that makes its isolation possible. Unlike a hypervisor, which creates a full-blown virtual machine (VM) with its own guest operating system, a container is a much lighter-weight construct. It's essentially just a sandboxed process, or group of processes, running directly on the host's Linux kernel. This approach avoids the overhead of booting a separate OS, making containers incredibly fast to create and resource-efficient.

This powerful isolation is achieved primarily through two Linux kernel features that act as digital walls: control groups (cgroups) and namespaces. Cgroups are the resource accountants; they control how much CPU, memory, and I/O a container is allowed to consume. Namespaces are the architects of isolation; they partition kernel resources so that a container has its own private view of the system. Most importantly for our topic, the network namespace provides a container with a completely fresh network stack: its own private set of network interfaces, IP addresses, routing tables, and firewall rules.
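On Linux, each process exposes the namespaces it belongs to under /proc. The small Python helper below (the function name is my own) reads the network namespace identifier of a process; two processes that print the same identifier share a network stack, while a containerized process normally shows a different one from the host. It returns None where /proc is not available, for example on non-Linux systems.

```python
import os


def current_netns(pid="self"):
    """Return the network namespace identifier of a process.

    The value looks like 'net:[<inode number>]'; the number differs
    between namespaces. Returns None if /proc is unavailable or the
    link cannot be read.
    """
    try:
        return os.readlink(f"/proc/{pid}/ns/net")
    except OSError:
        return None


ns = current_netns()
```

Comparing current_netns() on the host with the value for a container's main process (passing its PID, which usually requires root) is a quick way to confirm that the container really has its own network namespace.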

With this foundation, we can look at practical implementations like the Docker networking model. When you install Docker, it typically creates a virtual bridge on the host called docker0. When you launch a container, Docker creates a pair of virtual Ethernet interfaces (veth pair). One end is placed inside the container's new network namespace (as eth0), while the other end is attached to the docker0 bridge. This allows containers on the same host to communicate. For container-to-container communication on separate hosts, overlay networking is used. An overlay network encapsulates the container's traffic in a packet that the host network knows how to route (using a protocol like VXLAN), making it seem like all containers are on the same flat network.
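To illustrate what "encapsulates" means for VXLAN, the Python sketch below builds and strips the 8-byte VXLAN header that is placed in front of the original Ethernet frame. In a real overlay this header and frame then travel inside a UDP datagram (destination port 4789) between the hosts; that outer UDP/IP wrapping is left out here, and the function names and VNI value are invented for the example.

```python
import struct

VXLAN_UDP_PORT = 4789         # IANA-assigned destination port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08   # the "I" flag: the VNI field is valid


def vxlan_encapsulate(vni, inner_frame):
    """Prefix an inner Ethernet frame with a VXLAN header.

    Layout: 8 bits flags, 24 bits reserved, 24 bits VNI, 8 bits reserved.
    """
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame


def vxlan_decapsulate(packet):
    """Return (vni, inner_frame) from a VXLAN payload."""
    flags_word, vni_word = struct.unpack("!II", packet[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8, packet[8:]


inner = b"\x02\x00\x00\x00\x00\x0b" + b"\x02\x00\x00\x00\x00\x0a" + b"..."
outer_payload = vxlan_encapsulate(42, inner)
vni, recovered = vxlan_decapsulate(outer_payload)
```

The VNI plays a role similar to a VLAN ID: it tells the receiving host which virtual network the inner frame belongs to.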

To prevent every container runtime from having to reinvent this wheel, the community developed the Container Network Interface (CNI) specification. CNI is a simple standard that decouples the container runtime (like containerd or CRI-O) from the networking implementation. The runtime is only responsible for creating the network namespace and then calling a CNI plugin to do the actual work of setting up the network, like creating interfaces and assigning IP addresses. This pluggable architecture is a cornerstone of Kubernetes networking.
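The contract between runtime and plugin is small enough to sketch. Under the CNI specification, the runtime executes the plugin binary named by the configuration's type field, passes parameters as CNI_* environment variables, and feeds the network configuration as JSON on standard input. The Python below only builds those two inputs; it does not execute any plugin. The network name, subnet, container ID, and namespace path are placeholder values, and the function name is my own.

```python
import json


def build_cni_add_request(container_id, netns_path, ifname="eth0"):
    """Build the environment and stdin a runtime would hand to a CNI plugin."""
    env = {
        "CNI_COMMAND": "ADD",            # ADD attaches a container to the network
        "CNI_CONTAINERID": container_id,
        "CNI_NETNS": netns_path,         # namespace the runtime already created
        "CNI_IFNAME": ifname,            # interface name to create inside it
        "CNI_PATH": "/opt/cni/bin",      # where plugin binaries are searched
    }
    config = {
        "cniVersion": "1.0.0",
        "name": "example-net",           # placeholder network name
        "type": "bridge",                # plugin binary to execute
        "bridge": "cni0",
        "isGateway": True,
        "ipMasq": True,
        "ipam": {"type": "host-local", "subnet": "10.22.0.0/16"},
    }
    return env, json.dumps(config)


env, stdin_payload = build_cni_add_request("abc123", "/var/run/netns/demo")
```

On success the plugin prints a JSON result (interfaces, assigned IPs, routes) on standard output, which the runtime reads back; a later call with CNI_COMMAND set to DEL tears the attachment down.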

Part II: The Core Kubernetes Model

Kubernetes Networking

Kubernetes takes container networking to the next level by establishing a prescriptive, yet flexible, networking model. This model is built on a few fundamental principles: every Pod (a group of one or more containers) gets its own unique IP address across the entire cluster, and all Pods can communicate directly with all other Pods without needing Network Address Translation (NAT). This creates a clean, flat network space that behaves much like a traditional LAN.

To achieve this, the cluster's IP address space is partitioned. The kube-controller-manager is typically responsible for assigning a unique IP address range, called a podCIDR block, to each Node in the cluster (some CNI plugins manage address allocation themselves). On each Node, the kubelet acts as the local Kubernetes agent. When a new Pod is scheduled, the kubelet asks the container runtime to create the Pod's sandbox, and the runtime calls the configured CNI plugin to wire the Pod into the cluster network. The power of this model lies in its pluggability. You can choose from dozens of CNI plugins: Flannel is a simple choice that creates an overlay network; Calico can use the BGP routing protocol for high-performance, non-overlay networking; Cilium leverages eBPF for highly efficient networking, observability, and security.
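The partitioning step is easy to reproduce with Python's ipaddress module. The sketch below carves a cluster-wide range into one /24 per node; the 10.244.0.0/16 range, the node names, and the function name are assumptions for the example, and a real allocator also has to handle nodes joining and leaving.

```python
import ipaddress


def allocate_pod_cidrs(cluster_cidr, node_names, node_prefix=24):
    """Give each node the next free subnet of the cluster CIDR."""
    subnets = ipaddress.ip_network(cluster_cidr).subnets(new_prefix=node_prefix)
    # zip stops at the shorter input, so only as many subnets as nodes are used.
    return {node: str(subnet) for node, subnet in zip(node_names, subnets)}


pod_cidrs = allocate_pod_cidrs("10.244.0.0/16", ["node-a", "node-b", "node-c"])
```

With a /16 cluster range and /24 per node, this layout allows up to 256 nodes with roughly 250 Pod addresses each, which is the kind of arithmetic worth doing before picking the ranges for a new cluster.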

A key component of the Service machinery is kube-proxy, a daemon that runs on every node and implements the Kubernetes Service abstraction. When you create a Service, it gets a stable virtual IP address (ClusterIP). Kube-proxy makes sure that any traffic sent to this ClusterIP is intercepted and load-balanced to one of the ready backend Pods. It operates in several modes; on Linux the default is iptables mode. For larger-scale deployments, ipvs mode is often preferred as it uses more efficient hash tables for load balancing.
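In iptables mode, load balancing across backends is commonly implemented as a sequence of rules that each match with a certain probability, with the last rule matching unconditionally. The Python sketch below checks the arithmetic behind that scheme: with n backends, giving rule i a probability of 1/(n - i) yields an even split. The function names are my own, and this models only the probabilities, not the actual rules kube-proxy writes.

```python
from fractions import Fraction


def rule_probabilities(n):
    """Match probability of each rule, evaluated in order, for n backends."""
    return [Fraction(1, n - i) for i in range(n)]


def effective_share(n):
    """Fraction of all new connections that ends up at each backend."""
    shares = []
    remaining = Fraction(1)  # traffic that fell through the earlier rules
    for p in rule_probabilities(n):
        shares.append(remaining * p)
        remaining *= 1 - p
    return shares


probabilities = rule_probabilities(3)
shares = effective_share(3)
```

For three backends the rules match with probability 1/3, 1/2, and 1, and every backend receives exactly one third of new connections. The cost of this approach is that a packet may have to pass several rules before one matches, which is part of why hash-based modes scale better with many Services.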

While the model provides open communication by default, NetworkPolicy allows you to define firewall rules for Pods at the IP address or port level. These policies are enforced by the CNI plugin, so they only take effect if the plugin you run supports them; with a supporting plugin, you can create fine-grained ingress and egress rules. Finally, no modern network is complete without DNS. Kubernetes provides a robust, cluster-aware DNS service (typically CoreDNS) that allows Pods to discover each other using predictable names instead of ephemeral IP addresses. Kubernetes also supports IPv4/IPv6 dual-stack networking, allowing Pods and Services to be allocated both address types.
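NetworkPolicy semantics trip people up because policies are additive allow-lists: a Pod that no policy selects accepts everything, but as soon as one policy selects it, only explicitly allowed traffic gets through. The Python sketch below models just that rule for ingress. The dictionary layout is a simplified stand-in, not the real manifest schema, and it ignores namespaces, IP blocks, and egress.

```python
def selects(selector, labels):
    """True if every key/value in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def ingress_allowed(policies, src_labels, dst_labels, port):
    """Decide whether traffic from a source Pod to a destination Pod is allowed."""
    applicable = [p for p in policies if selects(p["podSelector"], dst_labels)]
    if not applicable:
        return True  # no policy selects the destination: default allow
    # At least one policy applies: traffic must match some allow rule.
    return any(
        selects(rule["from"], src_labels) and port in rule["ports"]
        for policy in applicable
        for rule in policy["ingress"]
    )


policies = [
    {
        "podSelector": {"app": "db"},
        "ingress": [{"from": {"app": "api"}, "ports": [5432]}],
    }
]
```

With this single policy, Pods labeled app=api can reach the database on port 5432, everything else is cut off from the database, and Pods that no policy selects remain fully reachable.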

Kubernetes Networking Abstractions: Services, Ingress, and Meshes

While Pods have unique IPs, those IPs are ephemeral. To build reliable applications, Kubernetes provides several powerful networking abstractions that sit on top of this underlying Pod network.

The primary abstraction is the Service, which provides a single, stable endpoint for a group of Pods. Kubernetes tracks the IPs of the Pods backing a Service using EndpointSlices (a scalable evolution of the original Endpoints object). There are several types of Services:

  • ClusterIP: The default type, exposing the Service on an internal, cluster-only IP address. This is the standard for internal service-to-service communication.
  • NodePort: Exposes the Service on a static port on each Node's IP address, making it accessible from outside the cluster for development or demos.
  • LoadBalancer: The standard way to expose a service to the internet. It provisions a cloud load balancer that directs external traffic to the Service's NodePort.
  • Headless: By setting clusterIP: None, no virtual IP is created. DNS queries for the service return the IPs of all the backing Pods, which is useful for stateful applications where you want to connect to a specific instance.
  • ExternalName: Maps a service to an external DNS name by creating a CNAME record within the cluster's DNS.

For stateful applications like databases, the StatefulSet workload resource provides Pods with stable, unique network identifiers (e.g., db-0.my-db-service) that persist even if the pod is rescheduled.
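The naming scheme behind these identifiers is predictable, which is what makes it useful. The Python sketch below builds the fully qualified names for a Service and for the Pods of a StatefulSet behind a headless Service, assuming the default namespace and the common default cluster domain cluster.local; the function names are my own.

```python
def service_fqdn(service, namespace="default", cluster_domain="cluster.local"):
    """DNS name of a Service: <service>.<namespace>.svc.<cluster domain>."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def statefulset_pod_fqdns(statefulset, replicas, headless_service,
                          namespace="default", cluster_domain="cluster.local"):
    """Stable per-Pod names: <statefulset>-<ordinal>.<headless service FQDN>."""
    base = service_fqdn(headless_service, namespace, cluster_domain)
    return [f"{statefulset}-{ordinal}.{base}" for ordinal in range(replicas)]


names = statefulset_pod_fqdns("db", 2, "my-db-service")
```

Within the same namespace the short form from the text (db-0.my-db-service) resolves as well, because the Pod's DNS search path fills in the rest.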

Services operate at Layer 4 (TCP/UDP). For managing external access at Layer 7 (HTTP/HTTPS), Kubernetes provides Ingress. An Ingress resource lets you define rules for routing external HTTP traffic to internal Services based on hostname or URL path. An Ingress controller is the engine that makes it work—a proxy running in the cluster that watches for Ingress resources and configures itself to implement the defined rules.
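Host and path routing can be modeled as a small lookup. The Python sketch below mimics prefix-style matching, where a prefix matches whole path segments (so /api matches /api/v1 but not /apiary) and the longest matching path wins. The rule layout, hostnames, and service names are invented for illustration; real controllers add exact matches, wildcards, and their own extensions.

```python
def path_matches_prefix(prefix, path):
    """Segment-wise prefix match: '/api' matches '/api/v1' but not '/apiary'."""
    prefix_parts = [part for part in prefix.split("/") if part]
    path_parts = [part for part in path.split("/") if part]
    return path_parts[:len(prefix_parts)] == prefix_parts


def route(rules, host, path):
    """Return the backend Service for a request, or None if nothing matches."""
    candidates = [
        rule for rule in rules
        if rule["host"] == host and path_matches_prefix(rule["path"], path)
    ]
    if not candidates:
        return None
    # The most specific (longest) matching path takes precedence.
    return max(candidates, key=lambda rule: len(rule["path"]))["service"]


rules = [
    {"host": "shop.example.com", "path": "/", "service": "frontend"},
    {"host": "shop.example.com", "path": "/api", "service": "api"},
]
```

A request for shop.example.com/api/v1/items lands on the api Service, while everything else on that host falls back to frontend.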

Finally, for the most demanding microservices architectures, a service mesh like Istio or Linkerd offers an even higher level of abstraction. A service mesh works by injecting a lightweight "sidecar" proxy alongside every application container. These proxies form a mesh that provides advanced features like mTLS for security, sophisticated traffic management (canary releases, A/B testing), and deep observability, all without changing application code.

Part III: Advanced Topics and Practical Applications

Advanced Network Security in Kubernetes

A robust security posture in Kubernetes extends far beyond a single NetworkPolicy. It requires a defense-in-depth strategy that secures the entire system.

  • Securing the Control Plane: This is paramount. The Kubernetes API server is the brain of the cluster, and unauthorized access is catastrophic. Best practices include disabling anonymous access, using strong authentication, and enforcing the principle of least privilege with fine-grained RBAC (Role-Based Access Control) rules.
  • Workload-Level Security: Security must extend to the pods themselves. Pod Security Admission (PSA) allows you to enforce security standards at the namespace level. Using a securityContext within a pod's specification, you can prevent dangerous behavior such as running as root or escalating privileges, which drastically reduces the blast radius if a container is compromised.
  • Zero-Trust Communication with mTLS: In a zero-trust network, trust is never assumed. A service mesh can enforce cluster-wide mutual TLS (mTLS), encrypting all service-to-service traffic and verifying workload identities via certificates, without requiring any changes to the application code.
  • Runtime Security and Threat Detection: The final layer is detecting threats in real time. Runtime security tools like Falco use eBPF to monitor kernel system calls. They can detect and alert on anomalous network behavior - like a pod unexpectedly making an outbound connection to an unknown IP - which could indicate a security breach.

The Gateway API: the Evolution of Ingress

As Kubernetes adoption grew, the limitations of the original Ingress API became clear. It is underspecified, leading to inconsistent implementations, and lacks the expressiveness needed for complex traffic routing. To address this, the Kubernetes community developed the Gateway API, a modern, standardized, and highly extensible successor that provides greater flexibility, security, and separation of concerns.

The power of the Gateway API lies in its role-oriented design, which decouples responsibilities:

  • Infrastructure Provider: Defines GatewayClass resources, which are templates for different types of load balancers (e.g., an AWS ALB class).
  • Cluster Operator: Creates Gateway resources, which are specific instantiations of a GatewayClass, requesting a concrete load balancing endpoint.
  • Application Developer: Manages Route resources (like HTTPRoute), defining the routing logic from a Gateway to their services.

This separation is a significant advantage. An application developer can safely manage routing rules for their own service without being able to modify the shared gateway itself. The Gateway API also introduces powerful features like safe cross-namespace routing and standardizes advanced traffic management patterns like traffic splitting and header-based routing, providing a robust foundation for modern Kubernetes networking.
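Traffic splitting in an HTTPRoute is expressed as weights on the backends of a rule, and the share each backend receives is its weight divided by the sum of all weights. The Python sketch below does that calculation on a simplified stand-in for the backendRefs list; the service names and the function name are assumptions for the example.

```python
from fractions import Fraction


def traffic_shares(backend_refs):
    """Map each backend name to the fraction of requests it should receive."""
    total = sum(ref["weight"] for ref in backend_refs)
    if total == 0:
        # All weights zero: nothing is forwarded to any backend.
        return {ref["name"]: Fraction(0) for ref in backend_refs}
    return {ref["name"]: Fraction(ref["weight"], total) for ref in backend_refs}


canary = traffic_shares([
    {"name": "checkout-v1", "weight": 90},
    {"name": "checkout-v2", "weight": 10},
])
```

Here checkout-v2 receives one request in ten, which is the typical shape of a canary rollout; shifting traffic is then a matter of editing two numbers in the route rather than touching the shared gateway.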

Multi-Cluster Networking and Federation

As organizations scale, they often adopt a multi-cluster architecture for high availability, geographic distribution, or workload isolation. This introduces the challenge of enabling services across these cluster boundaries to communicate securely and reliably.

  • Service Mesh Federation: This is a popular approach where service meshes like Istio or Linkerd are configured to span multiple clusters. By establishing a shared root of trust, they create a unified service mesh where a pod in Cluster A can discover and securely connect to a service in Cluster B as if it were local.
  • Network-Level Connectivity: Tools like Submariner operate at a lower level (L3/L4) to create a flat network across clusters. They establish encrypted tunnels between gateway nodes in each cluster, effectively stitching the Pod and Service networks together so any pod can reach any other pod directly by its IP.
  • The Gateway API: The Gateway API was also designed with multi-cluster use cases in mind. Its hierarchical model allows platform administrators to provision Gateway resources that can be implemented by controllers capable of routing traffic across cluster boundaries, providing a standardized foundation for future multi-cluster ingress solutions.

Practical Troubleshooting Scenarios

Even in a well-configured cluster, network problems are a fact of life. A container might not start, or a service might become unreachable. When this happens, a systematic, layered approach to debugging is the fastest way to find the root cause. Below are step-by-step guides for two of the most common failure scenarios you're likely to encounter.

Symptom: Pod-to-Pod Connectivity Fails

This is one of the most fundamental issues: a pod is running, but it cannot communicate with another pod over the network. The failure could be in the CNI, a NetworkPolicy, or the application itself. Here's how to trace the problem:

  • Check Pod Status & Location: Run kubectl get pods -o wide. Are both pods Running? Are they on the same node or different nodes?
  • Examine Events and Logs: Use kubectl describe pod <pod-name> to look for recent events like FailedCreatePodSandBox. Check application logs with kubectl logs <pod-name> to rule out application-level errors.
  • Verify NetworkPolicy: Run kubectl get networkpolicy -n <namespace>. If any policies are present, inspect them to ensure they aren't unintentionally dropping the traffic.
  • Isolate with a Debug Container: Launch a temporary pod with networking tools: kubectl run -it --rm --image=nicolaka/netshoot network-debug -- /bin/bash. From inside this debug pod, use ping or curl to try and reach the destination pod's IP address. If this works, the network layer is likely fine.
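If the debug image you have at hand lacks ping or curl but ships Python, the same reachability check can be done with the standard library. The sketch below is a minimal TCP probe; the function name is my own, and the IP address and port in the usage note are placeholders for your destination pod.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, tcp_port_open("10.244.1.5", 8080) run from inside the debug pod tells you whether the destination pod accepts connections on that port. A timeout usually suggests dropped packets (a NetworkPolicy or routing problem), while an immediate refusal suggests the network path works but nothing is listening.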

Symptom: Service Discovery Fails

A common and often frustrating issue is when pods can connect to each other by IP address, but service discovery fails when they use a service name. This almost always points to a problem with the cluster's DNS service, typically CoreDNS.

  • Check CoreDNS Health: Run kubectl get pods -n kube-system -l k8s-app=kube-dns to see if the CoreDNS pods are running. Check their logs for errors.
  • Inspect the Pod's resolv.conf: Exec into a problematic pod (kubectl exec <pod-name> -- cat /etc/resolv.conf) to ensure the nameserver points to the kube-dns service IP.
  • Test Resolution Directly: From a debug container, use nslookup <service-name> to test internal resolution, then nslookup google.com to test external resolution. This will pinpoint the source of the failure.
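It helps to know what the resolver actually does with the contents of resolv.conf. The Python sketch below parses a resolv.conf and lists the names a typical resolver would try, in order, for a given lookup: names with fewer dots than the ndots option go through the search domains first. The sample file is an assumption modeled on a common Pod configuration (the nameserver IP in particular varies per cluster), and the function names are my own.

```python
SAMPLE_RESOLV_CONF = """\
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
"""


def parse_resolv_conf(text):
    """Extract nameservers, search domains, and ndots from resolv.conf text."""
    conf = {"nameservers": [], "search": [], "ndots": 1}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments and blank lines
        if not fields:
            continue
        if fields[0] == "nameserver" and len(fields) > 1:
            conf["nameservers"].append(fields[1])
        elif fields[0] == "search":
            conf["search"] = fields[1:]
        elif fields[0] == "options":
            for option in fields[1:]:
                if option.startswith("ndots:"):
                    conf["ndots"] = int(option.split(":", 1)[1])
    return conf


def candidate_names(name, conf):
    """Names the resolver tries, in order, for a lookup of `name`."""
    if name.endswith("."):
        return [name]  # already fully qualified: no search list applied
    searched = [f"{name}.{domain}" for domain in conf["search"]]
    if name.count(".") >= conf["ndots"]:
        return [name] + searched
    return searched + [name]


conf = parse_resolv_conf(SAMPLE_RESOLV_CONF)
internal = candidate_names("my-svc", conf)
external = candidate_names("google.com", conf)
```

With ndots:5, a lookup for google.com first tries three cluster-internal names that cannot exist before the real one, which explains both the extra DNS traffic you may see in CoreDNS logs and why a trailing dot (google.com.) skips the search list entirely.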

Conclusion

So, if you've made it this far, you've taken the full journey - from a single packet hitting a network card all the way up to a service mesh managing traffic across a global fleet. The key takeaway is that Kubernetes networking isn't some unknowable magic; it's a powerful stack of abstractions built on top of familiar, battle-tested tools. It all starts with the rock-solid foundation of the Linux kernel's networking capabilities. Containers then use primitives like namespaces to get their own isolated slice of that stack. Kubernetes simply orchestrates this concept at a massive scale, giving every pod an IP address via CNI and providing stable endpoints with Services. When you understand how these layers connect - how a request flows through a Service, is handled by kube-proxy, and finally reaches a pod on a CNI-managed network - you're no longer just using a black box. You're equipped to diagnose, troubleshoot, and build more resilient systems. Hopefully, this deep dive helps you do just that!