























Inside a Kubernetes cluster, every request starts with the same question I keep coming back to: where is the service I need to call, right now? Kubernetes answers it with a layered system that most people use without ever looking inside. A Service object gives you a name. CoreDNS turns that name into an address. A control loop keeps the list of real backends behind it accurate. And kube-proxy quietly rewrites that address into an actual pod. Understanding how those layers fit together is the difference between trusting the magic and debugging it at 3am.
This post is the Kubernetes-specific companion to my earlier writing on DNS as a discovery mechanism and where DNS load balancing stops being enough. If you have read DNS or a Service Registry?, this is what that choice looks like once you are inside a cluster: Kubernetes reuses DNS, but wraps it around the kind of live registry that post argued you sometimes need.
Two Kubernetes objects, one DNS server, and one node-level proxy carry the whole thing.
A Service is a stable identity. It has a name, a namespace, and (for the common type) a ClusterIP: a virtual IP that never changes for the life of the Service, even as the pods behind it are created and destroyed. The Service also has a label selector that defines which pods belong to it.
An EndpointSlice is the live answer. The Kubernetes control plane continuously watches which pods match a Service’s selector and are ready to serve, and writes their IPs and ports into EndpointSlice objects. This is the part that makes the system a real service registry rather than a static config file: when a pod dies or fails its readiness probe, it is removed from the slice, or marked not ready in it, in near real time, and traffic stops flowing to it.
CoreDNS is the cluster’s DNS server. It watches the Kubernetes API and answers DNS queries for Service names, returning the ClusterIP (or, for headless Services, the pod IPs directly).
kube-proxy is the data-plane glue. It also watches EndpointSlices and programs the node’s kernel so that traffic sent to a ClusterIP is transparently load-balanced to one of the real pod IPs.
The pattern is worth naming: a name resolves to a stable virtual IP, and a control loop keeps the real endpoints behind that IP true. DNS does not have to be fast or fresh here, because the freshness lives in the EndpointSlice and kube-proxy, not in the DNS record.
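To make that control loop concrete, here is a minimal sketch of the reconciliation step, assuming a toy in-memory model. The `Pod` class, `matches`, and `reconcile_endpoints` are hypothetical names for illustration, not Kubernetes APIs, and the real controller handles far more (ports, topology, terminating pods).

```python
from dataclasses import dataclass


@dataclass
class Pod:
    """Toy stand-in for a pod: just the fields the selector logic needs."""
    name: str
    ip: str
    labels: dict
    ready: bool


def matches(selector: dict, labels: dict) -> bool:
    """A pod matches when it carries every key/value pair in the selector."""
    return all(labels.get(key) == value for key, value in selector.items())


def reconcile_endpoints(selector: dict, pods: list) -> list:
    """Return the IPs of pods that match the selector and are ready right now."""
    return sorted(p.ip for p in pods if p.ready and matches(selector, p.labels))


# Hypothetical pods; the IPs are made up for the example.
pods = [
    Pod("payments-a", "10.1.7.4", {"app": "payments"}, ready=True),
    Pod("payments-b", "10.1.7.9", {"app": "payments"}, ready=False),
    Pod("cart-a", "10.1.8.2", {"app": "cart"}, ready=True),
]

endpoints = reconcile_endpoints({"app": "payments"}, pods)
print(endpoints)  # only the ready payments pod

# When payments-b passes its readiness probe, the next reconcile picks it up.
pods[1].ready = True
endpoints_after = reconcile_endpoints({"app": "payments"}, pods)
print(endpoints_after)
```

The name and the ClusterIP never appear in this function, and that is the point: the stable identity and the live membership are maintained separately.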
You create a Service, and Kubernetes assigns it a ClusterIP and a DNS name automatically.
apiVersion: v1
kind: Service
metadata:
  name: payments
  namespace: shop
spec:
  selector:
    app: payments        # which pods belong to this Service
  ports:
    - name: http         # SRV records are only published for named ports
      port: 8080         # the Service port
      targetPort: 8443   # the container port behind it
# Kubernetes now serves a stable name and VIP:
#   payments.shop.svc.cluster.local -> 10.96.0.21

The DNS name follows a fixed scheme: service.namespace.svc.cluster.local, where cluster.local is the default cluster domain. From inside a pod, you rarely type the whole thing, because the pod’s resolver is configured with search domains that let short names work.
resolve from inside a pod
$ kubectl exec -it client -n shop -- sh
# short name resolves via the pod's search domains
/ # nslookup payments
Name: payments.shop.svc.cluster.local
Address: 10.96.0.21
# an SRV record carries port + target, like classic DNS discovery
# (published only for named Service ports; this query assumes the port is named "http")
/ # dig +short SRV _http._tcp.payments.shop.svc.cluster.local
0 100 8080 payments.shop.svc.cluster.local.

That short-name resolution is driven by the pod’s /etc/resolv.conf, which the kubelet writes:
/etc/resolv.conf inside the pod
# the nameserver is the CoreDNS ClusterIP
nameserver 10.96.0.10
search shop.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

The ndots:5 option is a common source of confusion and latency. It means any name with fewer than five dots is first tried against each search domain before being treated as absolute. Looking up payments succeeds on the first candidate (payments.shop.svc.cluster.local), but an external name such as api.example.com walks every search domain, failing each time, before it is finally tried as written. That is convenient for short names, but a chatty external hostname multiplies DNS traffic, which is why you sometimes see fully qualified names with a trailing dot to short-circuit the search.
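The search-list behaviour is easier to reason about when written out. The following is a rough sketch, assuming glibc-style resolver rules; `candidate_queries` is a hypothetical helper, api.example.com is a placeholder external name, and other resolver implementations differ in the details.

```python
def candidate_queries(name: str, search: list, ndots: int = 5) -> list:
    """Return the names a resolver would try, in order, for one lookup.

    Simplified model of glibc-style behaviour:
    - a trailing dot marks the name as absolute, so the search list is skipped;
    - a name with at least `ndots` dots is tried as written first;
    - otherwise every search domain is tried before the name as written.
    """
    if name.endswith("."):
        return [name]
    absolute = name + "."
    searched = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [absolute] + searched
    return searched + [absolute]


SEARCH = ["shop.svc.cluster.local", "svc.cluster.local", "cluster.local"]

# Short in-cluster name: the first candidate is the one that resolves.
for query in candidate_queries("payments", SEARCH):
    print(query)

# External name with two dots: three cluster-suffixed misses come first.
external = candidate_queries("api.example.com", SEARCH)
print(len(external), external[-1])

# Trailing dot: exactly one query.
print(candidate_queries("api.example.com.", SEARCH))
```

Each candidate is typically queried for both A and AAAA records, so the real query count is usually double what this list suggests.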
CoreDNS is a plugin-chained DNS server, and in a cluster its most important plugin is kubernetes, which watches the API for Services and EndpointSlices and answers queries directly from that live view. It is configured by a Corefile.
.:53 {
    errors
    health
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf   # non-cluster names go upstream
    cache 30                     # cache answers for 30s
}

This is the key insight about cluster DNS: CoreDNS is DNS sitting on top of a live registry. The query interface is ordinary DNS, with all its universality, but the answers come from the Kubernetes control plane’s real-time view rather than a hand-maintained zone file. When a Service’s pods change, CoreDNS reflects it on the next query (subject to its small cache). You get DNS’s zero-integration client story without DNS’s usual staleness problem, because the source of truth is the EndpointSlice, not a TTL.
CoreDNS is also where the failure modes I described in It’s Always DNS show up in cluster form: a CoreDNS outage takes discovery down cluster-wide, ndots amplification can flood it, and its cache plugin trades a little freshness for a lot of load relief.
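The freshness-for-load trade in that last point can be shown with a tiny time-bounded cache. This is an illustrative sketch only, not how the CoreDNS cache plugin is implemented; `TTLCache`, `fake_upstream`, and the injectable clock are assumptions made so the example runs without a cluster or any sleeping.

```python
import time


class TTLCache:
    """Cache lookup results for a fixed number of seconds."""

    def __init__(self, ttl_seconds, lookup, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._lookup = lookup    # function that performs the real upstream lookup
        self._clock = clock      # injectable so the demo does not have to sleep
        self._entries = {}       # name -> (expires_at, answer)

    def resolve(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[0]:
            return entry[1]      # still fresh: no upstream query
        answer = self._lookup(name)
        self._entries[name] = (now + self._ttl, answer)
        return answer


# Demo with a fake clock and a fake upstream that records every hit.
fake_now = [0.0]
upstream_calls = []


def fake_upstream(name):
    upstream_calls.append(name)
    return "10.96.0.21"


cache = TTLCache(30, fake_upstream, clock=lambda: fake_now[0])

for second in range(60):         # one query per second for a minute
    fake_now[0] = float(second)
    cache.resolve("payments.shop.svc.cluster.local")

print(len(upstream_calls))       # far fewer upstream lookups than client queries
```

Sixty client queries collapse into a handful of upstream lookups, at the cost of answers that can be up to 30 seconds old.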
To understand what a DNS answer actually points at, you need the cluster’s IP model. Under the default pod network, the CNI plugin gives every pod its own routable IP, independent of the node it runs on. A Service’s ClusterIP is a separate virtual IP, and the EndpointSlice holds the individual pod IPs. Resolving a normal Service gives you the ClusterIP; the pod IPs stay hidden behind it.
Sometimes you want the pod IPs directly, with no virtual IP in front. That is a headless Service, declared with clusterIP: None.
headless service + lookup
apiVersion: v1
kind: Service
metadata: { name: cassandra, namespace: data }
spec:
  clusterIP: None   # headless: no VIP, no kube-proxy
  selector: { app: cassandra }
  ports:
    - port: 9042
# DNS now returns one A record per ready pod; the client chooses
/ # dig +short cassandra.data.svc.cluster.local
10.1.7.4
10.1.7.9
10.1.8.2

Headless Services are how stateful systems (databases, brokers) and client-side load balancers get the full membership list and address each pod individually, rather than being funneled through a single VIP. It is the cluster’s version of the DNS-returns-many-instances model, with the crucial improvement that the list is health-filtered by readiness.
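On the client side, "the client chooses" usually means resolving every A record and rotating through them. Here is a minimal sketch, assuming plain round-robin; `resolve_all` and `RoundRobin` are hypothetical helpers, and the demo feeds in the addresses from the lookup above instead of querying DNS so it runs outside a cluster.

```python
import itertools
import socket


def resolve_all(hostname: str, port: int) -> list:
    """Return every distinct IPv4 address the system resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})


class RoundRobin:
    """Hand out addresses in rotation; rebuild it whenever the list is re-resolved."""

    def __init__(self, addresses):
        if not addresses:
            raise ValueError("no addresses to balance across")
        self._cycle = itertools.cycle(list(addresses))

    def pick(self) -> str:
        return next(self._cycle)


# Inside a cluster this would be:
#   addresses = resolve_all("cassandra.data.svc.cluster.local", 9042)
# The static list below mirrors the dig output so the sketch runs anywhere.
addresses = ["10.1.7.4", "10.1.7.9", "10.1.8.2"]
balancer = RoundRobin(addresses)
picks = [balancer.pick() for _ in range(4)]
print(picks)  # wraps back to the first address on the fourth pick
```

A real client would re-resolve periodically and rebuild the rotation, since the record set changes as pods become ready or go away.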
Then there is host network. A pod with hostNetwork: true shares the node’s network namespace: its IP is the node’s IP, and its ports are the node’s ports. This is common for node-level agents (log shippers, CNI daemons, monitoring). The discovery gotcha is DNS. By default a host-network pod would inherit the node’s /etc/resolv.conf and never see CoreDNS, so cluster names fail to resolve. The fix is an explicit DNS policy.
apiVersion: v1
kind: Pod
metadata: { name: node-agent }
spec:
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet   # keep using CoreDNS
  containers:
    - name: agent
      image: node-agent:1.0
# Without ClusterFirstWithHostNet the pod uses the node's resolver
# and cannot resolve *.svc.cluster.local at all.

Resolving a normal Service gives you a ClusterIP, but nothing actually listens on that IP. It is a fiction maintained by kube-proxy, which watches EndpointSlices and programs the node’s kernel (iptables rules, or IPVS for larger clusters) so that a connection to 10.96.0.21:8080 is rewritten to one of the ready pod IPs and balanced across them.
Two consequences matter for discovery. First, readiness gates membership: a pod only counts as a ready endpoint in the EndpointSlice once its readiness probe passes, and it loses that status as soon as the probe’s failure threshold is reached. This is the cluster doing the health-aware routing I argued for in client-side vs server-side health checking, except the control plane and kube-proxy do it for you. Second, DNS staleness mostly does not matter here. The ClusterIP a client caches is stable for the Service’s whole life, so a long DNS TTL is harmless; all the churn happens behind the VIP in the EndpointSlice, which kube-proxy tracks continuously. This is precisely the property that plain DNS round-robin lacked, as I described in when DNS load balancing is not enough.
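What kube-proxy programs into the kernel can be pictured as a lookup table from virtual address to ready backends. The sketch below is a userspace model under that assumption, not real iptables or IPVS behaviour; `Endpoint`, `program_rules`, and `route` are hypothetical names, and the pod IPs are invented for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class Endpoint:
    ip: str
    port: int
    ready: bool


def program_rules(cluster_ip: str, service_port: int, endpoints: list) -> dict:
    """Build the VIP -> backends table, keeping only ready endpoints."""
    backends = [(e.ip, e.port) for e in endpoints if e.ready]
    return {(cluster_ip, service_port): backends}


def route(rules: dict, dst_ip: str, dst_port: int, rng=random):
    """Rewrite a destination the way a DNAT rule would, choosing a backend at random."""
    backends = rules.get((dst_ip, dst_port))
    if backends is None:
        return (dst_ip, dst_port)   # not a Service VIP: leave the packet alone
    if not backends:
        raise ConnectionRefusedError("no ready endpoints behind this ClusterIP")
    return rng.choice(backends)


# The payments Service: VIP 10.96.0.21:8080, pods listening on 8443.
slice_endpoints = [
    Endpoint("10.1.7.4", 8443, ready=True),
    Endpoint("10.1.7.9", 8443, ready=False),   # failing its readiness probe
]
rules = program_rules("10.96.0.21", 8080, slice_endpoints)

chosen = route(rules, "10.96.0.21", 8080)
print(chosen)   # always the single ready pod, on the target port

untouched = route(rules, "192.0.2.10", 443)
print(untouched)
```

The client only ever sees 10.96.0.21:8080. Re-running `program_rules` whenever the EndpointSlice changes is what keeps the VIP pointing at healthy pods.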
Here is the reframing that ties it back to the broader topic: Kubernetes already contains a service registry. The EndpointSlice API is a live, health-filtered, watchable registry of endpoints. CoreDNS is just one consumer of it. That is why the cluster does not need an external registry for in-cluster discovery the way a VM fleet often does.
But the registry model extends past the cluster edge in a few ways.
Reaching external services through cluster DNS. An ExternalName Service maps a cluster name to an external hostname, so in-cluster clients discover a non-Kubernetes dependency through the same DNS they already use.
apiVersion: v1
kind: Service
metadata: { name: legacy-billing, namespace: shop }
spec:
  type: ExternalName
  externalName: billing.corp.internal   # CoreDNS returns a CNAME here

External registries like Consul. Tools such as Consul can sync in both directions: registering Kubernetes Services into the Consul catalog so VM-based clients can find them, and surfacing external Consul services inside the cluster as synthetic Services. This is the bridge when Kubernetes is one island in a larger, mixed estate that already standardized on a registry.
Service meshes via xDS. A mesh like Istio reads EndpointSlices directly and pushes them to each Envoy sidecar over the xDS protocol. At that point DNS is barely involved in routing: the sidecar holds a near-real-time, weighted, health-aware view of every endpoint and load-balances locally. This is the same control-plane-driven discovery model I covered in the proxy concurrency post, applied to east-west cluster traffic.
Multi-cluster. Once you have more than one cluster, the built-in DNS scope (cluster.local) stops being enough, and you reach for multi-cluster Services or a mesh that federates registries across clusters. That is exactly the boundary where in-cluster discovery hands off to the cross-datacenter coordination problem.
The honest summary is short. For discovery inside a single cluster, you almost never need anything beyond what ships in the box: Services, EndpointSlices, CoreDNS, and kube-proxy already give you health-aware, push-updated discovery behind a stable name. Use a headless Service when a client needs the raw membership list, and remember the dnsPolicy when you go host-network.
You reach past the built-ins for the cases that cross the cluster boundary: integrating a non-Kubernetes estate (an external registry like Consul), needing client-side load balancing with rich endpoint metadata (a mesh and xDS), or spanning multiple clusters (multi-cluster Services or a federated mesh). The decision framework for that DNS-versus-registry choice is the subject of its own companion post. In every one of those, the underlying pattern is the same one the cluster taught you: a stable name in front, a live registry of healthy endpoints behind, and a control loop keeping the two honest.
Kubernetes did not replace DNS for service discovery. It put DNS back where it belongs: a friendly name on the front of a live, health-aware registry that someone else keeps accurate for you.
This builds on my earlier pieces on DNS or a Service Registry?, It’s Always DNS, when DNS load balancing is not enough, and health checking in client vs server-side load balancing.
Running discovery across clusters or bridging Kubernetes with an external registry? I am on LinkedIn or reachable by email.