Plain Language Multi-Cluster: Tools and Application Assistants
阮一峰 · 2024-09-11 · via 阮一峰的网络日志

I. Introduction

Last week, I attended the Tencent Global Digital Ecosystem Conference.

Today, I’d like to share with you some of my takeaways, specifically my understanding of multi-cluster tools.

For those in software development, you’ve likely heard of Kubernetes. It’s a container management tool and quite complex in itself.

One can imagine that a tool for managing multiple Kubernetes clusters would be even more complex. However, I found that multi-cluster is actually quite easy to understand.

During the conference, there was a presentation about TKE AppFabric, a new Tencent service for multi-cluster management. It was explained in a simple way, and I grasped it immediately.

Below, I will try to explain what Kubernetes is, what multi-cluster tools are, and what the simplest way to use them is.

II. Starting with Docker

To understand Kubernetes, it's necessary to start with Docker.

In 2013, Docker was born. Its innovation was to package the runtime environment of a software application together with the application's code into a container image.

A container image itself is a binary file that can be directly published. As long as other machines have Docker installed, they can run this file. It allows software to run in a virtual environment (called a "container"), ensuring that the runtime environment and development environment are consistent, thus avoiding troubles like environment configuration and startup errors.

More importantly, a container image is a standardized file. Regardless of the programming language used to develop the software, once it is packaged into a container image it follows the same format. Therefore, a single tool can handle the release of all container projects, completely ignoring the differences in development languages.

It is precisely because Docker provides a standardized, one-stop software execution process that it paved the way for the later widespread adoption of "container application management tools."

Today, Docker has become the standard for software deployment. Whether the software is released as source code or as a container image, it is ultimately deployed and run within Docker.

III. Microservices Architecture

After Docker's emergence, software deployment was greatly simplified, reducing it to merely running container images. Naturally, developers began to consider whether a single, massive piece of software could be split into multiple components (i.e., multiple containers) for deployment.

Early enterprise-level large applications were typically a single monolithic software containing multiple components with different functions. Even modifying just one component required redeploying the entire software.

The current practice is to split larger functional components out, where each component is an independent service, published and deployed as a separate Docker container.

In this way, a monolithic application becomes a software system composed of multiple Docker containers, which is the now-popular "microservices architecture." The software consists of multiple microservices, with each microservice corresponding to a Docker container.

IV. Container Management Tool Kubernetes

Microservices mean that every release involves a large number of different containers, making their management a challenge. Container management tools were born to address this.

Among the various container management tools, Kubernetes is by far the best known.

It is open-source software originally developed by Google. Because there are 8 characters between the first letter K and the last letter s, its name is often written as K8s. It has become the de facto standard for container management.
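The abbreviation rule can be checked with a few lines of Python. This is only an illustrative sketch; the function name `numeronym` is my own and is not part of any Kubernetes tooling.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as: first letter + number of middle letters + last letter."""
    if len(word) <= 3:
        # Too short for the abbreviation to save anything.
        return word
    return f"{word[0]}{len(word) - 2}{word[-1]}"


print(numeronym("Kubernetes"))  # K8s
```

The same rule produces other familiar abbreviations, such as i18n for "internationalization."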

Specifically, Kubernetes mainly provides the following functions.

(1) Unified hardware interface. Developers don't need to worry about the underlying hardware details, as differences in the underlying servers are abstracted into a unified operation interface.

(2) Auto-scaling. It can quickly complete horizontal scaling based on software load conditions.

(3) High availability. When a container fails, it is automatically restarted or replaced, and traffic is routed to available nodes. If a software release goes wrong, it can also roll back automatically.

(4) Other features. It also includes a wide range of related features such as service discovery, load balancing, and resource monitoring, along with a vast array of plugins and extensions, as well as an active community.
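To make the auto-scaling item above more concrete, here is a minimal Python sketch of the ratio rule that the Kubernetes documentation gives for the Horizontal Pod Autoscaler: the desired replica count is the current count multiplied by the ratio of the observed metric to the target metric, rounded up. The function name, parameter names, and default bounds are my own choices for illustration, and the real controller applies additional logic (such as a tolerance band and stabilization) that is left out here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Scale the replica count in proportion to load, within fixed bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # Core ratio rule: ceil(current * observed / target).
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured minimum and maximum (illustrative defaults).
    return max(min_replicas, min(max_replicas, raw))


# 4 replicas at 90% average CPU with a 60% target: scale out.
print(desired_replicas(4, 90, 60))  # 6
# 4 replicas at 30% average CPU with a 60% target: scale in.
print(desired_replicas(4, 30, 60))  # 2
```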

V. What is a Multi-Cluster?

Kubernetes at its core is a group of servers running many containers. Each instance of Kubernetes is referred to as a cluster.

For typical software applications, a single cluster is sufficient. However, due to the reasons mentioned below, enterprise applications often need to be deployed across multiple clusters.

The clusters can be in the same data center or in different data centers. In practice, the latter is more common, i.e., the clusters are distributed across different data centers. If the clusters also come from different cloud service providers, or from different kinds of cloud, the setup is referred to as "multicloud."

The main considerations for multi-cluster are as follows.

(1) Disaster recovery. If one cluster has an issue, there is another cluster that can ensure availability.

(2) Isolation. Clusters can be strongly, physically isolated from one another, which in turn isolates the users (tenants) running on top of them.

(3) Flexibility. Using multiple clouds helps reduce vendor lock-in, allowing you to choose the most suitable infrastructure and services based on your needs.

(4) Compliance. Different regions may have different regulatory requirements, and multi-cluster environments can implement more granular security policies and access controls for each cluster.

VI. Challenges of Multi-Cluster Environments

While multi-cluster environments offer the benefits mentioned in the previous section, the complexity also multiplies, presenting many challenges for users.

(1) Configuration and management complexity. All clusters require consistent configuration and deployment, with differences between them eliminated as far as possible.

(2) Network connectivity and latency. Connections between clusters in different geographic locations must be secure and reliable, with latency kept to a minimum.

(3) Service discovery and load balancing. A service has to discover other services in different clusters, and load has to be balanced across clusters.

(4) Monitoring. Metrics and logs for all clusters should ideally be centralized for unified monitoring.

(5) Security and access control. Security policies, access control, and credential management across multiple clusters become more complex, requiring careful rules and individual configuration.
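The first challenge, keeping configuration consistent across clusters, can be illustrated with a small drift check. The following Python sketch compares the settings of several clusters and reports every setting whose value is not identical everywhere. The cluster names, setting keys, and values are all made up for illustration, and a real multi-cluster tool would read this state from each cluster's API rather than from a dictionary.

```python
from typing import Dict, Optional


def find_drift(cluster_configs: Dict[str, Dict[str, str]]
               ) -> Dict[str, Dict[str, Optional[str]]]:
    """Return, for each setting that differs across clusters, cluster -> value.

    A cluster that lacks the setting entirely is reported with the value None.
    """
    all_keys = set()
    for config in cluster_configs.values():
        all_keys.update(config)

    drift = {}
    for key in sorted(all_keys):
        values = {name: config.get(key) for name, config in cluster_configs.items()}
        if len(set(values.values())) > 1:
            drift[key] = values
    return drift


# Hypothetical state of three clusters.
clusters = {
    "guangzhou": {"image": "shop:1.4.2", "replicas": "3", "log_level": "info"},
    "shanghai": {"image": "shop:1.4.2", "replicas": "3", "log_level": "debug"},
    "beijing": {"image": "shop:1.4.1", "replicas": "3"},
}

for setting, values in find_drift(clusters).items():
    print(setting, values)
```

Here `replicas` is the same everywhere and is not reported, while `image` and `log_level` are flagged as having drifted.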

VII. Multi-Cluster Tools and Their Challenges

To address these challenges, specialized multi-cluster management tools have emerged, such as Argo CD, Rancher Fleet, and Karmada.

They can be seen as an intermediary layer between developers and Kubernetes, resolving the complexity of cluster management.

The issue is that to use them, one must first learn Kubernetes and then learn the tools themselves. This is a significant learning cost, so multi-cluster tools are designed not for application developers but for cluster administrators.

In reality, multi-cluster is a highly specialized field that developers from other areas find very hard to understand. After completing software development, developers hand the application over to cluster administrators to deploy.

This is troublesome for both sides. On one hand, developers cannot decide on deployment strategies and have no view of the underlying resources, yet they often still have to deal with container management. On the other hand, cluster administrators are forced to intervene at the application level, and whenever they adjust underlying resources, they need to notify developers and involve them in keeping the application running.

VIII. TKE AppFabric for Application-Oriented Multi-Cluster Assistance

How can we make it easier for developers to use multi-cluster environments?

Tencent Cloud's solution is to add an application-oriented middle layer that hides the multi-cluster tool layer and lowers the barrier to use. This service is named TKE AppFabric.

In the name, TKE stands for Tencent Kubernetes Engine (Tencent Cloud's container service), and AppFabric refers to weaving application containers together like fabric.

It is aimed at application developers, with a positioning of "serving applications well upwards and managing clusters well downwards," which can be seen as a multi-cluster assistant for applications.

Because it encapsulates the multi-cluster tool layer, it avoids complex technical terms and is very easy to understand. Developers can get started quickly without worrying about underlying resources, or even needing to know the concept of a "cluster."

Its simplicity is reflected in the following aspects.

First, it uses "availability zones," a concept that is easier for developers to understand. When deploying an application, you only need to specify which zones it should be deployed to (such as Guangzhou Zone 1 or Shanghai Zone 1), and that's enough.

The entire process is application-oriented and decoupled from Kubernetes. On one hand, this allows developers to focus more on business, and on the other, it enables cloud service providers to fully allocate resources and improve resource utilization. At the same time, cluster upgrades and maintenance are transparent to upper-level users.

Secondly, it simplifies setup by adopting declarative configuration, requiring only the writing of declarative files, which further reduces the learning curve.

Additionally, it encapsulates some Kubernetes-related functionalities for application operation, making it more user-friendly. Monitoring metrics and logs are also centralized, making it easier to identify issues.
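The idea behind declarative configuration described above is that the developer only states the desired result, and the platform works out the steps needed to reach it. The Python sketch below illustrates that idea in general terms. It is not TKE AppFabric's actual file format or API, which this article does not show; the zone names and instance counts are hypothetical.

```python
from typing import Dict, List


def plan(desired: Dict[str, int], actual: Dict[str, int]) -> List[str]:
    """Compare desired and actual instance counts per zone and list the actions needed."""
    actions = []
    for zone in sorted(set(desired) | set(actual)):
        want = desired.get(zone, 0)  # zones not declared should run nothing
        have = actual.get(zone, 0)
        if want > have:
            actions.append(f"{zone}: start {want - have} instance(s)")
        elif want < have:
            actions.append(f"{zone}: stop {have - want} instance(s)")
    return actions


# What the developer declares: zones and instance counts only.
desired_state = {"guangzhou-1": 3, "shanghai-1": 2}
# What is currently running (hypothetical).
actual_state = {"guangzhou-1": 3, "beijing-1": 1}

for action in plan(desired_state, actual_state):
    print(action)
```

Note that the declaration mentions neither clusters nor the order of operations. Once the actual state matches the declaration, `plan` returns an empty list and nothing more needs to be done.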

IX. Multi-Cluster Case Study: Tencent Health

Tencent Health is built on TKE AppFabric. Through it, let's explore how to use multi-cluster setups for large-scale services.

The following image shows the backend architecture of Tencent Health.

In the figure above, the gateway is the access point, and three availability zones (zone1, zone2, and zone3) are deployed simultaneously. They are located in different data centers.

These three availability zones are identical, with one system instance deployed in each zone. Each system instance contains three applications with layered dependencies: app1 depends on app2, and app2 depends on app3. Each of the three applications is itself a group of containers (app pods).

This architecture has three benefits, ensuring high availability and load balancing.

(1) Disaster recovery deployment. If an availability zone fails, traffic can switch to another availability zone (for example, if app2 in zone1 fails, requests can go to app2 in zone2) to ensure availability.

(2) Route control. Users are automatically assigned to the nearest available zone to improve access speed.

(3) Canary release. New features can first be canary released in a single availability zone for validation, and then fully released across all availability zones to reduce release risk.
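The three benefits above can be sketched in a few lines of Python. This is a simplified illustration, not how the Tencent Health gateway is implemented: the latency figures, health flags, and function names are assumptions of mine. `pick_zone` combines route control with disaster recovery by choosing the lowest-latency zone that is still healthy, and `rollout_waves` expresses a canary release as an ordered list of release waves.

```python
from typing import Dict, List, Optional


def pick_zone(latency_ms: Dict[str, float], healthy: Dict[str, bool]) -> Optional[str]:
    """Return the healthy zone with the lowest latency, or None if every zone is down."""
    candidates = [zone for zone in latency_ms if healthy.get(zone, False)]
    if not candidates:
        return None
    return min(candidates, key=lambda zone: latency_ms[zone])


def rollout_waves(zones: List[str], canary: str) -> List[List[str]]:
    """Release to the canary zone first, then to all remaining zones together."""
    if canary not in zones:
        raise ValueError(f"unknown canary zone: {canary}")
    rest = [zone for zone in zones if zone != canary]
    return [[canary], rest] if rest else [[canary]]


# Hypothetical latencies measured from one user to each zone.
latency = {"zone1": 12.0, "zone2": 35.0, "zone3": 48.0}

print(pick_zone(latency, {"zone1": True, "zone2": True, "zone3": True}))   # zone1
# zone1 fails: the user is routed to the next-nearest healthy zone.
print(pick_zone(latency, {"zone1": False, "zone2": True, "zone3": True}))  # zone2

print(rollout_waves(["zone1", "zone2", "zone3"], canary="zone3"))
# [['zone3'], ['zone1', 'zone2']]
```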

According to the on-site presentation, all Tencent internal cloud migration projects, such as QQ, Tencent Meeting, and audio/video services, will be deployed on TKE AppFabric. It will go into external trial operation in the fourth quarter of this year and open to the public in the first quarter of next year.

X. Summary

For enterprise applications using a "microservices architecture," if the business is important and requires high availability, multiple Kubernetes clusters are almost a necessity.

If the company has a dedicated team, you can choose to manage multiple clusters yourself. Otherwise, you can consider tools provided by cloud service providers.

I believe that more and more cloud service providers may offer two sets of tools in the future: one is the original multi-cluster tool, specifically designed for advanced users, and the other is an application-oriented assistant tool like TKE AppFabric that hides the details of multi-clustering, for general developers.

For those interested in multi-clustering or TKE AppFabric, you can scan the QR code below on WeChat to view the product manual.

(End)