惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

H
Heimdal Security Blog
小众软件
小众软件
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
罗磊的独立博客
Google DeepMind News
Google DeepMind News
大猫的无限游戏
大猫的无限游戏
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
Hugging Face - Blog
Hugging Face - Blog
阮一峰的网络日志
阮一峰的网络日志
A
About on SuperTechFans
宝玉的分享
宝玉的分享
博客园 - 聂微东
月光博客
月光博客
Cyberwarzone
Cyberwarzone
Microsoft Security Blog
Microsoft Security Blog
V
Visual Studio Blog
Project Zero
Project Zero
T
Tor Project blog
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
L
LINUX DO - 最新话题
博客园 - 叶小钗
Recent Commits to openclaw:main
Recent Commits to openclaw:main
Attack and Defense Labs
Attack and Defense Labs
Spread Privacy
Spread Privacy
Forbes - Security
Forbes - Security
Simon Willison's Weblog
Simon Willison's Weblog
N
Netflix TechBlog - Medium
P
Proofpoint News Feed
Engineering at Meta
Engineering at Meta
Hacker News: Ask HN
Hacker News: Ask HN
I
InfoQ
M
MIT News - Artificial intelligence
AI
AI
博客园 - 三生石上(FineUI控件)
W
WeLiveSecurity
C
Check Point Blog
The Hacker News
The Hacker News
C
Cyber Attacks, Cyber Crime and Cyber Security
Application and Cybersecurity Blog
Application and Cybersecurity Blog
T
Tenable Blog
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
The Cloudflare Blog
Blog — PlanetScale
Blog — PlanetScale
美团技术团队
D
Darknet – Hacking Tools, Hacker News & Cyber Security
GbyAI
GbyAI
Hacker News - Newest:
Hacker News - Newest: "LLM"
腾讯CDC
K
Kaspersky official blog

Blog — PlanetScale

Keeping a Postgres queue healthy — PlanetScale Patterns for Postgres Traffic Control — PlanetScale Graceful degradation in Postgres — PlanetScale High memory usage in Postgres is good, actually — PlanetScale Stripe Projects partnership: Provision PlanetScale Postgres and MySQL databases from the Stripe CLI — PlanetScale Enhanced tagging in Postgres Query Insights — PlanetScale Behind the scenes: How Database Traffic Control works — PlanetScale Introducing Database Traffic Control — PlanetScale Scaling Postgres connections with PgBouncer — PlanetScale Drizzle joins PlanetScale — PlanetScale Video Conferencing with Postgres — PlanetScale Faster PlanetScale Postgres connections with Cloudflare Hyperdrive — PlanetScale Introducing the PlanetScale MCP server — PlanetScale Database Transactions — PlanetScale Automating our changelog with Cursor commands — PlanetScale Postgres 18 is now available — PlanetScale Using MotherDuck with PlanetScale — PlanetScale $50 PlanetScale Metal is GA for Postgres — PlanetScale AI-Powered Postgres index suggestions — PlanetScale $5 PlanetScale is live — PlanetScale Announcing Vitess 23 — PlanetScale $50 PlanetScale Metal — PlanetScale Report on our investigation of the 2025-10-20 incident in AWS us-east-1 — PlanetScale $5 PlanetScale — PlanetScale Benchmarking Postgres 17 vs 18 — PlanetScale Larger than RAM Vector Indexes for Relational Databases — PlanetScale Partnering with Cloudflare to bring you the fastest globally distributed applications — PlanetScale Processes and Threads — PlanetScale PlanetScale for Postgres is now GA — PlanetScale Postgres High Availability with CDC — PlanetScale Announcing Neki — PlanetScale Caching — PlanetScale The principles of extreme fault tolerance — PlanetScale Announcing PlanetScale for Postgres — PlanetScale Benchmarking Postgres — PlanetScale Announcing Vitess 22 — PlanetScale The Real Failure Rate of EBS — PlanetScale IO devices and latency — PlanetScale Announcing PlanetScale Metal — PlanetScale PlanetScale Metal: There’s no replacement for displacement — PlanetScale Upgrading Query Insights to Metal — PlanetScale Automating cherry-picks between OSS and private forks — PlanetScale Database Sharding — PlanetScale Anatomy of a Throttler, part 3 — PlanetScale Introducing sharding on PlanetScale with workflows — PlanetScale Announcing Vitess 21 — PlanetScale Announcing the PlanetScale vectors public beta — PlanetScale Anatomy of a Throttler, part 2 — PlanetScale Instant deploy requests — PlanetScale Anatomy of a Throttler, part 1 — PlanetScale Increase IOPS and throughput with sharding — PlanetScale Tracking index usage with Insights — PlanetScale Faster backups with sharding — PlanetScale Building data pipelines with Vitess — PlanetScale The State of Online Schema Migrations in MySQL — PlanetScale Optimizing aggregation in the Vitess query planner — PlanetScale Dealing with large tables — PlanetScale Announcing Vitess 20 — PlanetScale Self-managed Vitess vs Managed Vitess with PlanetScale — PlanetScale Achieving data consistency with the consistent lookup Vindex — PlanetScale The MySQL adaptive hash index — PlanetScale Introducing global replica credentials — PlanetScale Profiling memory usage in MySQL — PlanetScale Summer 2023: Fuzzing Vitess at PlanetScale — PlanetScale How PlanetScale makes schema changes — PlanetScale Identifying and profiling problematic MySQL queries — PlanetScale The Problem with Using a UUID Primary Key in MySQL — PlanetScale Announcing Vitess 19 — PlanetScale PlanetScale forever — PlanetScale Introducing schema recommendations — PlanetScale Amazon Aurora Pricing: The many surprising costs of running an Aurora database — PlanetScale Three common MySQL database design mistakes — PlanetScale OAuth applications are now available to everyone — PlanetScale Deprecating the Scaler plan — PlanetScale PlanetScale branching vs. Amazon Aurora blue/green deployments — PlanetScale Databases at scale — PlanetScale Considerations for building a database disaster recovery plan — PlanetScale Working with Geospatial Features in MySQL — PlanetScale PlanetScale vs Amazon Aurora replication — PlanetScale Introducing the Vantage and PlanetScale integration — PlanetScale MySQL isolation levels and how they work — PlanetScale Introducing the schemadiff command line tool — PlanetScale $ pscale ping — PlanetScale Announcing foreign key constraints support — PlanetScale The challenges of supporting foreign key constraints — PlanetScale What is HTAP? — PlanetScale Introducing Insights Anomalies — PlanetScale Webhook security: a hands-on guide — PlanetScale MySQL replication: Best practices and considerations — PlanetScale A guide to HTML email with Ruby on Rails and Tailwind CSS — PlanetScale Sharding for cost-effective database management — PlanetScale PlanetScale ranks 188th in Deloitte’s top 500 fastest-growing companies — PlanetScale Announcing the Fivetran integration — PlanetScale Introducing webhooks — PlanetScale What is MySQL replication and when should you use it? — PlanetScale Sync user data between Clerk and a PlanetScale MySQL database — PlanetScale Introducing database reports — PlanetScale Distributed caching systems and MySQL — PlanetScale What is MySQL partitioning? — PlanetScale MySQL High Availability: Connection handling and concurrency — PlanetScale
Scaling hundreds of thousands of database clusters on Kubernetes — PlanetScale
Brian Morrison II · 2023-09-27 · via Blog — PlanetScale

Brian Morrison II |

Containers have made an incredible impact when it comes to making, deploying, and distributing applications.

When building containerized applications, you no longer have to be concerned about dependency mismatches or the age-old “works on my machine” argument. And Kubernetes has simplified deploying and scaling these containerized applications. If a container crashes, Kubernetes can easily spin a new one up to handle the load!

But have you ever wondered if you can run a database in Kubernetes?

The short answer is “yes.” In fact, we utilize Kubernetes extensively at PlanetScale to support hundreds of thousands of databases all over the world. When deploying a database workload to Kubernetes, special considerations need to be made regardless of how big the workload is.

Let's explore how you might want to deploy databases on Kubernetes, and how PlanetScale does it.

A crash course on Kubernetes

Kubernetes is a container orchestration tool used by some of the largest enterprises around the world to manage their fleet of containerized applications.

In a Kubernetes cluster, multiple servers (nodes in Kubernetes parlance) are configured to work together to ensure that the containers deployed to them are always online and available. The smallest deployable unit in Kubernetes is known as a pod, which represents one container or a collection of containers. When a pod crashes for whatever reason, the environment is smart enough to spin up a new instance of that pod to keep the application online, whether that's on the same node or a different one.

This process is known as the Control Loop.

The Control Loop is managed by the Kubernetes Scheduler. It utilizes configuration files (written in YAML) that define what pods need to be running for a given application to stay online. The scheduler does this by comparing what's defined in the config file vs. what’s actually deployed on Kubernetes and taking the necessary steps to reconcile the differences.

This type of setup works great for applications, but what happens when your database is running on a pod that crashes?

A circular diagram of the Kubernetes “Control Loop” showing 3 steps: Observe → Diff → Act

The issue with running databases on Kubernetes

The main concern with databases on Kubernetes is that they are stateful.

The statefulness of a specific workload references whether the data related to that workload is ephemeral or not. Applications deployed to Kubernetes should be built with this in mind. If a container within a pod crashes, any data being stored within the container is essentially lost. With databases, the state of the data within the database is kind of important, which is why special considerations need to be taken when deploying the database to Kubernetes.

According to the official docs, StatefulSets are the recommended way to run a database on Kubernetes.

StatefulSets in Kubernetes allow you to define a set of pods that maintain the state of the data within a pod regardless of its online status. Kubernetes does this by attaching persistent storage to the pod for it to read and write data to, as well as ensuring that when pods come online, they do so in the same order, with the same name and network address every time. This is in comparison to something like a deployment, which aims to keep a specific number of a given pod online but doesn’t care in what order they come up with as long as they are accessible.

Whenever you want to deploy a database to Kubernetes, StatefulSets should be used, but other considerations are needed to run a database on Kubernetes properly.

Considerations when deploying MySQL to Kubernetes

Deploying a well-architected MySQL setup to Kubernetes is more complicated than just deploying a bunch of pods running MySQL.

Replication and node failures

MySQL can be launched with replication enabled, which permits data to be synchronized across the pods in a given StatefulSet, but what happens when a node experiences an outage?

Source of truth

Within the replicated environment, how do you know which of the pods has the most up-to-date data, and how should your application determine which pod to query from?

Backups and restores

Backups are always necessary to ensure that your organization can roll back accidental writes, but in an environment with multiple MySQL instances running, where do the backups come from, how do you know they are complete, and how should you restore them?

How PlanetScale manages hundreds of thousands of MySQL database clusters

Until this point, much of what has been discussed in this article is considered best practices when running databases on Kubernetes, but we do things a bit differently here at PlanetScale.

Leveraging Vitess on Kubernetes

It’s no secret that PlanetScale uses Vitess to manage and operate our databases, but there’s more to the story than that.

Vitess is an open-source, MySQL-compatible project that is designed to scale beyond the traditional capabilities of MySQL. It does this by providing additional components on top of MySQL such as a stateless proxy (known as VTGate) and a topology server. In a Kubernetes environment, these two components are used together to determine how many MySQL instances exist, how to access them, and (in a horizontally sharded configuration) on which pod the requested data lives. Those MySQL pods in Vitess are known as a tablet, which is a pod running MySQL along with a sidecar process known as vttablet. This allows the management plane (vtctld) to manage them, as well as notify the overall configuration of any topology changes. All of this functionality is bundled into Vitess, but it also adds the question of how Vitess is managed.

This is where PlanetScale Vitess Operator for Kubernetes is used.

Kubernetes offers a great deal of automation, but some workloads require a bit more logic than out-of-the-box Kubernetes is prepared to handle. Operators allow developers to extend Kubernetes by adding custom resources that add to the Control Loop. The Vitess Operator addresses the issues outlined in the previous section by enabling Kubernetes to streamline these otherwise complicated management tasks.

The combination of Kubernetes, Vitess, the PlanetScale Operator, and our global infrastructure is what has enabled us to scale and manage hundreds of thousands of MySQL database clusters.

Deploying new databases

When a user needs to deploy a new database into our infrastructure, the first step is that our API sends a request to our custom orchestration layer asking for that database to be created.

Provided the request is valid, the orchestrator will define a custom resource definition we use to define the specifications of a PlanetScale database. The Operator discussed in the previous section will use the built-in mechanism of Kubernetes (specifically, the Control Loop and the API) to detect that the current state and desired state do not match. The Operator will create the necessary resources to run and operate a Vitess cluster for the requested database.

Once this process is completed, our orchestrator will notify the rest of the system that the creation process is done and the user can start using the database.

Storage management

Here is where things start deviating from what's considered common knowledge when running databases on Kubernetes.

As discussed earlier, the recommended best practice is to use StatefulSets to run databases since the state is automatically tracked by Kubernetes. We actually don’t do this and opt instead to use the logic built into the Vitess Operator to spin up pods that attach directly to cloud storage using a persistent volume claim (PVC). Because we already have a routing mechanism in place (VTGate), we don’t need to be concerned about the name or address of a given pod.

Attaching directly to cloud storage also allows us to programmatically manage how much storage is allocated.

As database utilization is increased, the required storage to operate a database often increases with it. To address this, we have monitoring mechanisms in place to detect when provisioned cloud storage that serves a database starts nearing capacity. When this occurs, our internal systems will use the cloud providers’ APIs to automatically allocate additional space so the databases that are being served by that storage do not stop from capacity issues.

This process occurs entirely behind the scenes so our users never have to worry about running out of space for their database.

Database backups

Regardless of where you run your database, backups should be first-class citizens as they can literally save your business.

One consideration around backing up your database, especially with more extensive databases, is the performance impact that running a backup can have. To avoid this issue, we actually utilize Vitess to create a special type of tablet that's ONLY used for backing data up. To back up a database on PlanetScale, our system will restore the latest version of the backup to this tablet, replicate all of the changes that have occurred since the backup was taken, and then create a brand new backup based on that data.

This not only significantly decreases the impact on the production database but also has the added benefit of automatically validating your existing backups on PlanetScale!

Mitigating failures

Most databases in PlanetScale operate on either AWS or GCP and as robust as those platforms are, we also have to consider how to address failures beyond our control.

Using Kubernetes and the Vitess Operator allows us to automatically handle instances when a pod or container within that pod gets stopped. The configuration for each database in Kubernetes defines how many resources and of what type should be running at any time. If a deviation is detected, the Operator will automatically take the necessary steps to spin up new resources to ensure things are always running smoothly.

Another type of outage we protect against for our paid production databases is when availability zones go offline.

Using Base plan databases as an example, we automatically provision a Vitess cluster across three availability zones in a given cloud region. This includes the tablets that serve up data for your database, as well as the VTGate instances that route queries to the proper tablets. This means that if an AZ gets knocked offline, the Vitess Operator will automatically detect the outage and apply the necessary infrastructure changes to keep our databases online.

In fact, we don't even support cloud regions with less than three availability zones!