惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

H
Heimdal Security Blog
小众软件
小众软件
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
罗磊的独立博客
Google DeepMind News
Google DeepMind News
大猫的无限游戏
大猫的无限游戏
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
Hugging Face - Blog
Hugging Face - Blog
阮一峰的网络日志
阮一峰的网络日志
A
About on SuperTechFans
宝玉的分享
宝玉的分享
博客园 - 聂微东
月光博客
月光博客
Cyberwarzone
Cyberwarzone
Microsoft Security Blog
Microsoft Security Blog
V
Visual Studio Blog
Project Zero
Project Zero
T
Tor Project blog
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
L
LINUX DO - 最新话题
博客园 - 叶小钗
Recent Commits to openclaw:main
Recent Commits to openclaw:main
Attack and Defense Labs
Attack and Defense Labs
Spread Privacy
Spread Privacy
Forbes - Security
Forbes - Security
Simon Willison's Weblog
Simon Willison's Weblog
N
Netflix TechBlog - Medium
P
Proofpoint News Feed
Engineering at Meta
Engineering at Meta
Hacker News: Ask HN
Hacker News: Ask HN
I
InfoQ
M
MIT News - Artificial intelligence
AI
AI
博客园 - 三生石上(FineUI控件)
W
WeLiveSecurity
C
Check Point Blog
The Hacker News
The Hacker News
C
Cyber Attacks, Cyber Crime and Cyber Security
Application and Cybersecurity Blog
Application and Cybersecurity Blog
T
Tenable Blog
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
The Cloudflare Blog
Blog — PlanetScale
Blog — PlanetScale
美团技术团队
D
Darknet – Hacking Tools, Hacker News & Cyber Security
GbyAI
GbyAI
Hacker News - Newest:
Hacker News - Newest: "LLM"
腾讯CDC
K
Kaspersky official blog

Blog — PlanetScale

Keeping a Postgres queue healthy — PlanetScale Patterns for Postgres Traffic Control — PlanetScale Graceful degradation in Postgres — PlanetScale High memory usage in Postgres is good, actually — PlanetScale Stripe Projects partnership: Provision PlanetScale Postgres and MySQL databases from the Stripe CLI — PlanetScale Enhanced tagging in Postgres Query Insights — PlanetScale Behind the scenes: How Database Traffic Control works — PlanetScale Introducing Database Traffic Control — PlanetScale Scaling Postgres connections with PgBouncer — PlanetScale Drizzle joins PlanetScale — PlanetScale Video Conferencing with Postgres — PlanetScale Faster PlanetScale Postgres connections with Cloudflare Hyperdrive — PlanetScale Introducing the PlanetScale MCP server — PlanetScale Database Transactions — PlanetScale Automating our changelog with Cursor commands — PlanetScale Postgres 18 is now available — PlanetScale Using MotherDuck with PlanetScale — PlanetScale $50 PlanetScale Metal is GA for Postgres — PlanetScale AI-Powered Postgres index suggestions — PlanetScale $5 PlanetScale is live — PlanetScale Announcing Vitess 23 — PlanetScale $50 PlanetScale Metal — PlanetScale Report on our investigation of the 2025-10-20 incident in AWS us-east-1 — PlanetScale $5 PlanetScale — PlanetScale Benchmarking Postgres 17 vs 18 — PlanetScale Larger than RAM Vector Indexes for Relational Databases — PlanetScale Partnering with Cloudflare to bring you the fastest globally distributed applications — PlanetScale Processes and Threads — PlanetScale PlanetScale for Postgres is now GA — PlanetScale Postgres High Availability with CDC — PlanetScale Announcing Neki — PlanetScale Caching — PlanetScale The principles of extreme fault tolerance — PlanetScale Announcing PlanetScale for Postgres — PlanetScale Benchmarking Postgres — PlanetScale Announcing Vitess 22 — PlanetScale The Real Failure Rate of EBS — PlanetScale IO devices and latency — PlanetScale Announcing PlanetScale Metal — PlanetScale PlanetScale Metal: There’s no replacement for displacement — PlanetScale Upgrading Query Insights to Metal — PlanetScale Automating cherry-picks between OSS and private forks — PlanetScale Database Sharding — PlanetScale Anatomy of a Throttler, part 3 — PlanetScale Introducing sharding on PlanetScale with workflows — PlanetScale Announcing Vitess 21 — PlanetScale Announcing the PlanetScale vectors public beta — PlanetScale Anatomy of a Throttler, part 2 — PlanetScale Instant deploy requests — PlanetScale Anatomy of a Throttler, part 1 — PlanetScale Increase IOPS and throughput with sharding — PlanetScale Tracking index usage with Insights — PlanetScale Faster backups with sharding — PlanetScale Building data pipelines with Vitess — PlanetScale The State of Online Schema Migrations in MySQL — PlanetScale Optimizing aggregation in the Vitess query planner — PlanetScale Dealing with large tables — PlanetScale Announcing Vitess 20 — PlanetScale Self-managed Vitess vs Managed Vitess with PlanetScale — PlanetScale Achieving data consistency with the consistent lookup Vindex — PlanetScale The MySQL adaptive hash index — PlanetScale Introducing global replica credentials — PlanetScale Profiling memory usage in MySQL — PlanetScale Summer 2023: Fuzzing Vitess at PlanetScale — PlanetScale How PlanetScale makes schema changes — PlanetScale Identifying and profiling problematic MySQL queries — PlanetScale The Problem with Using a UUID Primary Key in MySQL — PlanetScale Announcing Vitess 19 — PlanetScale PlanetScale forever — PlanetScale Introducing schema recommendations — PlanetScale Amazon Aurora Pricing: The many surprising costs of running an Aurora database — PlanetScale Three common MySQL database design mistakes — PlanetScale OAuth applications are now available to everyone — PlanetScale Deprecating the Scaler plan — PlanetScale PlanetScale branching vs. Amazon Aurora blue/green deployments — PlanetScale Databases at scale — PlanetScale Considerations for building a database disaster recovery plan — PlanetScale Working with Geospatial Features in MySQL — PlanetScale PlanetScale vs Amazon Aurora replication — PlanetScale Introducing the Vantage and PlanetScale integration — PlanetScale MySQL isolation levels and how they work — PlanetScale Introducing the schemadiff command line tool — PlanetScale $ pscale ping — PlanetScale Announcing foreign key constraints support — PlanetScale The challenges of supporting foreign key constraints — PlanetScale What is HTAP? — PlanetScale Introducing Insights Anomalies — PlanetScale Webhook security: a hands-on guide — PlanetScale MySQL replication: Best practices and considerations — PlanetScale A guide to HTML email with Ruby on Rails and Tailwind CSS — PlanetScale Sharding for cost-effective database management — PlanetScale PlanetScale ranks 188th in Deloitte’s top 500 fastest-growing companies — PlanetScale Announcing the Fivetran integration — PlanetScale Introducing webhooks — PlanetScale What is MySQL replication and when should you use it? — PlanetScale Sync user data between Clerk and a PlanetScale MySQL database — PlanetScale Introducing database reports — PlanetScale Distributed caching systems and MySQL — PlanetScale What is MySQL partitioning? — PlanetScale MySQL High Availability: Connection handling and concurrency — PlanetScale
Deploying multiple schema changes at once — PlanetScale
Shlomi Noach · 2023-08-29 · via Blog — PlanetScale

Shlomi Noach |

PlanetScale's database branching uses a declarative schema approach, but we take it even further and treat all the changes in your branch as a single deployment. As much as possible, PlanetScale deploys your entire set of changes near-atomically, which means the production database schema remains stable throughout the deployment process and changes almost all at once when all changes are ready. In this post, we will discuss the benefits of atomic multi-change deployments and work through the technical challenges of making them possible.

Why we choose to use "near-atomically"

Atomicity ensures an all-or-nothing change. A data transaction is an obvious example: you can change data in two tables, say insert to one and update the other, and enforce that either both changes happen or none do. You run both in a transaction and finalize the change with a COMMIT. The database keeps a transaction counter along with change journaling. If there's a crash halfway through the transaction, its recovery process can reliably identify the transaction's incomplete and undo the partial changes. But what's trivial for data changes is not so trivial for schema changes.

PlanetScale uses MySQL under the hood through Vitess. With MySQL, it is not possible to transactionally and atomically make changes to multiple table schema definitions. If you want to CREATE one table, ALTER another, and DROP a third, you must run these changes in some order. For this reason, we use the term "near-atomically." We also use this with PlanetScale's gated deployments, and we often use the term "gated" to indicate that all changes complete together.

Why atomic multi-change deployments?

We like to think of schema changes as deployments, similar to code deployments. Existing relational database systems have trained us to think that schema changes are necessarily dangerous, disruptive, irrevertible, and sequential.

Consider that a change to our code requires changing three different large tables:

  • Adding and modifying a column on one.
  • Adding a column and an index on another.
  • Adding a new check constraint on the third.

The database system considers these three unrelated changes, but we know they are semantically related.

Schema changes on large tables may take a long time, sometimes hours or more, to complete. Let's assume each of our changes runs for 8 hours. Historically, we are accustomed to accepting that we must run one ALTER TABLE after the other. Once we start our deployment, we expect 24 hours until it's complete. But during that time, the database is in a semantically inconsistent state. The deployment is partially done and partially queued up.

In an ideal world, we can wait out these 24 hours and call it a day (no pun intended). But in reality, we might find that our design was flawed, or perhaps there's an incident that takes priority, and we want to cancel the deployment. Has it been 10 hours? One of the changes will have been applied, the others are still pending. With traditional databases, you can't just cancel that completed schema change.

We're also used to the notion that a schema change is dangerous. Have we dropped the wrong column? Did you make a bad assumption about the data type? Or did you miss a constraint? The wrong schema change is notoriously known to have been the source of many production system outages. And at 8 hours each, it looks like our three-table deployment will be risky around all time zones.

Gated deployments

Gated deployments offer a change of concept, where all your changes are staged for however long it takes for all of them to be ready. In our example above, we can assume at around 24 hours for all changes to become ready. At that point, we complete the deployment, applying all the changes in production all at once. And because it is impossible to ensure atomicity, the changes are applied a few seconds apart.

With this approach, there is only one "major event" to this deployment. Since gated deployments allow you to pick your preferred time to complete the changes, you can control the time of the "event." And, if there is a change of heart during the staging period or an incident that takes over priorities, the deployment may be canceled without impacting production. The friction point, where a schema may only be partially deployed, is reduced from days or hours to seconds.

A technical overview, and when things get complicated

Some schema change operations are immediate. For example, when creating a new table or modifying a view definition. Those changes have no data directly associated and are very fast to perform. Some ALTER TABLE changes are also eligible for fast execution. And yet, many are not. If we wanted to deploy a CREATE TABLE, ALTER VIEW, and ALTER TABLE, all changes as part of one branch and one deployment, we need to somehow be able to time them such that they all complete together. If we have multiple ALTER TABLE changes on large tables, we need to find a way to not only time those to complete together but also somehow be able to run them all concurrently without putting too much load on the production database.

And on top of it, some of the changes in the migration might actually have dependencies. The user is given a free hand to change their branch, and when the time comes to deploy the branch's changes to production, we may find that one change assumes another already applies.

So, the task is to be able to run some changes concurrently, time long-running changes with immediate changes, and resolve any conflicts that imply ordering changes — all while running concurrently.

Long running changes concurrency

As described in How Online Schema Change tools work, PlanetScale uses an elaborate copy-and-swap algorithm to emulate an ALTER TABLE operation. We create a new table in the likeness of the original table, we modify that table, we bring the new table in sync with the original table by both copying over the existing rows as well as applying the ongoing changes, and we finalize by swapping the two from under the hands of the application.

This emulation mechanism is what allows us some concurrency and control over the cut-over timing. As we complete copying over a table's existing data set, we can continue to watch the ongoing changes to the table, technically indefinitely, or until we decide that it's time to cut over. We impose a brief lock to finalize the last few changes to make the swap. This allows us to run multiple concurrent operations on different tables and keep pushing the final cut-over until we know all operations are ready to complete.

And we don't have to overwhelm the production database during that time. We may alternate between the copying phases — the heavy-lifting part of the emulation — of the different tables and only parallelize the tailing of the ongoing changes.

When we stage a deployment request, we begin by running — but not completing — all long-running changes. When we find, possibly hours later, that all long-running changes are ready to complete, we then introduce the immediate changes — like CREATE TABLE, ALTER VIEW, and similar statements. We can then apply the final cut-over for all long-running changes and the immediate changes, near-atomically, a few seconds apart.

Alas, what happens when one change depends on another?

Resolving dependencies, and supporting concurrency of in-order statements

Consider these simplified two changes:

ALTER TABLE t ADD COLUMN info VARCHAR(128) NOT NULL DEFAULT '' AFTER id;
ALTER VIEW v AS SELECT id, info FROM t;

In production, the column info does not exist. The view v in production does not read from this column. To deploy these two changes, we absolutely have to first apply the change to t, and only then the change to v. This seems pretty straightforward: complete the migration on t, and immediately apply the change to v.

However, how do we go about the reverse?

ALTER TABLE t DROP COLUMN info;
ALTER VIEW v AS SELECT id FROM t;

If we first make the change to t, then v becomes invalid. At first sight, it may appear that we should first apply the change to v, followed by the change to t. However, the table t may be large enough that it takes hours to migrate. If we want to apply the changes together, then the way to go is:

  • Begin the change on t.
  • Wait until the change is ready to complete.
  • Issue the immediate change on v.
  • Follow by completing (cutting-over) the change on t.

The scenarios may be more complex when multiple, nested views are involved, which are based on yet multiple tables being changed in the deployment request.

Using schemadiff

PlanetScale continues to utilize Vitess's schemadiff library, which can determine, where possible, a valid sequence (read: ordering) of changes given two schemas. When schemadiff reads a schema, it maps and validates any dependency between entities. For example, it can validate that table and columns referenced by some view exist or that there are no cyclic view definitions (v1 reads from v2, which reads from v1).

When schemadiff compares two schemas and generates the diff statements, it also analyzes the dependencies between those statements. If any two diff statements affect entities with a dependency relationship in the schema(s), then schemadiff knows it needs to resolve the ordering of those two diffs. If yet another diff affects entities used by either of these two, then schemadiff needs to resolve the ordering of all three. All the diffs are thus divided into equivalence classes: distinct sets where nothing is shared between any two sets and where the total union of all sets is the total set of diffs.

If you take a sample diff from one equivalence class and then some sample diff from a different equivalence class, you know there's absolutely no dependency between the two. They can be executed in any order. However, any two diffs within the same equivalence class can have a dependency and should be treated as if they do, although in some cases, the two could be executed in any different order. To that effect, for each equivalence class, schemadiff finds a permutation of the diffs such that if executed in order, the validity of the entire schema is preserved. It's worth reiterating that changes to the underlying database can only be applied sequentially. Thus, we must validate that the schema remains valid throughout the in-order execution. schemadiff achieves this by running in-memory schema migration and validation at every step.

a. Given a set of diffs,

b. Group them into equivalence classes, where changes to elements that have dependencies are grouped together.

c. Ordering of equivalence classes is arbitrary.

d. Within an equivalence class there must be a valid ordering.

Orchestrating Vitess

PlanetScale then takes that valid ordering of diffs as the blueprint for a deployment where it runs the migrations concurrently via Vitess, staging the changes until it determines that all deployments are ready to complete. At this time, it seals the change near-atomically.

When all migrations are complete, PlanetScale then stages tentative reverts for all migrations. The user has a 30-minute window to undo those schema changes without losing data accumulated. If the user does choose to revert (say, some parts of the app appear to require still the old schema or if performance tanks due to wrong indexing), then those reverts are likewise applied near-atomically. Notably, the reverts are finalized in reverse ordering to the original deployment. There is no need for computation here: we rely on the fact that the original deployment was found to have a step-by-step valid ordering. Undoing those changes in reverse order mathematically maintains that validity.

Limitations

Resources are not infinite, and only so many changes can run concurrently. Altering a hundred tables in one deployment request is not feasible and possibly not the best utilization of database branching. It is possible to go too far with a branch so that the changes are logically impossible to deploy (or rather, so complex that it is not possible to determine a reliably safe path). Like code, schema changes should be made and deployed with measures in place.

Conclusion

Treating a deployment request, a group of schema changes, as a unit that should be deployed all or none is a difficult task that requires complex validation, scheduling, and execution. But the effort pays off: we know that the deployment is cancellable up to the very last moment and without making any impact on production. We only have one potential point in time that requires our attention. We actually control that point in time. We don't need to tell the database how to go about the changes; we only need to tell it what we would like to have.