Database branching: three-way merge for schema changes — PlanetScale
Shlomi Noach · 2023-04-27 · via Blog — PlanetScale

You may be familiar with Git's three-way merge as a way to resolve source code changes made by developers on their independent branches. PlanetScale offers three-way merge for your schema branches, making schema change collaboration simpler and safer. It's similar in concept, but completely different in implementation. In the remainder of this post, we illustrate the technical implementation and the nuances of diffing schemas vs. diffing code.

What does it mean to merge schema changes in the first place?

PlanetScale offers a model of schema branching and deploy requests. In short, a developer may branch the main database, creating a copy of the schema in a dev environment, where they are free to make any changes without affecting production. Multiple developers can do the same, concurrently. At some point, the developer wants to apply their schema changes to production. They create a deploy request on PlanetScale, similar to a pull request on GitHub.

The deploy request is where the developer and their team review the changes. The deploy request page presents a semantic diff of the changes made — e.g., an ALTER TABLE foo ..., a CREATE TABLE bar (...), etc. PlanetScale uses Vitess's schemadiff library to generate the semantic diff between the main (production) branch and the developer's branch. If approved, the changes are enqueued in the deploy queue, to be eventually deployed in a non-blocking fashion.

The case for three-way merge arises when multiple developers do the same, concurrently. Say Dev 1 created a branch a couple of days ago. During this time, Dev 2 created and deployed their own branch, merging it into main, the production branch. Dev 1's branch and changes now may or may not be compatible with main. Not only do they not reflect or contain the new schema in main, but they may also outright conflict with the newly made changes!

As long as Dev 1 is still working on their branch, that's fine. But at some point, they will want to deploy their changes, and it's time to put those changes in the deployment queue. But are the changes valid at all? This is where three-way merge is invoked. It is essentially a mechanism that determines whether branches slated to be merged conflict with one another, overlap one another, or are completely unrelated and have no impact on one another.

Setting the database branching terminology

In Git, we use terminology such as merge-base, topic-head, etc. But we now illustrate a solution tailored to schema changes, and we may as well use different terminology. Let's use main for production: this is what everyone branches from and eventually deploys to. And let's use branch1 and branch2 as branch names created by Dev 1 and Dev 2, respectively.

It's worth pointing out that nothing tracks the changes on a development branch while it's open. Dev 1 may CREATE, ALTER, and DROP all they want. PlanetScale does follow up on any changes they make but, simplified for the purposes of this post, it's only when the developer creates a deploy request that PlanetScale examines their schema to compute the diff in their branch. The diff is one or more SQL statements (we ignore here the case where the schema is unchanged) that would get main to look like branch1. For example, consider these schemas in main and in branch1:

-- main:
CREATE TABLE `customer` (
	`id` int,
	PRIMARY KEY (`id`)
);

-- branch1:
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	PRIMARY KEY (`id`)
);

The git diff of the two would be:

 CREATE TABLE `customer` (
        `id` int,
+       `name` varchar(255) NOT NULL DEFAULT '',
        PRIMARY KEY (`id`)
 );

But that's not something a database can work with. Instead, the deploy request generates this semantic SQL diff:

ALTER TABLE `customer` ADD COLUMN `name` varchar(255) NOT NULL DEFAULT ''

This semantic diff is generated for any deploy request.
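
To make the contrast with a textual diff concrete, here is a minimal Python sketch. It is not the schemadiff implementation: it assumes a toy model where a table is an ordered mapping of column name to definition, and the hypothetical `semantic_diff` helper only handles added columns.

```python
# Toy model (an assumption for illustration): a table is an ordered dict of
# column name -> column definition. Not the Vitess schemadiff API.

def semantic_diff(table: str, main: dict[str, str], branch: dict[str, str]) -> list[str]:
    """Return ALTER TABLE statements adding the columns that main lacks."""
    statements = []
    for name, definition in branch.items():
        if name not in main:
            statements.append(
                f"ALTER TABLE `{table}` ADD COLUMN `{name}` {definition}"
            )
    return statements


main = {"id": "int"}
branch1 = {"id": "int", "name": "varchar(255) NOT NULL DEFAULT ''"}

for stmt in semantic_diff("customer", main, branch1):
    print(stmt)
```

The output is a statement a database can execute, rather than a set of changed text lines.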

Schema three-way merge

Assume main is the base branch, and branch1 and branch2 are both enqueued to deploy.

Three-way merge compares the two branches against each other, using main as the common base (hence a three-way comparison), like so:

  1. Compute diff1 as diff(main, branch1). This is similar to main..branch1 in Git notation. We can consider diff1 as a function — i.e., diff1(main) => branch1.
  2. Likewise, compute diff2 as diff(main, branch2).
  3. Look at diff1(diff2(main)). If running diff1 over diff2(main) is invalid (examples to follow), there's a conflict.
  4. Likewise, attempt diff2(diff1(main)). If that's invalid, there's a conflict.
  5. If both are valid but diff1(diff2(main)) != diff2(diff1(main)), there is a conflict.
  6. If both are valid and diff1(diff2(main)) == diff2(diff1(main)), there is no conflict between the two branches.

The algorithm is, in fact, more elaborate. But let's first walk through a few examples to understand how the diffs and three-way-merge work, and what SQL nuances we might hit.
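
The numbered steps can be sketched in Python under a deliberately small model. Everything here is an assumption for illustration: a schema maps a table name to a list of column names (types are not modeled), a diff is a list of operations, and `apply_diff`, `has_conflict`, and `InvalidDiff` are hypothetical names rather than PlanetScale or Vitess APIs.

```python
import copy


class InvalidDiff(Exception):
    """Raised when a diff cannot be applied to a schema."""


def apply_diff(schema, diff):
    """Apply a list of (operation, table, argument) tuples to a copy of schema."""
    result = copy.deepcopy(schema)
    for op, table, arg in diff:
        if op == "create_table":
            if table in result:
                raise InvalidDiff(f"table {table} already exists")
            result[table] = list(arg)
        elif op == "add_column":
            if table not in result or arg in result[table]:
                raise InvalidDiff(f"cannot add column {arg} to {table}")
            result[table].append(arg)
        else:
            raise ValueError(f"unknown operation: {op}")
    return result


def has_conflict(main, diff1, diff2):
    """Steps 3 to 6: apply the diffs in both orders and compare the results."""
    try:
        diff2_first = apply_diff(apply_diff(main, diff2), diff1)  # diff1(diff2(main))
        diff1_first = apply_diff(apply_diff(main, diff1), diff2)  # diff2(diff1(main))
    except InvalidDiff:
        return True  # one of the orders is invalid
    return diff2_first != diff1_first  # valid both ways, but do they agree?


main = {"customer": ["id"]}
add_name = [("add_column", "customer", "name")]
add_joined_at = [("add_column", "customer", "joined_at")]
create_delivery = [("create_table", "delivery", ["id", "customer_id"])]

print(has_conflict(main, add_name, create_delivery))  # False: unrelated changes
print(has_conflict(main, add_name, add_name))         # True: second apply is invalid
print(has_conflict(main, add_name, add_joined_at))    # True: column order differs
```

Column order is kept in a list so that two valid results with different ordering compare as unequal, while tables are kept in a dict, whose equality check ignores key order.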

Example: no conflict

Consider this simplified schema for the three branches:

-- main:
CREATE TABLE `customer` (
	`id` int,
	PRIMARY KEY (`id`)
);

-- branch1:
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	PRIMARY KEY (`id`)
);

-- branch2:
CREATE TABLE `customer` (
	`id` int,
	PRIMARY KEY (`id`)
);
CREATE TABLE `delivery` (
	`id` int,
	`customer_id` int,
	PRIMARY KEY (`id`)
);

The diffs are:

-- diff1:
ALTER TABLE `customer` ADD COLUMN `name` varchar(255) NOT NULL DEFAULT ''

-- diff2:
CREATE TABLE `delivery` (
	`id` int,
	`customer_id` int,
	PRIMARY KEY (`id`)
)

Clearly, the two branches do not conflict with one another. One adds a column to customer, and the other creates the delivery table. Applying the two diffs in either order produces the same end result:

CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	PRIMARY KEY (`id`)
);
CREATE TABLE `delivery` (
	`id` int,
	`customer_id` int,
	PRIMARY KEY (`id`)
);

Example: clear conflict

In the next example, both branches introduce a new column under the same name but with a different type:

-- main:
CREATE TABLE `customer` (
	`id` int,
	PRIMARY KEY (`id`)
);

-- branch1:
CREATE TABLE `customer` (
	`id` int,
	`subscription_type` enum('free', 'promotional', 'paid'),
	PRIMARY KEY (`id`)
);

-- branch2:
CREATE TABLE `customer` (
	`id` int,
	`subscription_type` int unsigned NOT NULL DEFAULT 0,
	PRIMARY KEY (`id`)
);

The diffs are:

-- diff1:
ALTER TABLE `customer` ADD COLUMN `subscription_type` enum('free', 'promotional', 'paid')

-- diff2:
ALTER TABLE `customer` ADD COLUMN `subscription_type` int unsigned NOT NULL DEFAULT 0

Clearly, applying both diffs on top of each other is destined to fail. You cannot add two columns under the same name.

Example: subtle conflict

How about adding two completely different columns?

-- main:
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	PRIMARY KEY (`id`)
);

-- branch1:
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	`subscription_type` enum('free', 'promotional', 'paid'),
	PRIMARY KEY (`id`)
);

-- branch2:
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	`joined_at` timestamp NOT NULL DEFAULT current_timestamp(),
	PRIMARY KEY (`id`)
);

The diffs are:

-- diff1:
ALTER TABLE `customer` ADD COLUMN `subscription_type` enum('free', 'promotional', 'paid')

-- diff2:
ALTER TABLE `customer` ADD COLUMN `joined_at` timestamp NOT NULL DEFAULT current_timestamp()

It's possible to apply both diffs, in either order. However, the resulting schema looks different depending on the order. It may look like either of these:

-- diff1(diff2(main)):
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	`joined_at` timestamp NOT NULL DEFAULT current_timestamp(),
	`subscription_type` enum('free', 'promotional', 'paid'),
	PRIMARY KEY (`id`)
);

-- diff2(diff1(main)):
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	`subscription_type` enum('free', 'promotional', 'paid'),
	`joined_at` timestamp NOT NULL DEFAULT current_timestamp(),
	PRIMARY KEY (`id`)
);

The order of columns in a table matters. Queries that run SELECT * FROM customer and read the results by position will get different columns at positions 3 and 4. The two branches conflict with each other. This is similar to a Git merge conflict where two branches append different lines to the end of a file.
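
The effect is easy to reproduce. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption made for convenience: the column types are simplified to TEXT and the hypothetical `columns_after` helper only reports the column order that SELECT * returns.

```python
import sqlite3


def columns_after(statements):
    """Apply ALTER statements to a fresh customer table; return SELECT * column order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL DEFAULT '')"
    )
    for stmt in statements:
        conn.execute(stmt)
    cursor = conn.execute("SELECT * FROM customer")
    names = [column[0] for column in cursor.description]
    conn.close()
    return names


# Simplified stand-ins for the two diffs in the example.
diff1 = "ALTER TABLE customer ADD COLUMN subscription_type TEXT"
diff2 = "ALTER TABLE customer ADD COLUMN joined_at TEXT"

print(columns_after([diff2, diff1]))  # ['id', 'name', 'joined_at', 'subscription_type']
print(columns_after([diff1, diff2]))  # ['id', 'name', 'subscription_type', 'joined_at']
```

Code that reads `row[2]` from such a query would see a different column depending on which branch deployed first.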

We could avoid the conflict if one of the branches positioned the new column anywhere but last. For example, branch1 could place subscription_type right after id:

CREATE TABLE `customer` (
	`id` int,
	`subscription_type` enum('free', 'promotional', 'paid'),
	`name` varchar(255) NOT NULL DEFAULT '',
	PRIMARY KEY (`id`)
);

The above would lead to a non-conflicting diff:

-- diff1:
ALTER TABLE `customer` ADD COLUMN `subscription_type` enum('free', 'promotional', 'paid') AFTER `id`
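
A small Python sketch shows why explicit positioning removes the ambiguity. It assumes a toy model where a table is just a list of column names; `add_column`, `diff1`, and `diff2` are hypothetical helpers, not part of any PlanetScale tooling.

```python
def add_column(columns, name, after=None):
    """Return a new column list with name appended, or placed right after `after`."""
    if name in columns:
        raise ValueError(f"duplicate column: {name}")
    result = list(columns)
    if after is None:
        result.append(name)
    else:
        result.insert(result.index(after) + 1, name)
    return result


def diff1(columns):
    # ADD COLUMN subscription_type ... AFTER id
    return add_column(columns, "subscription_type", after="id")


def diff2(columns):
    # ADD COLUMN joined_at ... (appended last)
    return add_column(columns, "joined_at")


main = ["id", "name"]

print(diff1(diff2(main)))  # ['id', 'subscription_type', 'name', 'joined_at']
print(diff2(diff1(main)))  # ['id', 'subscription_type', 'name', 'joined_at']
```

Both orders converge on the same column list, so the equality check in the three-way merge passes.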

Nuance: no conflict

Column ordering matters, but the same cannot be said for index ordering. In this example, one branch adds an index, and the other adds a column along with a matching index:

-- main:
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	PRIMARY KEY (`id`)
);

-- branch1:
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	PRIMARY KEY (`id`),
	KEY `name_idx` (`name`(16))
);

-- branch2:
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	`joined_at` timestamp NOT NULL DEFAULT current_timestamp(),
	PRIMARY KEY (`id`),
	KEY `joined_idx` (`joined_at`)
);

The diffs are:

-- diff1:
ALTER TABLE `customer` ADD KEY `name_idx` (`name`(16))

-- diff2:
ALTER TABLE `customer` ADD COLUMN `joined_at` timestamp NOT NULL DEFAULT current_timestamp(), ADD KEY `joined_idx` (`joined_at`)

Strictly speaking, the table structure looks different based on the order we apply the diffs. It can be either of:

-- diff1(diff2(main)):
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	`joined_at` timestamp NOT NULL DEFAULT current_timestamp(),
	PRIMARY KEY (`id`),
	KEY `joined_idx` (`joined_at`),
	KEY `name_idx` (`name`(16))
);

-- diff2(diff1(main)):
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	`joined_at` timestamp NOT NULL DEFAULT current_timestamp(),
	PRIMARY KEY (`id`),
	KEY `name_idx` (`name`(16)),
	KEY `joined_idx` (`joined_at`)
);

However, for practical purposes, the order of indexes is inconsequential. All queries against the table behave the same way and perform the same way, irrespective of the ordering of the keys. The only differences are in the output of SHOW CREATE TABLE and in INFORMATION_SCHEMA introspection.

PlanetScale disregards index ordering.
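
One way to picture this rule is a comparison that treats columns as an ordered sequence and indexes as an unordered set. The following Python sketch assumes a hypothetical `Table` model and `equivalent` function; it illustrates the idea and is not how schemadiff represents tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    columns: tuple[str, ...]  # column order is significant
    indexes: tuple[str, ...]  # order as SHOW CREATE TABLE would list them


def equivalent(a: Table, b: Table) -> bool:
    """Compare columns in order, but indexes as a set."""
    return a.columns == b.columns and frozenset(a.indexes) == frozenset(b.indexes)


columns = ("id", "name", "joined_at")
diff2_first = Table(columns, indexes=("joined_idx", "name_idx"))  # diff1(diff2(main))
diff1_first = Table(columns, indexes=("name_idx", "joined_idx"))  # diff2(diff1(main))

print(diff2_first == diff1_first)            # False: strict comparison sees a difference
print(equivalent(diff2_first, diff1_first))  # True: index order is disregarded
```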

Overlapping changes

The algorithm is more elaborate than described thus far. To reduce developer friction as much as possible, it also recognizes diffs that partially overlap with identical changes. For example:

-- main:
CREATE TABLE `customer` (
	`id` int,
	PRIMARY KEY (`id`)
);

-- branch1:
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	PRIMARY KEY (`id`)
);
CREATE TABLE `tbl1` (
	`id` int,
	PRIMARY KEY (`id`)
);

-- branch2:
CREATE TABLE `customer` (
	`id` int,
	`name` varchar(255) NOT NULL DEFAULT '',
	PRIMARY KEY (`id`)
);
CREATE TABLE `tbl2` (
	`id` int,
	PRIMARY KEY (`id`)
);

Both branches add the same name column to customer, and then each branch proceeds to make other, unrelated changes. Thanks to schemadiff, each of the changes (ALTER, CREATE, ...) is fully formalized and we can analyze the changes one by one.

Is there a conflict with the new name column? Given that both branches completely agree on that particular change, PlanetScale's three-way merge considers this an overlap and allows it. Should branch1 merge first, branch2's diff auto-adapts and is reduced to the creation of tbl2 only.
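
As a rough illustration of that adaptation, the Python sketch below drops from a branch's diff any statement that an already-merged diff contains verbatim. Plain string equality stands in for schemadiff's formalized statements, and `remaining_diff` is a hypothetical helper; the actual mechanism may well recompute the diff against the updated main instead.

```python
def remaining_diff(branch_diff, merged_diff):
    """Return the statements of branch_diff not already applied by merged_diff."""
    already_applied = set(merged_diff)
    return [stmt for stmt in branch_diff if stmt not in already_applied]


add_name = "ALTER TABLE `customer` ADD COLUMN `name` varchar(255) NOT NULL DEFAULT ''"
diff1 = [add_name, "CREATE TABLE `tbl1` (`id` int, PRIMARY KEY (`id`))"]
diff2 = [add_name, "CREATE TABLE `tbl2` (`id` int, PRIMARY KEY (`id`))"]

# branch1 merges first, so branch2 only has to create tbl2.
for stmt in remaining_diff(diff2, diff1):
    print(stmt)
```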

Further reducing friction

Schema changes may take time to run, during which more developers will want to deploy their own changes. The deployment queue is first come, first served, and runs only one deploy request at a time.

When a developer submits their deploy request, their change is validated against all queued changes. This avoids the situation where the developer waits for hours in the queue, only to learn that the deployment before theirs caused a conflict. PlanetScale raises an early warning so that developers can make better use of their time in the queue.

Conclusion

Schema changes and source code changes share enough similarities that we can offer developers schema lifecycle workflows they are familiar with from their source code workflows. With some adaptations to the obvious differences and challenges a schema change deployment poses, we are able to utilize familiar and trusted logic to manage developer collaboration around schema branching.