惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

H
Heimdal Security Blog
小众软件
小众软件
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
罗磊的独立博客
Google DeepMind News
Google DeepMind News
大猫的无限游戏
大猫的无限游戏
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
Hugging Face - Blog
Hugging Face - Blog
阮一峰的网络日志
阮一峰的网络日志
A
About on SuperTechFans
宝玉的分享
宝玉的分享
博客园 - 聂微东
月光博客
月光博客
Cyberwarzone
Cyberwarzone
Microsoft Security Blog
Microsoft Security Blog
V
Visual Studio Blog
Project Zero
Project Zero
T
Tor Project blog
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
L
LINUX DO - 最新话题
博客园 - 叶小钗
Recent Commits to openclaw:main
Recent Commits to openclaw:main
Attack and Defense Labs
Attack and Defense Labs
Spread Privacy
Spread Privacy
Forbes - Security
Forbes - Security
Simon Willison's Weblog
Simon Willison's Weblog
N
Netflix TechBlog - Medium
P
Proofpoint News Feed
Engineering at Meta
Engineering at Meta
Hacker News: Ask HN
Hacker News: Ask HN
I
InfoQ
M
MIT News - Artificial intelligence
AI
AI
博客园 - 三生石上(FineUI控件)
W
WeLiveSecurity
C
Check Point Blog
The Hacker News
The Hacker News
C
Cyber Attacks, Cyber Crime and Cyber Security
Application and Cybersecurity Blog
Application and Cybersecurity Blog
T
Tenable Blog
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
The Cloudflare Blog
Blog — PlanetScale
Blog — PlanetScale
美团技术团队
D
Darknet – Hacking Tools, Hacker News & Cyber Security
GbyAI
GbyAI
Hacker News - Newest:
Hacker News - Newest: "LLM"
腾讯CDC
K
Kaspersky official blog

Blog — PlanetScale

Keeping a Postgres queue healthy — PlanetScale Patterns for Postgres Traffic Control — PlanetScale Graceful degradation in Postgres — PlanetScale High memory usage in Postgres is good, actually — PlanetScale Stripe Projects partnership: Provision PlanetScale Postgres and MySQL databases from the Stripe CLI — PlanetScale Enhanced tagging in Postgres Query Insights — PlanetScale Behind the scenes: How Database Traffic Control works — PlanetScale Introducing Database Traffic Control — PlanetScale Scaling Postgres connections with PgBouncer — PlanetScale Drizzle joins PlanetScale — PlanetScale Video Conferencing with Postgres — PlanetScale Faster PlanetScale Postgres connections with Cloudflare Hyperdrive — PlanetScale Introducing the PlanetScale MCP server — PlanetScale Database Transactions — PlanetScale Automating our changelog with Cursor commands — PlanetScale Postgres 18 is now available — PlanetScale Using MotherDuck with PlanetScale — PlanetScale $50 PlanetScale Metal is GA for Postgres — PlanetScale AI-Powered Postgres index suggestions — PlanetScale $5 PlanetScale is live — PlanetScale Announcing Vitess 23 — PlanetScale $50 PlanetScale Metal — PlanetScale Report on our investigation of the 2025-10-20 incident in AWS us-east-1 — PlanetScale $5 PlanetScale — PlanetScale Benchmarking Postgres 17 vs 18 — PlanetScale Larger than RAM Vector Indexes for Relational Databases — PlanetScale Partnering with Cloudflare to bring you the fastest globally distributed applications — PlanetScale Processes and Threads — PlanetScale PlanetScale for Postgres is now GA — PlanetScale Postgres High Availability with CDC — PlanetScale Announcing Neki — PlanetScale Caching — PlanetScale The principles of extreme fault tolerance — PlanetScale Announcing PlanetScale for Postgres — PlanetScale Benchmarking Postgres — PlanetScale Announcing Vitess 22 — PlanetScale The Real Failure Rate of EBS — PlanetScale IO devices and latency — PlanetScale Announcing PlanetScale Metal — PlanetScale PlanetScale Metal: There’s no replacement for displacement — PlanetScale Upgrading Query Insights to Metal — PlanetScale Automating cherry-picks between OSS and private forks — PlanetScale Database Sharding — PlanetScale Anatomy of a Throttler, part 3 — PlanetScale Introducing sharding on PlanetScale with workflows — PlanetScale Announcing Vitess 21 — PlanetScale Announcing the PlanetScale vectors public beta — PlanetScale Anatomy of a Throttler, part 2 — PlanetScale Instant deploy requests — PlanetScale Anatomy of a Throttler, part 1 — PlanetScale Increase IOPS and throughput with sharding — PlanetScale Tracking index usage with Insights — PlanetScale Faster backups with sharding — PlanetScale Building data pipelines with Vitess — PlanetScale The State of Online Schema Migrations in MySQL — PlanetScale Optimizing aggregation in the Vitess query planner — PlanetScale Dealing with large tables — PlanetScale Announcing Vitess 20 — PlanetScale Self-managed Vitess vs Managed Vitess with PlanetScale — PlanetScale Achieving data consistency with the consistent lookup Vindex — PlanetScale The MySQL adaptive hash index — PlanetScale Introducing global replica credentials — PlanetScale Profiling memory usage in MySQL — PlanetScale Summer 2023: Fuzzing Vitess at PlanetScale — PlanetScale How PlanetScale makes schema changes — PlanetScale Identifying and profiling problematic MySQL queries — PlanetScale The Problem with Using a UUID Primary Key in MySQL — PlanetScale Announcing Vitess 19 — PlanetScale PlanetScale forever — PlanetScale Introducing schema recommendations — PlanetScale Amazon Aurora Pricing: The many surprising costs of running an Aurora database — PlanetScale Three common MySQL database design mistakes — PlanetScale OAuth applications are now available to everyone — PlanetScale Deprecating the Scaler plan — PlanetScale PlanetScale branching vs. Amazon Aurora blue/green deployments — PlanetScale Databases at scale — PlanetScale Considerations for building a database disaster recovery plan — PlanetScale Working with Geospatial Features in MySQL — PlanetScale PlanetScale vs Amazon Aurora replication — PlanetScale Introducing the Vantage and PlanetScale integration — PlanetScale MySQL isolation levels and how they work — PlanetScale Introducing the schemadiff command line tool — PlanetScale $ pscale ping — PlanetScale Announcing foreign key constraints support — PlanetScale The challenges of supporting foreign key constraints — PlanetScale What is HTAP? — PlanetScale Introducing Insights Anomalies — PlanetScale Webhook security: a hands-on guide — PlanetScale MySQL replication: Best practices and considerations — PlanetScale A guide to HTML email with Ruby on Rails and Tailwind CSS — PlanetScale Sharding for cost-effective database management — PlanetScale PlanetScale ranks 188th in Deloitte’s top 500 fastest-growing companies — PlanetScale Announcing the Fivetran integration — PlanetScale Introducing webhooks — PlanetScale What is MySQL replication and when should you use it? — PlanetScale Sync user data between Clerk and a PlanetScale MySQL database — PlanetScale Introducing database reports — PlanetScale Distributed caching systems and MySQL — PlanetScale What is MySQL partitioning? — PlanetScale MySQL High Availability: Connection handling and concurrency — PlanetScale
Consensus algorithms at scale: Part 7 - Propagating requests — PlanetScale
Sugu Sougoumarane · 2022-07-01 · via Blog — PlanetScale

Sugu Sougoumarane |

If you’re still catching up, you can find links to each article in the series at the bottom of this article.

We have saved the most difficult part for last. This is where we put it all together. Let us start with a restatement of the requirements for propagation of requests during a leadership change:

Propagate previously completed requests to satisfy the new leader’s durability requirements.

Recap of parts 1-6

  • We have redefined the problem of consensus with the primary goal of solving durability in a distributed system, and are approaching the problem top-down.
  • We have shown a way to make durability an abstract requirement instead of the more rigid approach of using majority quorums.
  • We have defined a high level set of rules that can satisfy the properties of a consensus system while honoring arbitrary (but meaningful) durability requirements.
  • We have shown that conceptualizing leadership change as revocation and establishment leads to more implementation options that existing systems don’t utilize.
  • We have also shown that there exist two fundamentally different approaches to handling race conditions, and covered their trade-offs.
  • In the previous post, we covered how requests are completed as a precursor to analyzing propagation.

You can find links to each article in the series at the bottom of this article.

The simple case

For lock-based systems, and for planned changes, we have the opportunity to request the current leader to demote itself. In this situation, the current leader could ensure that its requests have reached all the necessary followers before demoting itself. Once this is done, the elector performs the leadership change and the system can resume.

We will now look at how propagation should work if there are failures.

Discovering completed requests

If a system has encountered failures, then the elector must indirectly revoke the previous leadership by requesting the followers to stop accepting any more requests from that leader. If enough followers are reached such that the previous leader cannot meet the durability criteria for any more requests, we know that the revocation is successful.

This method, apart from guaranteeing that no further requests will be completed by that leader, also allows us to discover all requests that were previously completed.

All we have to do is propagate those requests to satisfy the new leader’s criteria. But this is not as simple as it sounds.

There are many failure cases that make this problem extremely difficult:

  • There may be a request that is incomplete. In this situation, the elector may or may not discover this request.
  • An elector that discovers a tentative request may not be able to determine if that request has become durable.
  • Propagation of a request can fail before completion.
  • An elector that does not discover an incomplete request could elect a new leader that accepts a new request, which may fail before completion.
  • A subsequent elector may discover such multiple incomplete requests.
  • Another elector may discover only one of the incomplete requests, may propagate it as tentative, and fail before marking it as complete.
  • A final elector can discover this durable request, and a newer conflicting incomplete request, and may not have enough information to know which one to honor.

Ground rules

To address the above failure modes, let us first look at what we can and cannot do:

  • An elector must be able to reach a sufficient number of followers to revoke the previous leadership. If this is not possible, the elector is blocked.
  • An elector need not (and may not be able to) reach all the followers of a leader.

Some more inferences:

  • An elector is guaranteed to find all requests that have become durable.
  • If a request was incomplete, an elector may not find it. If not found, it is free to move forward without that request. When that request is later discovered, it must be canceled.
  • If an elector discovers an incomplete request, it may not have sufficient information to know if that request was actually durable or complete. Therefore, it has to assume that it might have completed, and attempt to propagate it.
  • If an elector discovers an incomplete request and can determine with certainty that it was incomplete, it can choose either option: act as if it was discovered, or not discovered.

Let us now discuss some options.

Versioning the decisions

It is safe to propagate the latest discovered decision. A decision to propagate a previous decision is a new decision.

We can use the following approach:

  1. Every request has a time-based version.
  2. A leader will create its request using a newer version than any previous request.
  3. An elector that chooses to propagate an incomplete request will do so under a new version.
  4. An elector that discovers multiple conflicting requests must choose to propagate the latest version.

Completed requests do not need versioning.

The above approach solves two difficult corner cases:

  1. If we discover two conflicting requests, it means that the latest request was created because the previous elector did not discover the old one. This essentially means that the old one definitely did not complete. So, it is safe to honor the new elector’s decision.
  2. If we propagate an existing request, it is also under a new version. It will therefore need to satisfy durability requirements under the new version without conflating itself with the old version.

Paxos uses proposal numbers to version its decisions, and Raft uses leadership term numbers.

But you can use other methods for versioning. For example, one could assign timestamps for the requests instead of using leadership terms or proposal numbers.

Anti-flapping rules

Most large-scale systems have anti-flapping rules that prevent a leadership from changing as soon as one was performed. This is because such an occurrence is usually due to a deeper underlying problem, and performing another leadership change will likely not fix it. And in most cases, it would aggravate the underlying problem.

In one of the systems that I knew of, the payload of the request was so big that it was causing the transmission to timeout. This resulted in a failure being detected and caused a leadership change. However, the new leader was also incapable of completing the request due to the same underlying problem. The problem was ultimately remedied by increasing the timeout.

Serendipitously, anti-flapping rules also mitigate the failure modes described above. Versioning of in-flight requests is less important for such systems.

MySQL and Vitess

The MySQL binlogs contain metadata about all transactions. They carry two pieces of relevant information:

  1. A Global Transaction ID (GTID), which includes the identity of the leader that created the transaction.
  2. A timestamp.

This metadata is faithfully propagated to all replicas. This information is sufficient to resolve most ambiguities if conflicting transactions are found due to failures.

However, the faithful propagation of the transaction metadata breaks the versioning rule that the decision of a new elector must be recorded under a new timestamp.

The Orchestrator, which is the most popular leadership management system for MySQL, has built-in anti-flapping rules. These rules mitigate the above failure modes. This is the reason why organizations have been able to avoid split-brain scenarios while running MySQL at a massive scale.

In Vitess, we use VTorc, which is a customized version of the Orchestrator, and we inherit the same safeties. But we also intend to tighten some of these corner cases to minimize the need for humans to intervene if complex failures ever happen to occur.

Stay tuned for part 8 of the series, where we will pull everything together and conclude the series with some final thoughts.

Read the full Consensus Algorithms series