























There are very few scenarios in which an eventually consistent database is preferable over a strongly consistent database. Further, in a multi-region application scenario where scaling is necessary, choosing either an undistributed database or an eventually consistent database is even more questionable. So what motivates engineers to ignore strongly consistent distributed databases? We have seen many reasons, but wrong assumptions drive most of them.
As we explained in Part 1 of this series, the CAP theorem is widely accepted yet often misinterpreted. When many people misinterpret a well-known theorem, it leaves a mark. In this case, many engineers still believe that eventual consistency is a necessary evil.
It is slowly sinking in that consistency should not be sacrificed, yet many databases still put consistency second. Why is that? Some popular databases offer options that deliver higher consistency, but only at the cost of potentially very high latencies. Their sales messaging might even claim that delivering consistency at low latencies in a multi-region distributed database is incredibly hard or even impossible, and the developer audience has salient memories of experiencing very poor latencies in databases that were not built for consistency. Combined, they jointly fortify the misconception that strong consistency in a distributed database with relatively low latencies is impossible.
Many engineers build according to the “Premature optimization is the root of all evil” (Donald Knuth) principle, but that statement is only meant to apply to small inefficiencies. Building your startup on a strongly consistent distributed scalable database might seem like a premature optimization, because initially, your application doesn’t require scale and might not require distribution. However, we are not talking about small inefficiencies here. The requirement to scale or distribute might arise overnight when your application becomes popular. At that point, your users have a terrible experience, and you are looking at a substantial challenge to change your infrastructure and code.
This used to have some truth to it since distributed databases were new, and many came with severe limitations. They did not allow joins, only allowed key-value storage, or required you to query your data according to predefined sharding keys, which you couldn’t change any more. Today, we have distributed databases that have flexible models and provide the flexibility you are used to with traditional databases. This point is very related to the previous point, which ignores that nowadays, starting to programming against a strongly consistent distributed database is just as easy and probably easier in the long run compared to a traditional database. If it’s just as easy, then why not optimize from the start?
Distributed databases are often created by people who have experienced problems with eventual consistency. For example, FaunaDB was built by former Twitter engineers after having experienced how difficult it is to build a scalable system on top of the eventually consistent databases that were popular around that time, such as Cassandra. These problems typically manifest when a new company starts to scale, hence many younger engineers have never experienced them first hand.
Sometimes painful things can teach us lessons that we didn’t think we needed to know.
— Amy Poehler
Discussing the dangers of eventual consistency typically leads to the “it works for me” argument from engineers who simply haven’t experienced any issues yet. Since that often takes months (or years, if you are lucky), let’s look at an analogy.
A while ago, my best friend was about to miss an appointment, so I lent him my bike. I was happy that I helped out, he was happy, and everything went well. That happiness quickly turned into pain when he tried to jump the bike onto a side-walk. You see… I had tinkered with the bike earlier that day and had forgotten to tighten the front wheel. He came back with a huge purple bruise.

The bike example is very similar to working with a database that is not strongly consistent. Everything will go well until you try to lift the bike’s wheel (or in other words, until your company lifts off and starts scaling up).
At the moment your application needs to scale up, you typically do so by replicating services. Once the database becomes the bottleneck, you replicate your traditional database or move to a distributed database. Sadly, at that point, features in your application might break when you start replicating your database. Until now, you hadn’t noticed these problems since the database ran on a single node. At that point, two things might happen:
The developers who end up in the first situation are often already experienced in dealing with eventually consistent systems. They will now either accept that they can’t deliver on some features, or will build a complex and hard-to-maintain layer on top of the database to get what they need. In essence, they attempt to develop a strongly consistent database on top of an eventually consistent one. That’s a shame since other people have designed distributed databases from the ground up that will not only be more efficient, but don’t require maintenance from your development team!
The developers who end up in the second situation are riding a partly invisible bike. They do not realize that the wheel is loose, do not see the wheel detach, and once they look up after falling, they still see a completely intact bike.

At the moment things go wrong, the complexity to resolve these bugs is high for several reasons:
The invisible bike example is still too forgiving. In reality, others are probably depending on your application. So basically, you are riding an invisible bike while others (your clients) are standing on your shoulders.

Not only will you fall, but they will fall with you and land on top of you–heavily and painfully. You might not even survive the fall at that point; in other words, your company might not survive the storm of negative feedback from your clients.
The moral of the story? If you had chosen a strongly (vs.eventually) consistent database from the beginning, you would not have to consider going through a complex and resource-intensive project like migrating your database at a point when your clients are already frustrated.
Choosing an eventually consistent database for scaling was justified a few years back when there was simply no other choice. However, we now have modern databases that can scale efficiently without sacrificing data consistency or performance. . Moreover, these modern databases also include several other awesome features that go beyond consistency, such as ease of use, serverless pricing models, built-in authentication, temporality, native GraphQL, and more. With a modern database, you can scale without opening Pandora’s box!
And, if after reading this series of articles, you still choose not to use a strongly consistent distributed database, please at least make sure to tighten your wheels (in other words, read and understand different databases’ consistency guarantees).
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。