惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

H
Help Net Security
The GitHub Blog
The GitHub Blog
F
Fortinet All Blogs
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
Simon Willison's Weblog
Simon Willison's Weblog
D
Darknet – Hacking Tools, Hacker News & Cyber Security
Cisco Talos Blog
Cisco Talos Blog
P
Privacy & Cybersecurity Law Blog
I
Intezer
Y
Y Combinator Blog
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
N
Netflix TechBlog - Medium
The Hacker News
The Hacker News
AWS News Blog
AWS News Blog
aimingoo的专栏
aimingoo的专栏
A
About on SuperTechFans
Exploit-DB.com RSS Feed
Exploit-DB.com RSS Feed
Stack Overflow Blog
Stack Overflow Blog
Hacker News: Ask HN
Hacker News: Ask HN
酷 壳 – CoolShell
酷 壳 – CoolShell
量子位
K
KPMG report finds enterprise disconnect between AI and its ROI | CIO
B
Blog
T
Tor Project blog
C
Cybersecurity and Infrastructure Security Agency CISA
云风的 BLOG
云风的 BLOG
博客园_首页
V2EX - 技术
V2EX - 技术
T
Threat Research - Cisco Blogs
腾讯CDC
宝玉的分享
宝玉的分享
博客园 - 叶小钗
罗磊的独立博客
S
Securelist
The Last Watchdog
The Last Watchdog
Google Online Security Blog
Google Online Security Blog
Scott Helme
Scott Helme
博客园 - 司徒正美
W
WeLiveSecurity
有赞技术团队
有赞技术团队
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
S
Secure Thoughts
NISL@THU
NISL@THU
N
News and Events Feed by Topic
Cyber Security Advisories - MS-ISAC
Cyber Security Advisories - MS-ISAC
雷峰网
雷峰网
大猫的无限游戏
大猫的无限游戏
K
Kaspersky official blog
IT之家
IT之家

Blog — PlanetScale

Keeping a Postgres queue healthy — PlanetScale Patterns for Postgres Traffic Control — PlanetScale Graceful degradation in Postgres — PlanetScale High memory usage in Postgres is good, actually — PlanetScale Stripe Projects partnership: Provision PlanetScale Postgres and MySQL databases from the Stripe CLI — PlanetScale Enhanced tagging in Postgres Query Insights — PlanetScale Behind the scenes: How Database Traffic Control works — PlanetScale Introducing Database Traffic Control — PlanetScale Scaling Postgres connections with PgBouncer — PlanetScale Drizzle joins PlanetScale — PlanetScale Video Conferencing with Postgres — PlanetScale Faster PlanetScale Postgres connections with Cloudflare Hyperdrive — PlanetScale Introducing the PlanetScale MCP server — PlanetScale Database Transactions — PlanetScale Automating our changelog with Cursor commands — PlanetScale Postgres 18 is now available — PlanetScale Using MotherDuck with PlanetScale — PlanetScale $50 PlanetScale Metal is GA for Postgres — PlanetScale AI-Powered Postgres index suggestions — PlanetScale $5 PlanetScale is live — PlanetScale Announcing Vitess 23 — PlanetScale $50 PlanetScale Metal — PlanetScale Report on our investigation of the 2025-10-20 incident in AWS us-east-1 — PlanetScale $5 PlanetScale — PlanetScale Benchmarking Postgres 17 vs 18 — PlanetScale Larger than RAM Vector Indexes for Relational Databases — PlanetScale Partnering with Cloudflare to bring you the fastest globally distributed applications — PlanetScale Processes and Threads — PlanetScale PlanetScale for Postgres is now GA — PlanetScale Postgres High Availability with CDC — PlanetScale Announcing Neki — PlanetScale Caching — PlanetScale The principles of extreme fault tolerance — PlanetScale Announcing PlanetScale for Postgres — PlanetScale Benchmarking Postgres — PlanetScale Announcing Vitess 22 — PlanetScale The Real Failure Rate of EBS — PlanetScale IO devices and latency — PlanetScale Announcing PlanetScale Metal — PlanetScale PlanetScale Metal: There’s no replacement for displacement — PlanetScale Upgrading Query Insights to Metal — PlanetScale Automating cherry-picks between OSS and private forks — PlanetScale Database Sharding — PlanetScale Anatomy of a Throttler, part 3 — PlanetScale Introducing sharding on PlanetScale with workflows — PlanetScale Announcing Vitess 21 — PlanetScale Announcing the PlanetScale vectors public beta — PlanetScale Anatomy of a Throttler, part 2 — PlanetScale Instant deploy requests — PlanetScale Anatomy of a Throttler, part 1 — PlanetScale Increase IOPS and throughput with sharding — PlanetScale Tracking index usage with Insights — PlanetScale Faster backups with sharding — PlanetScale Building data pipelines with Vitess — PlanetScale The State of Online Schema Migrations in MySQL — PlanetScale Optimizing aggregation in the Vitess query planner — PlanetScale Announcing Vitess 20 — PlanetScale Self-managed Vitess vs Managed Vitess with PlanetScale — PlanetScale Achieving data consistency with the consistent lookup Vindex — PlanetScale The MySQL adaptive hash index — PlanetScale Introducing global replica credentials — PlanetScale Profiling memory usage in MySQL — PlanetScale Summer 2023: Fuzzing Vitess at PlanetScale — PlanetScale How PlanetScale makes schema changes — PlanetScale Identifying and profiling problematic MySQL queries — PlanetScale The Problem with Using a UUID Primary Key in MySQL — PlanetScale Announcing Vitess 19 — PlanetScale PlanetScale forever — PlanetScale Introducing schema recommendations — PlanetScale Amazon Aurora Pricing: The many surprising costs of running an Aurora database — PlanetScale Three common MySQL database design mistakes — PlanetScale OAuth applications are now available to everyone — PlanetScale Deprecating the Scaler plan — PlanetScale PlanetScale branching vs. Amazon Aurora blue/green deployments — PlanetScale Databases at scale — PlanetScale Considerations for building a database disaster recovery plan — PlanetScale Working with Geospatial Features in MySQL — PlanetScale PlanetScale vs Amazon Aurora replication — PlanetScale Introducing the Vantage and PlanetScale integration — PlanetScale MySQL isolation levels and how they work — PlanetScale Introducing the schemadiff command line tool — PlanetScale $ pscale ping — PlanetScale Announcing foreign key constraints support — PlanetScale The challenges of supporting foreign key constraints — PlanetScale What is HTAP? — PlanetScale Introducing Insights Anomalies — PlanetScale Webhook security: a hands-on guide — PlanetScale Announcing the Fivetran integration — PlanetScale Introducing webhooks — PlanetScale What is MySQL replication and when should you use it? — PlanetScale Sync user data between Clerk and a PlanetScale MySQL database — PlanetScale Introducing database reports — PlanetScale Distributed caching systems and MySQL — PlanetScale What is MySQL partitioning? — PlanetScale MySQL High Availability: Connection handling and concurrency — PlanetScale Personalizing your onboarding with Markdoc — PlanetScale PlanetScale vs. Amazon Aurora — PlanetScale PlanetScale vs. Amazon RDS — PlanetScale PlanetScale is bringing vector search and storage to MySQL — PlanetScale PlanetScale Managed is now PCI compliant — PlanetScale
Dealing with large tables — PlanetScale
Ben Dicken · 2024-07-10 · via Blog — PlanetScale

Ben Dicken [@BenjDicken] |

Why do we have large tables?

Production database schemas can be complex, often containing hundreds or even thousands of individual tables. However, when dealing with very large data sets, it's often the case that only a small subset of these tables grow very large, into the hundreds of gigabytes or terabytes for a single table.

Consider the following, greatly simplified database schema for a workout-tracking application we'll call MuscleMaker:

Schema for example app

Early on in this product's lifecycle, none of these tables are large enough to cause problems. However, what if this application becomes extremely popular? What if our userbase grows from the thousands to the hundreds of thousands and then into the millions.

In this case, the user table would start small, but eventually would contain several million rows, one per registered user. This is large but still quite a manageable size, with no significant scalability concerns so long as some care is taken with how it is queried and what kinds of indexes we maintain.

As for the exercise table, there are only so many types of exercises that people do. Therefore, the exercise table should not grow larger than a few hundred or a few thousand rows. However, the exercise_log table will have a new row added every time a user completes one exercise. A typical workout is composed of many exercises. Each user will generate 5-10 new rows in the exercise_log each time they work out. Assuming all users work out every day (which, admittedly, might be expecting a lot) this table will grow quickly. This table will gain at least several million rows per day, quickly grow to a total size of many billions or even trillions of rows.

In this scenario, this is the table that could become difficult to scale. Not only will this table likely grow into the terabytes in size, it may also have a large set of hot rows — rows that are frequently accessed. Many users will want to search through their historical exercise data, meaning old rows will likely be queried and aggregated on a regular basis. We cannot treat this table as one with a small set of hot data.

One table growing at a rapid rate is not unique to exercise applications. For example, the message table for a popular chat application would also grow by millions of rows per day. A similar problem arises for a like table in a social media platform's database. There are many such situations where one, or a small number of tables grow much faster than the others. How do we deal with scaling such data, especially when we experience rapid growth?

Option 1: Vertical Scaling

Lets say that initially the database for MuscleMaker has all tables co-located in one PlanetScale database. PlanetScale provides fully-managed Vitess and MySQL powered databases. Vitess is a layer that sits above MySQL, and provides mechanisms for high-availability, scalability, sharding, connection pooling, and more. When connecting to a Vitess database, an application server makes a connection to a VTGate, which acts as a proxy layer. The VTGate then communicates with one or more VTTablets, which in turn communicates with MySQL. Hosted on PlanetScale and powered by Vitess, the architecture of the MuscleMaker database at this point would look like this:

Architecture with one keyspace

Note that typically we'd also have replica nodes set up for every MySQL instances in a keyspace. These are omitted in this article for simplicity, but it is best practice to use replicas for high availability and disaster recovery.

The most straightforward solution to handling a fast growing table is to scale this database vertically. When it is time to scale vertically, we'll need to increase the capacity of the server running the musclemaker keyspace. When scaling like this is needed, we often consider both compute resources (CPUs and RAM) as well as growing the underlying storage capacity (disk). When you have a fast-growing table but your current levels of compute can handle your workload just fine, you may only need to grow your disk size. How difficult this is depends greatly on how your database is managed.

  1. If you are managing a bare-metal machine, then this will require installing and configuring new disks, as well as migrating your data to the new disks or expanding your existing file system.

  2. If you're running your database on a cloud VM such as an EC2 instance, this process is a bit simpler as you don't have to manage hardware yourself. To grow your disk size, you'll want to:

    • Take a snapshot of the disk before resizing for backup / safety purposes.
    • Use the AWS UI or CLI to grow the volume to the desired size
    • Connect to your instance and tell the OS to grow your disk partition to utilize the added capacity.

    With this system, you can do the resize online without taking down your server. However, some other solutions may required migrating to a new server with a larger disk.

  3. On a fully managed solution like PlanetScale, storage scales automatically as usage increases.

However, for our workout app, we will need to grow compute resources along with the increasing table size. We'll be doing a huge amount of inserts and will also be executing queries that access and aggregate historic data on this large table.

With a managed service like PlanetScale, resizing to get more compute is as easy as a few clicks of a button. We also auto-resize storage as your data grows, so handling these cases is simple. If you are managing your own cloud infrastructure or managing bare-metal machines, the process of scaling vertically like this will be more tedious.

This type of scaling works well up to a point. When you have a multi-terabyte database, you typically want a large amount of RAM to keep as much as possible in memory. However, physical machines and cloud VMs can become prohibitively expensive as you start to reach machines that have hundreds of gigabytes of RAM and many CPU cores. Even with a large machine, memory contention can become a problem, slowing down your database and causing problems with your application. A good next step to consider is vertical sharding.

Option 2: Vertical Sharding

Vertical sharding is a fancy phrase with a simple meaning — moving large tables onto separate servers. When sharding vertically, you typically want to identify either the tables that are the largest or have the most reads and writes, and then isolate these onto separate servers. In MuscleMaker, the clear candidate for this is the exercise_log table.

Let's take a look at how we would go about this with a Vitess database.

  1. Before, all of the tables are in the musclemaker keyspace powered by a server with 32 VCPUs with 64 Gigs or RAM and 4 terabytes of provisioned disk space. This machine is having issues keeping up with all of the demand from our application servers.

  2. The next step is to assess what size machine we will need to handle just the exercise_log table on it's own. If this table is currently 2 terabytes and we know that it grows quickly, we will want to start off with extra disk space. We'll choose a machine with 4 TB of disk space, 32 VCPUs and 64 GB of RAM. We will also want to downsize the machine handling the rest of the database to 16 vCPUs and 32 GB of RAM. These two new machines will need to be set up with MySQL, Vitess, and must get added to the Vitess cluster. We'll refer to these two new keyspaces as musclemaker_log and musclemaker_main respectively.

  3. At this point, all of the data (2.5 TB in total) is in the musclemaker keyspace. In Vitess, we must use the MoveTables workflow of the vtcltdclient command line tool to copy the tables into their new keyspaces. The commands to do this look something like this:

    vtctldclient MoveTables --workflow musclemakerflow create --target-keyspace musclemaker --source-keyspace musclemaker_log --tables "exercise-log"
    vtctldclient MoveTables --workflow musclemakerflow create --target-keyspace musclemaker --source-keyspace musclemaker_main --tables "user,exercise"
    
  4. With large tables, these two steps will take a while (hours). While this is happening, all production traffic will still be routed to musclemaker. Vitess gives us the ability to separately run a command to switch query traffic to the new keyspaces. The commands look like:

    vtctldclient MoveTables --workflow musclemakerflow --target-keyspace musclemaker_log switchtraffic
    vtctldclient MoveTables --workflow musclemakerflow --target-keyspace musclemaker_main switchtraffic
    
  5. At this point, production traffic is switched to using the two separate keyspaces. After you are confident everything is working well, the musclemaker keyspace can be taken offline. The architecture of the Vitess cluster now looks like this: Architecture with two keyspace There would likely also need to be some application-level changes in order to properly use the names of the two keyspaces.

After doing this, you've gained the ability to separately scale the storage and compute for the main database and the log database. This is an excellent solution used by many organizations to divide up their large tables, allowing them to continue to scale when they start to encounter limitations with single-primary setups for heavy write workloads. Over time, the musclemaker_log keyspace can be grown to handle more load. But what happens when even one very large server can't handle a gigantic table?

Option 3: Horizontal Sharding

Horizontal sharding is the ultimate solution for scaling massive tables. Horizontal sharding takes a single table and spreads the rows out across many separate servers based on a sharding strategy. Vitess supports many sharding strategies, one of the most common of these being hashing. With this sharding strategy, we choose a table we want to shard, and then select one of the columns to be our hashed sharding key. Each time we get a new row, a hash is generated for the column value, and this hash is used to determine which server to store the row on. It's important to put some thought into which column is to be used for this, as it can have big implications on the distribution of your data, and therefore the performance of the database.

For example, for exercise_log, one option would be to shard based on a hash of the log_id column. Each time we need to write a new exercise_log row, we would generate the ID for the row, hash the ID, and then use this hash to determine which shard to send the row to. Since a hash is used, this provides a (roughly) even distribution of data across all of the shards. There is a problem with this though: the logs for any given user will be spread out across all shards, as shown in the below diagram:

Sharding based on log_id

This means that any time we need to query the log history for one user, we might need to access data on many or all of our MySQL instances. This will be terrible for performance.

We can solve this problem by instead using the user_id as the hashed shard key. Each user_id will produce the same hash, and thus get sent to the same server. This means that when a user adds log events or reads their log, it will all hit the same server. Take a look at how the data is distributed when hashing by user_id instead:

Sharding based on user_id

Continuing with the same example Vitess cluster from before, let's see how we would spread out this log table across four shards:

  1. First, we must make changes to the schema and vschema (Vitess schema) of our cluster. We'll kick this process off by adding a table to handle ID generation for the sharded table:

    create table if not exists exercise_log_id_sequence(id int, next_id bigint, cache bigint, primary key(id)) comment 'vitess_sequence';
    insert into exercise_log_id_sequence(id, next_id, cache) values(0, 1000, 100);
    

    We also need to tell Vitess about this new table and that it is going to be used for ID generation. The vtctldclient ApplyVSchema command can be used for this, with the following VSchema:

    {
      "tables": {
        "exercise_log_id_sequence": {
          "type": "sequence"
        }
      }
    }
    

    Next up is to tell Vitess that we want to shard the exercise_log table by user_id and use the exercise_log_id_sequence table for ID generation. This would also be applied using vtctldclient ApplyVSchema with this VSchema:

    {
      "sharded": true,
      "vindexes": { "hash": { "type": "hash" } },
      "tables": {
        "customer": {
          "column_vindexes": [
            {
              "column": "user_id",
              "name": "hash"
            }
          ],
          "auto_increment": {
            "column": "log_id",
            "sequence": "exercise_log_id_sequence"
          }
        }
      }
    }
    

    Finally, change the schema of the exercise_log table to remove auto_increment. Assigning IDs has shifted to be Vitess' responsibility instead of MySQL:

    alter table exercise_log change log_id log_id bigint not null;
    
  2. Next, we need to spin up four new servers that will be our shards. These will be created within the musclemaker_log keyspace. We'll spin up four machines, each with 16 vCPUs, 32 Gigs of RAM, and 2 terabytes of storage. These will be named using Vitess' convention of shard hash range naming: -40, 40-80, 80-c0, c0-.

  3. When the shards are ready, we need to issue a command to shard the data, which will copy it from the unsharded musclemaker_log server into the four shards, also in the musclemaker_log keyspace.

    vtctldclient Reshard --workflow shardMuscleMakerLog --target-keyspace musclemaker_log create --source-shards '0' --target-shards '-40,40-80,80-c0,c0-'
    
  4. When this finishes, we can switch the traffic to the sharded table:

    vtctldclient Reshard --workflow shardMuscleMakerLog --target-keyspace musclemaker_log switchtraffic
    
  5. The sharding process is now complete. The new architecture of the Vitess cluster looks like this: Final cluster configuration

Sharding in this way has many benefits beyond just being able to spread out the data across many disks. A few of these include:

  1. Increased write throughput. We are no longer reliant on a single primary to handle all writes. Whenever write performance is bottlenecking, we can upsize shards or add new shards.
  2. Backup speed. Instead of backing up a single table on 1 machine, the table is divided up across many shards. Therefore, taking backups of the table can happen in parallel, vastly speeding up overall backup time.
  3. Failure isolation. Any node that goes down only effects a subset of the table, not the whole data set. We also would typically use replication, and Vitess' auto-failover features to mitigate issues here.
  4. Cost savings. In some instances, running many small cloud VM instances is more affordable than a single, top-of-the-line instance.

Vitess also has support for resharding an already-sharded table. This means that as the data size and I/O workload continues to grow, we can expand out to using more and more shards. We can also downsize or use less shards if demand decreases, or data is purged, in the future. Horizontal sharding gives us infinite options for scaling the capacity of a table.

Conclusion

Vertical scaling, vertical sharding, and horizontal sharding are useful techniques for scaling to handle large tables. Each technique is useful for different phases of the growth of a table. Typically, you scale your entire database vertically, then shard vertically, then use horizontal sharding for ultimate scalability of large workloads.