惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

H
Heimdal Security Blog
小众软件
小众软件
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
罗磊的独立博客
Google DeepMind News
Google DeepMind News
大猫的无限游戏
大猫的无限游戏
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
Hugging Face - Blog
Hugging Face - Blog
阮一峰的网络日志
阮一峰的网络日志
A
About on SuperTechFans
宝玉的分享
宝玉的分享
博客园 - 聂微东
月光博客
月光博客
Cyberwarzone
Cyberwarzone
Microsoft Security Blog
Microsoft Security Blog
V
Visual Studio Blog
Project Zero
Project Zero
T
Tor Project blog
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
L
LINUX DO - 最新话题
博客园 - 叶小钗
Recent Commits to openclaw:main
Recent Commits to openclaw:main
Attack and Defense Labs
Attack and Defense Labs
Spread Privacy
Spread Privacy
Forbes - Security
Forbes - Security
Simon Willison's Weblog
Simon Willison's Weblog
N
Netflix TechBlog - Medium
P
Proofpoint News Feed
Engineering at Meta
Engineering at Meta
Hacker News: Ask HN
Hacker News: Ask HN
I
InfoQ
M
MIT News - Artificial intelligence
AI
AI
博客园 - 三生石上(FineUI控件)
W
WeLiveSecurity
C
Check Point Blog
The Hacker News
The Hacker News
C
Cyber Attacks, Cyber Crime and Cyber Security
Application and Cybersecurity Blog
Application and Cybersecurity Blog
T
Tenable Blog
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
The Cloudflare Blog
Blog — PlanetScale
Blog — PlanetScale
美团技术团队
D
Darknet – Hacking Tools, Hacker News & Cyber Security
GbyAI
GbyAI
Hacker News - Newest:
Hacker News - Newest: "LLM"
腾讯CDC
K
Kaspersky official blog

Blog — PlanetScale

Keeping a Postgres queue healthy — PlanetScale Patterns for Postgres Traffic Control — PlanetScale Graceful degradation in Postgres — PlanetScale High memory usage in Postgres is good, actually — PlanetScale Stripe Projects partnership: Provision PlanetScale Postgres and MySQL databases from the Stripe CLI — PlanetScale Enhanced tagging in Postgres Query Insights — PlanetScale Behind the scenes: How Database Traffic Control works — PlanetScale Introducing Database Traffic Control — PlanetScale Scaling Postgres connections with PgBouncer — PlanetScale Drizzle joins PlanetScale — PlanetScale Video Conferencing with Postgres — PlanetScale Faster PlanetScale Postgres connections with Cloudflare Hyperdrive — PlanetScale Introducing the PlanetScale MCP server — PlanetScale Database Transactions — PlanetScale Automating our changelog with Cursor commands — PlanetScale Postgres 18 is now available — PlanetScale Using MotherDuck with PlanetScale — PlanetScale $50 PlanetScale Metal is GA for Postgres — PlanetScale AI-Powered Postgres index suggestions — PlanetScale $5 PlanetScale is live — PlanetScale Announcing Vitess 23 — PlanetScale $50 PlanetScale Metal — PlanetScale Report on our investigation of the 2025-10-20 incident in AWS us-east-1 — PlanetScale $5 PlanetScale — PlanetScale Benchmarking Postgres 17 vs 18 — PlanetScale Larger than RAM Vector Indexes for Relational Databases — PlanetScale Partnering with Cloudflare to bring you the fastest globally distributed applications — PlanetScale Processes and Threads — PlanetScale PlanetScale for Postgres is now GA — PlanetScale Postgres High Availability with CDC — PlanetScale Announcing Neki — PlanetScale Caching — PlanetScale The principles of extreme fault tolerance — PlanetScale Announcing PlanetScale for Postgres — PlanetScale Benchmarking Postgres — PlanetScale Announcing Vitess 22 — PlanetScale The Real Failure Rate of EBS — PlanetScale IO devices and latency — PlanetScale Announcing PlanetScale Metal — PlanetScale PlanetScale Metal: There’s no replacement for displacement — PlanetScale Upgrading Query Insights to Metal — PlanetScale Automating cherry-picks between OSS and private forks — PlanetScale Database Sharding — PlanetScale Anatomy of a Throttler, part 3 — PlanetScale Introducing sharding on PlanetScale with workflows — PlanetScale Announcing Vitess 21 — PlanetScale Announcing the PlanetScale vectors public beta — PlanetScale Anatomy of a Throttler, part 2 — PlanetScale Instant deploy requests — PlanetScale Anatomy of a Throttler, part 1 — PlanetScale Increase IOPS and throughput with sharding — PlanetScale Tracking index usage with Insights — PlanetScale Faster backups with sharding — PlanetScale Building data pipelines with Vitess — PlanetScale The State of Online Schema Migrations in MySQL — PlanetScale Optimizing aggregation in the Vitess query planner — PlanetScale Dealing with large tables — PlanetScale Announcing Vitess 20 — PlanetScale Self-managed Vitess vs Managed Vitess with PlanetScale — PlanetScale Achieving data consistency with the consistent lookup Vindex — PlanetScale The MySQL adaptive hash index — PlanetScale Introducing global replica credentials — PlanetScale Profiling memory usage in MySQL — PlanetScale Summer 2023: Fuzzing Vitess at PlanetScale — PlanetScale How PlanetScale makes schema changes — PlanetScale Identifying and profiling problematic MySQL queries — PlanetScale The Problem with Using a UUID Primary Key in MySQL — PlanetScale Announcing Vitess 19 — PlanetScale PlanetScale forever — PlanetScale Introducing schema recommendations — PlanetScale Amazon Aurora Pricing: The many surprising costs of running an Aurora database — PlanetScale Three common MySQL database design mistakes — PlanetScale OAuth applications are now available to everyone — PlanetScale Deprecating the Scaler plan — PlanetScale PlanetScale branching vs. Amazon Aurora blue/green deployments — PlanetScale Databases at scale — PlanetScale Considerations for building a database disaster recovery plan — PlanetScale Working with Geospatial Features in MySQL — PlanetScale PlanetScale vs Amazon Aurora replication — PlanetScale Introducing the Vantage and PlanetScale integration — PlanetScale MySQL isolation levels and how they work — PlanetScale Introducing the schemadiff command line tool — PlanetScale $ pscale ping — PlanetScale Announcing foreign key constraints support — PlanetScale The challenges of supporting foreign key constraints — PlanetScale What is HTAP? — PlanetScale Introducing Insights Anomalies — PlanetScale Webhook security: a hands-on guide — PlanetScale MySQL replication: Best practices and considerations — PlanetScale A guide to HTML email with Ruby on Rails and Tailwind CSS — PlanetScale Sharding for cost-effective database management — PlanetScale PlanetScale ranks 188th in Deloitte’s top 500 fastest-growing companies — PlanetScale Announcing the Fivetran integration — PlanetScale Introducing webhooks — PlanetScale What is MySQL replication and when should you use it? — PlanetScale Sync user data between Clerk and a PlanetScale MySQL database — PlanetScale Introducing database reports — PlanetScale Distributed caching systems and MySQL — PlanetScale What is MySQL partitioning? — PlanetScale MySQL High Availability: Connection handling and concurrency — PlanetScale
Learn Horizontal Scaling on PlanetScaleDB with Vitess — Rate Puppies in a Rust app with Sharded MySQL Database — PlanetScale
Jiten Vaidya · 2020-08-15 · via Blog — PlanetScale

Jiten Vaidya |

Since writing this blog we have released a new version of PlanetScale. Learn more about what we’ve built and give it a try, and be sure to check out our docs.

Please note, this blog refers to PlanetScaleDB v1 and is not applicable to our latest product.

At PlanetScale, we have built PlanetScaleDB, a fully managed database-as-a-service on top of open source Vitess that enables horizontal scaling of MySQL far beyond what you can do with a single instance. In this blog, we’ll explain how sharding works in Vitess and on PlanetScaleDB.

A sharded database is a collection of multiple databases (shards) with identical relational schemas. Vitess allows your application to treat a sharded database as though it is a humongous monolithic database without having to worry about the complexities of sharding. Because of this, you can start with a small database on PlanetScaleDB and grow to massive scale without changing your application logic.

In this blog post, we’ll explore the Vitess sharding concepts: VSchema, Vindex, and Vitess Sequences using a sample dog rating application called “Goodest Doggo”. This sample app allows users to rate puppies and as you can imagine, needs to be designed to grow to humungous scale.

How is Vitess sharding different?

A sharded database is a collection of multiple databases (shards) with identical relational schemas. Many database systems that use shards for scaling shard the data in a table without consideration for co-locating the data that belongs together. This results in inefficiencies around writing the data in a transactionally consistent fashion as well as reading the data. Vitess, in contrast, allows you to co-locate the related data.

Assume you have a users table with the id column as the primary key and an orders table with its own id column as its primary key and user_id column as a secondary key. Vitess allows you to shard the user table using its id column and shard the orders table using the user_id column. This ensures both the user row for a user and the order rows for that user live in the same shard. A VSchema allows you to express this information.

In other words, just as a relational schema tells us how the data is organized within a single database, using tables, columns and indexes, Vitess uses a VSchema to define how the data is organized in shards across multiple databases. This allows us to define mechanisms which make it more efficient to access the data from these shards.

What are the elements of a VSchema?

Just as you would write a SQL statement that defines your relational database schema by creating table definitions which define columns and indexes, you define a VSchema (or a sharding scheme as it is called in PlanetScaleDB) by populating a JSON document that has the following information:

  1. "sharded": whether the database is sharded or not
  2. "vindexes": definition of all vindex types used in the VSchema
  3. tables”: one entry for each table, each table entry has the following information:

a. Primary Vindex applied to a column in the table b. (optional) Secondary Vindexes applied to columns c. (optional) Sequences definitions columns for ids

As an example, consider the following table named “puppers” that represents the dogs in our application:

CREATE TABLE IF NOT EXISTS puppers (
  `id` BIGINT(22),
  `name` VARCHAR(256),
  `image` VARCHAR(256),
  PRIMARY KEY(id)
);

Here is an example of a simple VSchema for a sharded database for the table above that illustrates the three elements of a VSchema:

{
  "sharded": true,
  "vindexes": {
    "hash_vdx": {
      "type": "hash"
    }
  },
  "tables": {
    "puppers": {
      "column_Vindexes": [
        {
          "column": "id",
          "name": "hash_vdx"
        }
      ]
    }
  }
}

What is a Vindex?

A relational database has a schema that consists of tables, columns and indexes amongst other elements. In the same way an index makes it efficient to access a given row in a given table in an unsharded database, a Vindex for a given table in a sharded database allows you to access a row quickly by allowing you to determine which shard a row lives in. Like indexes, Vindexes can be primary or secondary and a given table can only have a single primary Vindex, but can have multiple secondary Vindexes.

How does a Vindex map a row to a shard?

To understand this, you will have to understand the concept of keyspaces and keyspace_ids. In the Vitess world every row in every sharded database has a keyspace_id. The keyspace_id is not stored, but is computed by applying a specific sharding function to the value of a specific column in that row. The keyspace_ids range from 0x00 - 0xFF and this range represents the entire keyspace. Vitess shards cover this entire range. This is why a sharded database in Vitess is called a keyspace. Each shard spans a range in the keyspace and shards are named by starting and ending keyspace_id values for the shard. Let us take the example of a 4-shard keyspace. If we divide the range 0x00 to 0xFF (hexadecimal values) in four equal ranges, we get:

0x00-0x40
0x40-0x80
0x80-0xC0
0xC0-0xFF

Thus the four shards would be named: “00-40”, “40-80”, “80-C0”, “C0-FF”. Vitess drops the “00” at the beginning of the keyspace range and the FF at the end, so the shards are called “-40”, “40-80”, “80-C0”, “C0-” instead.

Vindexes allow you to map the value of a particular column in a given row to one or more keyspace_ids. Each shard has a starting and ending keyspace_id, thus given a keyspace_id, you can deterministically tell which shard a row belongs to.

Just like an index, a Vindex is applied to a column in a table, but a Vindex has an additional property called sharding function. Vitess gives you 15 predefined sharding functions and certain sharding functions go well with columns of certain types, for example you would typically use “hash” sharding function with a numeric column, or you would use “unicode_loose_md5” for a varchar or varbinary column. You can also write custom sharding functions and use those instead.

What is a Vitess sequence?

In an unsharded database when you need to assign a monotonically increasing value to a row, you can define a column to be of type “autoincrement”. In the sharded world, if you do this, you would end up having duplicate values in the same column across shards. Vitess solves this problem by allowing you to create ids which are monotonically increasing and unique across shards for a given table by defining sequences in the VSchema. The implementation of Vitess Sequences is backed by a row in a table a secondary database. Vitess limits the number of writes needed by allowing you to cache a certain number of values.

Let us make the id column in the puppers table above of type Vitess sequence by adding the following segment to table entry for puppers in our VSchema:

"auto_increment": {
	"column": "id",
	"sequence": "pupper_seq"
}

Which gives the following VSchema:

{
  "sharded": true,
  "vindexes": {
    "hash": {
      "type": "hash"
    }
  },
  "tables": {
    "puppers": {
      "column_Vindexes": [
        {
          "column": "id",
          "name": "hash"
        }
      ],
      "auto_increment": {
        "column": "id",
        "sequence": "pupper_seq"
      }
    }
  }
}

How to define sequences?

You will notice that we are using the term pupper_seq without defining it anywhere. Here is how that process works. You typically define sequences in an unsharded keyspace that lives alongside your main keyspace. In the Vitess world this keyspace is typically called a lookup keyspace. Here is how you would define the sequence in the lookup keyspace. You first create a table named puppers_seq and you insert one row in it initializing the sequence. Note the comment vitess_sequence associated with the CREATE TABLE statement. It’s important for you to keep that because that is used by Vitess to treat this table distinctly from other tables which hold real data.

CREATE TABLE IF NOT EXISTS pupper_seq (
  `id` INT,
  `next_id` BIGINT,
  `cache` BIGINT,
  PRIMARY KEY(id)
) comment 'vitess_sequence';

INSERT INTO pupper_seq (id, next_id, cache) VALUES (0, 1, 3);

After defining this table, you apply the following VSchema to the lookup keyspace:

{
  "sharded": false,
  "tables": {
    "pupper_seq": {
      "type": "sequence"
    }
  }
}

These two steps define pupper_seq as a Vitess sequence and it can now be used by the VSchema for the sharded keyspace as we have used it above.

Putting all this together

So, for the Goodest Doggo dog rating application, here is how we will organize our data. We will create two keyspaces, one called lookup, unsharded and one called puppers which is sharded. We will start with two shards and we can reshard as needed as we go. The lookup keyspace will hold tables we need for sequences and for lookup Vindexes. The doggers keyspace will hold the actual data. Here is the schema and VSchema for these two keyspaces:

The keyspace doggers has three tables: puppers, ratings, and users. We want the id column to be of type autoincrement, but as you can see below, because this is a sharded database so we do not specify the column as autoincrement in the schema, but instead we define a Vitess sequence on that column backed by the puppers_seq table in the lookup schema. This is the same for the id column for the ratings table.

We want the table puppers sharded by its id column, but the table ratings sharded by the user_id column. This is expressed in the VSchema.

Here is the Schema and VSchema for the puppers database:

CREATE TABLE `puppers` (
  `id` bigint(22) NOT NULL,
  `name` varchar(256) DEFAULT NULL,
  `image` varchar(256) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8;

CREATE TABLE `ratings` (
  `id` bigint(22) DEFAULT NULL,
  `user_id` bigint(22) DEFAULT NULL,
  `rating` bigint(20) DEFAULT NULL,
  `pupper_id` bigint(22) DEFAULT NULL,
  KEY `pupper_id` (`pupper_id`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8;

CREATE TABLE `users` (
  `id` bigint(22) NOT NULL,
  `email` varchar(64) DEFAULT NULL,
  `password` varbinary(256) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8;
{
  "sharded": true,
  "vindexes": {
    "binary_md5_vdx": { "type": "binary_md5" },
    "hash_vdx": { "type": "hash" }
  },
  "tables": {
    "puppers": {
      "columnVindexes": [{ "column": "id", "name": "hash_vdx" }],
      "autoIncrement": { "column": "id", "sequence": "pupper_seq" }
    },
    "ratings": {
      "columnVindexes": [{ "column": "pupper_id", "name": "hash_vdx" }],
      "autoIncrement": { "column": "id", "sequence": "rating_seq" }
    },
    "users": {
      "columnVindexes": [{ "column": "id", "name": "binary_md5_vdx" }]
    }
  }
}

Here is the Schema and VSchema for the lookup database:

CREATE TABLE IF NOT EXISTS pupper_seq (
  id INT,
  next_id BIGINT,
  cache BIGINT,
  PRIMARY KEY(id)
) comment 'vitess_sequence';

INSERT INTO pupper_seq (id, next_id, cache) VALUES (0, 1, 3);

CREATE TABLE IF NOT EXISTS rating_seq (
  id INT,
  next_id BIGINT,
  cache BIGINT,
  PRIMARY KEY(id)
) comment 'vitess_sequence';

INSERT INTO rating_seq (id, next_id, cache) VALUES (0, 1, 3);
{
  "sharded": false,
  "tables": {
    "pupper_seq": {
      "type": "sequence"
    },
    "rating_seq": {
      "type": "sequence"
    }
  }
}

Want to see this in action?

We have created a Quickstart Demo on PlanetScaleDB that creates the databases we described above and populates them with data from our sample dog rating application.

To get started, you will need to:

  1. Create a PlanetScaleDB account.
  2. Select the Quickstart Demo VSchema.
  3. This will spin up a cluster and two databases and when successful will present you with a database URL.
  4. In the meantime, download and start the application for your platform.
  5. Once the application is running, it will provide you with a URL (such as http://localhost:8000) on your local host that you can browse to.
  6. This web application will prompt you for the connection string for the cluster you created with the Quickstart Demo on PlanetScaleDB.
  7. Input the connection string into the app and start rating the puppies.
  8. You can look at how the data is distributed across shards by clicking on the tab “Show Data”.
  9. You can also connect to the database to a MySQL client and try running queries as you run the app to see how the data is distributed across shards.