惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

H
Heimdal Security Blog
小众软件
小众软件
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
罗磊的独立博客
Google DeepMind News
Google DeepMind News
大猫的无限游戏
大猫的无限游戏
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
Hugging Face - Blog
Hugging Face - Blog
阮一峰的网络日志
阮一峰的网络日志
A
About on SuperTechFans
宝玉的分享
宝玉的分享
博客园 - 聂微东
月光博客
月光博客
Cyberwarzone
Cyberwarzone
Microsoft Security Blog
Microsoft Security Blog
V
Visual Studio Blog
Project Zero
Project Zero
T
Tor Project blog
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
L
LINUX DO - 最新话题
博客园 - 叶小钗
Recent Commits to openclaw:main
Recent Commits to openclaw:main
Attack and Defense Labs
Attack and Defense Labs
Spread Privacy
Spread Privacy
Forbes - Security
Forbes - Security
Simon Willison's Weblog
Simon Willison's Weblog
N
Netflix TechBlog - Medium
P
Proofpoint News Feed
Engineering at Meta
Engineering at Meta
Hacker News: Ask HN
Hacker News: Ask HN
I
InfoQ
M
MIT News - Artificial intelligence
AI
AI
博客园 - 三生石上(FineUI控件)
W
WeLiveSecurity
C
Check Point Blog
The Hacker News
The Hacker News
C
Cyber Attacks, Cyber Crime and Cyber Security
Application and Cybersecurity Blog
Application and Cybersecurity Blog
T
Tenable Blog
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
The Cloudflare Blog
Blog — PlanetScale
Blog — PlanetScale
美团技术团队
D
Darknet – Hacking Tools, Hacker News & Cyber Security
GbyAI
GbyAI
Hacker News - Newest:
Hacker News - Newest: "LLM"
腾讯CDC
K
Kaspersky official blog

Blog — PlanetScale

Keeping a Postgres queue healthy — PlanetScale Patterns for Postgres Traffic Control — PlanetScale Graceful degradation in Postgres — PlanetScale High memory usage in Postgres is good, actually — PlanetScale Stripe Projects partnership: Provision PlanetScale Postgres and MySQL databases from the Stripe CLI — PlanetScale Enhanced tagging in Postgres Query Insights — PlanetScale Behind the scenes: How Database Traffic Control works — PlanetScale Introducing Database Traffic Control — PlanetScale Scaling Postgres connections with PgBouncer — PlanetScale Drizzle joins PlanetScale — PlanetScale Video Conferencing with Postgres — PlanetScale Faster PlanetScale Postgres connections with Cloudflare Hyperdrive — PlanetScale Introducing the PlanetScale MCP server — PlanetScale Database Transactions — PlanetScale Automating our changelog with Cursor commands — PlanetScale Postgres 18 is now available — PlanetScale Using MotherDuck with PlanetScale — PlanetScale $50 PlanetScale Metal is GA for Postgres — PlanetScale AI-Powered Postgres index suggestions — PlanetScale $5 PlanetScale is live — PlanetScale Announcing Vitess 23 — PlanetScale $50 PlanetScale Metal — PlanetScale Report on our investigation of the 2025-10-20 incident in AWS us-east-1 — PlanetScale $5 PlanetScale — PlanetScale Benchmarking Postgres 17 vs 18 — PlanetScale Larger than RAM Vector Indexes for Relational Databases — PlanetScale Partnering with Cloudflare to bring you the fastest globally distributed applications — PlanetScale Processes and Threads — PlanetScale PlanetScale for Postgres is now GA — PlanetScale Postgres High Availability with CDC — PlanetScale Announcing Neki — PlanetScale Caching — PlanetScale The principles of extreme fault tolerance — PlanetScale Announcing PlanetScale for Postgres — PlanetScale Benchmarking Postgres — PlanetScale Announcing Vitess 22 — PlanetScale The Real Failure Rate of EBS — PlanetScale IO devices and latency — PlanetScale Announcing PlanetScale Metal — PlanetScale PlanetScale Metal: There’s no replacement for displacement — PlanetScale Upgrading Query Insights to Metal — PlanetScale Automating cherry-picks between OSS and private forks — PlanetScale Database Sharding — PlanetScale Anatomy of a Throttler, part 3 — PlanetScale Introducing sharding on PlanetScale with workflows — PlanetScale Announcing Vitess 21 — PlanetScale Announcing the PlanetScale vectors public beta — PlanetScale Anatomy of a Throttler, part 2 — PlanetScale Instant deploy requests — PlanetScale Anatomy of a Throttler, part 1 — PlanetScale Increase IOPS and throughput with sharding — PlanetScale Tracking index usage with Insights — PlanetScale Faster backups with sharding — PlanetScale Building data pipelines with Vitess — PlanetScale The State of Online Schema Migrations in MySQL — PlanetScale Optimizing aggregation in the Vitess query planner — PlanetScale Dealing with large tables — PlanetScale Announcing Vitess 20 — PlanetScale Self-managed Vitess vs Managed Vitess with PlanetScale — PlanetScale Achieving data consistency with the consistent lookup Vindex — PlanetScale The MySQL adaptive hash index — PlanetScale Introducing global replica credentials — PlanetScale Profiling memory usage in MySQL — PlanetScale Summer 2023: Fuzzing Vitess at PlanetScale — PlanetScale How PlanetScale makes schema changes — PlanetScale Identifying and profiling problematic MySQL queries — PlanetScale The Problem with Using a UUID Primary Key in MySQL — PlanetScale Announcing Vitess 19 — PlanetScale PlanetScale forever — PlanetScale Introducing schema recommendations — PlanetScale Amazon Aurora Pricing: The many surprising costs of running an Aurora database — PlanetScale Three common MySQL database design mistakes — PlanetScale OAuth applications are now available to everyone — PlanetScale Deprecating the Scaler plan — PlanetScale PlanetScale branching vs. Amazon Aurora blue/green deployments — PlanetScale Databases at scale — PlanetScale Considerations for building a database disaster recovery plan — PlanetScale Working with Geospatial Features in MySQL — PlanetScale PlanetScale vs Amazon Aurora replication — PlanetScale Introducing the Vantage and PlanetScale integration — PlanetScale MySQL isolation levels and how they work — PlanetScale Introducing the schemadiff command line tool — PlanetScale $ pscale ping — PlanetScale Announcing foreign key constraints support — PlanetScale The challenges of supporting foreign key constraints — PlanetScale What is HTAP? — PlanetScale Introducing Insights Anomalies — PlanetScale Webhook security: a hands-on guide — PlanetScale MySQL replication: Best practices and considerations — PlanetScale A guide to HTML email with Ruby on Rails and Tailwind CSS — PlanetScale Sharding for cost-effective database management — PlanetScale PlanetScale ranks 188th in Deloitte’s top 500 fastest-growing companies — PlanetScale Announcing the Fivetran integration — PlanetScale Introducing webhooks — PlanetScale What is MySQL replication and when should you use it? — PlanetScale Sync user data between Clerk and a PlanetScale MySQL database — PlanetScale Introducing database reports — PlanetScale Distributed caching systems and MySQL — PlanetScale What is MySQL partitioning? — PlanetScale MySQL High Availability: Connection handling and concurrency — PlanetScale
Indexing JSON in MySQL — PlanetScale
Aaron Francis · 2022-10-04 · via Blog — PlanetScale

Aaron Francis |

MySQL gave us the JSON data type back in mid-2015 with the release of MySQL 5.7.8. Since then, it has been used as a way to escape rigid column definitions and store JSON documents of all shapes and sizes: audit logs, configuration settings, 3rd party payloads, user-defined fields, and more.

Although MySQL gives us functions for reading and writing JSON data, you’ll quickly discover something that is conspicuously missing: the ability to directly index your JSON columns.

In other databases, the best way to directly index a JSON column is usually through a type of index known as a Generalized Inverted Index, or GIN for short. Since MySQL doesn’t offer GIN indexes, we’re unable to directly index an entire stored JSON document. All is not lost though, because MySQL does give us a way to indirectly index parts of our stored JSON documents.

Depending on the version of MySQL that you're using, you have two options for indexing JSON. In MySQL 5.7 you would have to create an intermediate generated column, but starting in MySQL 8.0.13, you can create a functional index directly.

Let’s start with a example table used for logging various actions taken in an application.

CREATE TABLE `activity_log` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `properties` json NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
   PRIMARY KEY (`id`)
)

Into that table we’ll insert JSON documents that have this shape:

{
  "uuid": "e7af5df8-f477-4b9b-b074-ad72fe17f502",
  "request": {
    "email": "little.bobby@tables.com",
    "firstName": "Little",
    "formType": "vehicle-inquiry",
    "lastName": "Bobby",
    "message": "Hello, can you tell me what the specs are for this vehicle?",
    "postcode": "75016",
    "townCity": "Dallas"
  }
}

In our example, we’ll be indexing the email key inside the request object. This will allow our (fictional) users to quickly find forms submitted by specific people.

Let’s take a look at our first option for indexing: generated columns.

Indexing JSON via a generated column

A generated column can be thought of as a calculated, computed, or derived column. It is a column whose value is the result of an expression, rather than direct data input. The expression can contain literal values, built-in functions, or references to other columns. The result of the expression must be scalar and deterministic.

Since we’re trying to index the request.email field in the properties column, our generated column will use the JSON unquoting extraction operator to pluck the value out.

To verify that we’ve formed our expression correctly, we’ll first run a SELECT statement and inspect the results.

mysql> SELECT properties->>"$.request.email" FROM activity_log;
+--------------------------------+
| properties->>"$.request.email" |
+--------------------------------+
| little.bobby@tables.com        |
+--------------------------------+

The ->> operator is a shorthand, unquoting extraction operator, making it equivalent to JSON_UNQUOTE(JSON_EXTRACT(column, path)). We could have written the previous SELECT statement using the longhand and gotten the same result.

mysql> SELECT JSON_UNQUOTE(JSON_EXTRACT(properties, "$.request.email"))
    ->   FROM activity_log;
+-----------------------------------------------------------+
| JSON_UNQUOTE(JSON_EXTRACT(properties, "$.request.email")) |
+-----------------------------------------------------------+
| little.bobby@tables.com                                   |
+-----------------------------------------------------------+

Which method you choose is a matter of personal preference!

Now that we’ve confirmed our expression is valid and accurate, let’s use it to create a generated column.

ALTER TABLE activity_log ADD COLUMN email VARCHAR(255)
  GENERATED ALWAYS as (properties->>"$.request.email");

The first part of the ALTER statement should look very familiar, we’re adding a column named email and defining it as a VARCHAR(255). In the latter half of the statement we declare that the column is generated and that it should always be equal to the result of the expression properties->>"$.request.email".

We can confirm our column has been added by selecting it as we would any other column.

mysql> SELECT id, email FROM activity_log;
+----+-------------------------+
| id | email                   |
+----+-------------------------+
|  1 | little.bobby@tables.com |
+----+-------------------------+

You’ll see that MySQL is now maintaining this column for us. If we were to update the JSON value, the generated column value would change as well.

Now that we have our generated column in place, we can add an index to it like we would any other column.

ALTER TABLE activity_log ADD INDEX email (email) USING BTREE;

That’s it! You’ve now indexed the request.email key in your JSON properties column. Let’s verify that MySQL would use the index to speed up queries that are filtering on email.

mysql> EXPLAIN SELECT * FROM activity_log WHERE email = 'little.bobby@tables.com';
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: activity_log
   partitions: NULL
         type: ref
possible_keys: email
          key: email
      key_len: 768
          ref: const
         rows: 1
     filtered: 100.00
        Extra: NULL

MySQL reports that it plans to use the email index to satisfy this query.

Generated column indexes and the optimizer

MySQL's optimizer is a powerful and mysterious entity. When we give MySQL a command, we’re telling it what we want, not how to get it. Often times MySQL will take our query and rewrite it slightly, which is a good thing! Tens of thousands of hours across dozens of years have gone into making the optimizer effective and efficient.

When it comes to indexes on generated columns, the optimizer can "see through" different access patterns to ensure the underlying index is being used.

We defined an index on email, which is a generated column based on the expression properties->>"$.request.email". We’ve already proven that the index is used when we query against the email column. What’s more interesting is that the optimizer is smart enough to help us out if we forget to query against the named email column!

In the following query, we don’t access the generated column by name, but instead use the shorthand JSON extraction operator. (Some rows omitted from the EXPLAIN statement for brevity.)

mysql> EXPLAIN SELECT * FROM activity_log
    ->   WHERE properties->>"$.request.email" = 'little.bobby@tables.com';
*************************** 1. row ***************************
           id: 1
possible_keys: email
          key: email
      key_len: 768
        [...]: [...]

Even though we didn’t explicitly address the column by name, the optimizer understands that there is an index on a generated column based on that expression and opts to use the index. Thanks optimizer!

We can confirm this is the case for the longhand as well.

mysql> EXPLAIN SELECT * from activity_log WHERE
    ->   JSON_UNQUOTE(
    ->     JSON_EXTRACT(properties, "$.request.email")
    ->   ) = 'little.bobby@tables.com';
*************************** 1. row ***************************
           id: 1
possible_keys: email
          key: email
      key_len: 768
        [...]: [...]

Again, the optimizer "reads through" our expression and uses the email index.

Not convinced? Let’s take a peek at what the optimizer is doing by running a SHOW WARNINGS after our previous EXPLAIN statement to see the rewritten query.

mysql> SHOW WARNINGS;
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select `activity_log`.`id` AS `id`,`activity_log`.`properties` AS `properties`,`activity_log`.`created_at` AS `created_at`,`activity_log`.`email` AS `email` from `activity_log` where (`activity_log`.`email` = 'little.bobby@tables.com')

If you look closely, you’ll see that the optimizer has rewritten our query and changed the equality comparison to reference the indexed column. This is especially useful if you're unable to control the access pattern because the query is being issued from a 3rd party package in your codebase, or you're unable to change this part of your code for some other reason.

If the underlying expression doesn’t match very closely then the optimizer will not be able to use the index, so be sure to take care when creating your generated column. The MySQL documentation explains the optimizer's use of generated column indexes in further detail.

Functional indexes

Beginning with MySQL 8.0.13, you're able to skip the intermediate step of creating a generated column and create what is called a "functional index." The MySQL documentation calls these functional key parts.

A functional index is an index on an expression rather than a column. Sounds a lot like a generated column, doesn’t it? There’s a reason it sounds similar, and that’s because a functional index is implemented using a hidden generated column! We no longer have to create the generated column, but a generated column is still being created.

There are a few gotchas with functional indexes though, especially when it comes to using them for JSON.

It would be nice to create our JSON index like this:

ALTER TABLE activity_log
  ADD INDEX email ((properties->>"$.request.email")) USING BTREE;

But if you do try that, you get a nasty error:

Query 1 ERROR: Cannot create a functional index on an expression that returns a BLOB or TEXT. Please consider using CAST.

So what’s going on here? In our earlier examples, we were the ones in charge of creating the generated column and we declared it as a VARCHAR(255), which is easily indexable by MySQL.

However, when we use a functional index, MySQL is going to create that column for us based on the data type that it infers. JSON_UNQUOTE returns a LONGTEXT value, which is not able to be indexed.

Fortunately, the error message points us in the right direction: we need to cast our value to a type that is not LONGTEXT. Casting using the CHAR function tells MySQL to infer a VARCHAR data type.

ALTER TABLE activity_log
  ADD INDEX email ((CAST(properties->>"$.request.email" as CHAR(255)))) USING BTREE;

Now that we’ve added the index, we’ll see if it works by running an EXPLAIN.

mysql> EXPLAIN SELECT * FROM activity_log
    ->   WHERE properties->>"$.request.email" = 'little.bobby@tables.com';
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: activity_log
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where

Unfortunately, our index isn’t being considered at all, so we’re not out of the woods yet.

Unless otherwise specified, casting a value to a string sets the collation to utf8mb4_0900_ai_ci. The JSON extraction functions, on the other hand, return a string with a utf8mb4_bin collation. Therein lies our problem! Because the collation is mismatched between the query's expression and the stored index, our new functional index isn’t being used.

The final step is to explicitly set the collation of the cast to utf8mb4_bin.

ALTER TABLE activity_log
  ADD INDEX email ((
    CAST(properties->>"$.request.email" as CHAR(255)) COLLATE utf8mb4_bin
  )) USING BTREE;

Rerunning the previous EXPLAIN, we can see that we’re finally in a position to use the functional index.

mysql> EXPLAIN SELECT * FROM activity_log
    ->   WHERE properties->>"$.request.email" = 'little.bobby@tables.com';
*************************** 1. row ***************************
           id: 1
possible_keys: email
          key: email
      key_len: 1023
        [...]: [...]

Clearly functional indexes come with a few pitfalls, some of which are explicit and easy to debug, and some that require a little bit more digging into the documentation.

Remember that functional indexes use hidden generated columns under the hood. If you prefer to take control of the generated column yourself (even in MySQL 8.0.13 and later) that’s a perfectly reasonable approach!

While direct JSON indexing may not be available in MySQL, indirect indexing of specific keys can cover a majority of use cases.

Don’t just stop with JSON, either! You can use generated columns and functional indexes across all types of common, hard to index patterns.

Go forth and index with confidence.