惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

H
Heimdal Security Blog
小众软件
小众软件
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
罗磊的独立博客
Google DeepMind News
Google DeepMind News
大猫的无限游戏
大猫的无限游戏
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
Hugging Face - Blog
Hugging Face - Blog
阮一峰的网络日志
阮一峰的网络日志
A
About on SuperTechFans
宝玉的分享
宝玉的分享
博客园 - 聂微东
月光博客
月光博客
Cyberwarzone
Cyberwarzone
Microsoft Security Blog
Microsoft Security Blog
V
Visual Studio Blog
Project Zero
Project Zero
T
Tor Project blog
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
L
LINUX DO - 最新话题
博客园 - 叶小钗
Recent Commits to openclaw:main
Recent Commits to openclaw:main
Attack and Defense Labs
Attack and Defense Labs
Spread Privacy
Spread Privacy
Forbes - Security
Forbes - Security
Simon Willison's Weblog
Simon Willison's Weblog
N
Netflix TechBlog - Medium
P
Proofpoint News Feed
Engineering at Meta
Engineering at Meta
Hacker News: Ask HN
Hacker News: Ask HN
I
InfoQ
M
MIT News - Artificial intelligence
AI
AI
博客园 - 三生石上(FineUI控件)
W
WeLiveSecurity
C
Check Point Blog
The Hacker News
The Hacker News
C
Cyber Attacks, Cyber Crime and Cyber Security
Application and Cybersecurity Blog
Application and Cybersecurity Blog
T
Tenable Blog
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
The Cloudflare Blog
Blog — PlanetScale
Blog — PlanetScale
美团技术团队
D
Darknet – Hacking Tools, Hacker News & Cyber Security
GbyAI
GbyAI
Hacker News - Newest:
Hacker News - Newest: "LLM"
腾讯CDC
K
Kaspersky official blog

Blog — PlanetScale

Keeping a Postgres queue healthy — PlanetScale Patterns for Postgres Traffic Control — PlanetScale Graceful degradation in Postgres — PlanetScale High memory usage in Postgres is good, actually — PlanetScale Stripe Projects partnership: Provision PlanetScale Postgres and MySQL databases from the Stripe CLI — PlanetScale Enhanced tagging in Postgres Query Insights — PlanetScale Behind the scenes: How Database Traffic Control works — PlanetScale Introducing Database Traffic Control — PlanetScale Scaling Postgres connections with PgBouncer — PlanetScale Drizzle joins PlanetScale — PlanetScale Video Conferencing with Postgres — PlanetScale Faster PlanetScale Postgres connections with Cloudflare Hyperdrive — PlanetScale Introducing the PlanetScale MCP server — PlanetScale Database Transactions — PlanetScale Automating our changelog with Cursor commands — PlanetScale Postgres 18 is now available — PlanetScale Using MotherDuck with PlanetScale — PlanetScale $50 PlanetScale Metal is GA for Postgres — PlanetScale AI-Powered Postgres index suggestions — PlanetScale $5 PlanetScale is live — PlanetScale Announcing Vitess 23 — PlanetScale $50 PlanetScale Metal — PlanetScale Report on our investigation of the 2025-10-20 incident in AWS us-east-1 — PlanetScale $5 PlanetScale — PlanetScale Benchmarking Postgres 17 vs 18 — PlanetScale Larger than RAM Vector Indexes for Relational Databases — PlanetScale Partnering with Cloudflare to bring you the fastest globally distributed applications — PlanetScale Processes and Threads — PlanetScale PlanetScale for Postgres is now GA — PlanetScale Postgres High Availability with CDC — PlanetScale Announcing Neki — PlanetScale Caching — PlanetScale The principles of extreme fault tolerance — PlanetScale Announcing PlanetScale for Postgres — PlanetScale Benchmarking Postgres — PlanetScale Announcing Vitess 22 — PlanetScale The Real Failure Rate of EBS — PlanetScale IO devices and latency — PlanetScale Announcing PlanetScale Metal — PlanetScale PlanetScale Metal: There’s no replacement for displacement — PlanetScale Upgrading Query Insights to Metal — PlanetScale Automating cherry-picks between OSS and private forks — PlanetScale Database Sharding — PlanetScale Anatomy of a Throttler, part 3 — PlanetScale Introducing sharding on PlanetScale with workflows — PlanetScale Announcing Vitess 21 — PlanetScale Announcing the PlanetScale vectors public beta — PlanetScale Anatomy of a Throttler, part 2 — PlanetScale Instant deploy requests — PlanetScale Anatomy of a Throttler, part 1 — PlanetScale Increase IOPS and throughput with sharding — PlanetScale Tracking index usage with Insights — PlanetScale Faster backups with sharding — PlanetScale Building data pipelines with Vitess — PlanetScale The State of Online Schema Migrations in MySQL — PlanetScale Optimizing aggregation in the Vitess query planner — PlanetScale Dealing with large tables — PlanetScale Announcing Vitess 20 — PlanetScale Self-managed Vitess vs Managed Vitess with PlanetScale — PlanetScale Achieving data consistency with the consistent lookup Vindex — PlanetScale The MySQL adaptive hash index — PlanetScale Introducing global replica credentials — PlanetScale Profiling memory usage in MySQL — PlanetScale Summer 2023: Fuzzing Vitess at PlanetScale — PlanetScale How PlanetScale makes schema changes — PlanetScale Identifying and profiling problematic MySQL queries — PlanetScale The Problem with Using a UUID Primary Key in MySQL — PlanetScale Announcing Vitess 19 — PlanetScale PlanetScale forever — PlanetScale Introducing schema recommendations — PlanetScale Amazon Aurora Pricing: The many surprising costs of running an Aurora database — PlanetScale Three common MySQL database design mistakes — PlanetScale OAuth applications are now available to everyone — PlanetScale Deprecating the Scaler plan — PlanetScale PlanetScale branching vs. Amazon Aurora blue/green deployments — PlanetScale Databases at scale — PlanetScale Considerations for building a database disaster recovery plan — PlanetScale Working with Geospatial Features in MySQL — PlanetScale PlanetScale vs Amazon Aurora replication — PlanetScale Introducing the Vantage and PlanetScale integration — PlanetScale MySQL isolation levels and how they work — PlanetScale Introducing the schemadiff command line tool — PlanetScale $ pscale ping — PlanetScale Announcing foreign key constraints support — PlanetScale The challenges of supporting foreign key constraints — PlanetScale What is HTAP? — PlanetScale Introducing Insights Anomalies — PlanetScale Webhook security: a hands-on guide — PlanetScale MySQL replication: Best practices and considerations — PlanetScale A guide to HTML email with Ruby on Rails and Tailwind CSS — PlanetScale Sharding for cost-effective database management — PlanetScale PlanetScale ranks 188th in Deloitte’s top 500 fastest-growing companies — PlanetScale Announcing the Fivetran integration — PlanetScale Introducing webhooks — PlanetScale What is MySQL replication and when should you use it? — PlanetScale Sync user data between Clerk and a PlanetScale MySQL database — PlanetScale Introducing database reports — PlanetScale Distributed caching systems and MySQL — PlanetScale What is MySQL partitioning? — PlanetScale MySQL High Availability: Connection handling and concurrency — PlanetScale
How do Database Indexes Work? — PlanetScale
Justin Gage · 2022-07-14 · via Blog — PlanetScale

Justin Gage |

If you’ve queried a database, you’ve probably used an index, even if you didn’t know it at the time. Database indexes help speed up read queries by creating ancillary data structures that make your scans faster. This post will walk through how database indexes work, with a particular focus on MySQL, everyone’s (well, many people’s) favorite homegrown organic database.

After you read through how databases indexes work, make sure you check out our video on how to speed things up using composite indexes with hashed functions:

Indexes speed up your read queries

Indexes are basically a way to speed up your read queries, particularly ones with filters (WHERE) on them. They’re data structures that exist in your database engine, outside of whatever table they work with, that point to data you’re trying to query.

To avoid the all-too-common librarian metaphor, imagine – perhaps a far out scenario – you’ve got all of your users in a table in your MySQL database (running on PlanetScale, of course). You’re building some functionality into your social app that allows users to search and filter for other users, which means there will be a query running in production1 that runs through your entire users table. Even worse, your app is quite successful, so there are hundreds of thousands of these “users.” Imagine that!

Unfortunately, SELECT-ing from that users table isn’t very performant right now. The filters you’re applying based on inputs in the UI – user location, account type, most recent activity, and other columns in your database – require scans of the entire users table, which is roughly O(N/2) on average. Your query is taking 5-6 seconds to run, which would be fine as a data analyst doing internal work, but isn’t up to snuff for a smooth user experience.

To remedy this, you create a database index on the “most recent activity” column in your users table with CREATE INDEX. Behind the scenes, MySQL goes and creates a new pseudo-table in the database with two columns: a value for most recent activity, and a pointer to the record in the users table. The trick here, though, is that the table gets ordered and stored as a binary tree, with ordered values for that most recent activity column. As a result, your query is O Log(n) efficient, and only takes a second or less to run.

This is the basic gist of indexes. If you know you’re going to run a specific query repeatedly and worry about read performance, creating an index (or several) can help speed that query up.

That’s the simple version though; there’s a lot more going on under the hood, and too many indexes can even degrade that performance you sought out to improve.

How database indexes work under the hood

An index is not magic – it’s a database structure that contains pointers to specific database records. Without an index, data in a database usually gets stored as a heap, basically a pile of unordered2, unsorted rows. In fact, this is actually a setting you can toggle in Microsoft SQL Server and Azure SQL Database.

In practice, data is rarely stored completely unsorted. Instead, you’ll usually use some sort of primary key – which in MySQL can be identical to an index – that could be something like an auto-incrementing integer. But data can, of course, only be sorted by one column, which limits the “binary” efficiency of sorting (with unique values) to a query that filters on that one, ordered column. An index is basically a way of letting your table be sorted by multiple columns, which lets you get binary search efficiency on multiple filter columns.

When you create an index on a column, you’re creating a new table with two columns: one for the column you indexed on, and one that contains a pointer to where the record in question is stored. Though the index will be of an identical length to the original table, the width will likely be a good degree shorter, and as such will take fewer disc blocks to store and traverse. Pointers in MySQL are pretty small, usually fewer than 5 bytes. The afore-linked now “legendary” Stack Overflow post runs through the math around the number of blocks required for storage, for those interested in a deeper look.

Unless you set it up yourself, there are likely already several indexes created in whatever database you’re using right now. You can see any indexes that exist on a particular table with:

SHOW INDEX FROM table_name FROM db_name;

If you run an EXPLAIN statement on your query, you should also see information about which indexes the query plans to use. Here’s the table of possible EXPLAIN output values from the MySQL docs; note the possible_keys and key values, both relating to index selection.

ColumnJSON nameDefinition
idselect_idThe SELECT identifier
select_typeNoneThe SELECT type
tabletable_nameThe table for the output row
partitionspartitionsThe matching partitions
typeaccess_typeThe join type
possible_keyspossible_keysThe possible indexes to choose
keykeyThe index actually chosen
key_lenkey_lengthThe length of the chosen key
refrefThe columns compared to the index
rowsrowsEstimate of rows to be examined
filteredfilteredPercentage of rows filtered by table condition
ExtraNoneAdditional information

This way of using EXPLAIN can also be useful for debugging when creating an index, i.e. verifying that a new one is working as intended.

While indexes always store pointers to the record any index points to, the database doesn’t always need to use those pointers. An index only scan (covering index in MySQL) refers to when the value your query is looking for is contained solely in the index, and thus doesn’t require a table lookup. In PostgreSQL, not all index types support index-only scans.

One thing to remember about indexes is that while they’re beneficial for read queries (including JOINs and aggregates), they can negatively impact your database performance as a whole if you’re not careful. Indexes take up a lot of space, even if that space is smaller than the initial table, and that space is precious; you’ll need to keep an eye on how close you’re inching towards your filesystem’s limit. They also make INSERT queries take longer and force the query engine to consider more options before choosing how to execute a query.

Different types of database indexes

We’ve been mostly referencing database indexes that are unique, but that turns out to be only one of several types. Keep in mind that any of these indexes can have more than one column (besides the pointer) in them. MySQL actually supports up to 16 columns in an index. But we digress.

Keys

You may see “key” used interchangeably with “index.” A KEY in MySQL is a type of non-unique index. Because the index isn’t unique, scanning through it won’t be as efficient as a binary tree, but it will likely be more efficient than linear search. Many columns in your tables will likely fit this bill, e.g. something like “first name” or “location.”

MySQL and Postgres have a pretty major difference between them in how they handle primary key storage. In MySQL, primary keys are stored with their data, which means they effectively take up no extra storage space. In Postgres, primary keys are treated like other indexes and stored in separate data structures.

Unique indexes

A unique index conforms to the traditional unique definition – no two identical non-NULL values – with the caveat that the values are sorted. When data fits those two definitions – unique and sorted – you can use binary search and get that sweet, sweet O log(N). Note that a primary key is a specific type of unique index, where there can only be one per table and the value cannot be NULL.

As an aside, UNIQUE indexes can be used as a way to enforce uniqueness on a particular table column. Since indexes are updated on every insert, trying to insert a non-unique value into a column with a unique index on it will throw an error. The Postgres docs call this out explicitly.

Text indexes

Using the FULLTEXT qualifier will create a text index in most popular relational databases. It can only be applied to text-type columns (CHAR, VARCHAR, or TEXT) in MySQL. Where things get interesting is using these indexes: MySQL provides a lot of functionality out of the box that starts to resemble what you’d expect from a modern text parsing / NLP library.

The main text searching keyword set in MySQL is MATCH()... AGAINST(). You can choose from a few different search methods, including:

  • Natural language search – no special operators, uses the built-in stopwords that you can customize. This is the default search type.
  • Boolean search – uses a special query language that’s roughly analogous to RegEx (but very different).
  • Query expansion search – runs a regular natural language search, then another one using words from the returned rows, hence the “expansion.”

Each of these methods has tradeoffs, none are one-size-fits-all.

There are other types of indexes in MySQL, like prefix indexes or descending indexes. For more complete info, check out the MySQL docs on indexes.

Creating indexes in MySQL

MySQL uses pretty standard syntax for creating indexes:

CREATE [type] INDEX index_name ON table_name (column_name)

Though the command can be simple, the amount of customization available is pretty impressive. Here’s the full roster of available options from the MySQL docs:

CREATE [UNIQUE | FULLTEXT | SPATIAL] INDEX index_name

    [index_type]

    ON tbl_name (key_part,...)

    [index_option]

    [algorithm_option | lock_option] ...

key_part: {col_name [(length)] | (expr)} [ASC | DESC]

index_option: {

    KEY_BLOCK_SIZE [=] value

  | index_type

  | WITH PARSER parser_name

  | COMMENT 'string'

  | {VISIBLE | INVISIBLE}

  | ENGINE_ATTRIBUTE [=] 'string'

  | SECONDARY_ENGINE_ATTRIBUTE [=] 'string'

}

index_type:

    USING {BTREE | HASH}

algorithm_option:

    ALGORITHM [=] {DEFAULT | INPLACE | COPY}

lock_option:

    LOCK [=] {DEFAULT | NONE | SHARED | EXCLUSIVE}

For example, if we wanted to create an index on the “most recent activity” column in our users table, we’d use the following options. Keep in mind that this column contains non-unique values, so we’ll skip the UNIQUE keyword when creating the index.

CREATE INDEX users_most_recent_activity

ON users ({most_recent_activity} DESC)

COMMENT ‘for querying by most recent activity’

This is on the simpler side. You can customize any of the following:

  • Type of index – as discussed above. You can select non-unique (no keyword), unique, fulltext, etc.
  • Index type – stored as a binary tree or hash.
  • Misc. options – key block size, special parsers, comments, etc.
  • Algorithms and locks used

To go deeper, check out our videos on indexes in MySQL:

  • 1 — Large read queries in production aren’t always common, but you can imagine that this comes up a lot in analytics.

  • 2 — Well, technically ordered in the order in which you inserted them into the table.