惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

H
Heimdal Security Blog
小众软件
小众软件
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
罗磊的独立博客
Google DeepMind News
Google DeepMind News
大猫的无限游戏
大猫的无限游戏
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
Hugging Face - Blog
Hugging Face - Blog
阮一峰的网络日志
阮一峰的网络日志
A
About on SuperTechFans
宝玉的分享
宝玉的分享
博客园 - 聂微东
月光博客
月光博客
Cyberwarzone
Cyberwarzone
Microsoft Security Blog
Microsoft Security Blog
V
Visual Studio Blog
Project Zero
Project Zero
T
Tor Project blog
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
L
LINUX DO - 最新话题
博客园 - 叶小钗
Recent Commits to openclaw:main
Recent Commits to openclaw:main
Attack and Defense Labs
Attack and Defense Labs
Spread Privacy
Spread Privacy
Forbes - Security
Forbes - Security
Simon Willison's Weblog
Simon Willison's Weblog
N
Netflix TechBlog - Medium
P
Proofpoint News Feed
Engineering at Meta
Engineering at Meta
Hacker News: Ask HN
Hacker News: Ask HN
I
InfoQ
M
MIT News - Artificial intelligence
AI
AI
博客园 - 三生石上(FineUI控件)
W
WeLiveSecurity
C
Check Point Blog
The Hacker News
The Hacker News
C
Cyber Attacks, Cyber Crime and Cyber Security
Application and Cybersecurity Blog
Application and Cybersecurity Blog
T
Tenable Blog
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
The Cloudflare Blog
Blog — PlanetScale
Blog — PlanetScale
美团技术团队
D
Darknet – Hacking Tools, Hacker News & Cyber Security
GbyAI
GbyAI
Hacker News - Newest:
Hacker News - Newest: "LLM"
腾讯CDC
K
Kaspersky official blog

Blog — PlanetScale

Keeping a Postgres queue healthy — PlanetScale Patterns for Postgres Traffic Control — PlanetScale Graceful degradation in Postgres — PlanetScale High memory usage in Postgres is good, actually — PlanetScale Stripe Projects partnership: Provision PlanetScale Postgres and MySQL databases from the Stripe CLI — PlanetScale Enhanced tagging in Postgres Query Insights — PlanetScale Behind the scenes: How Database Traffic Control works — PlanetScale Introducing Database Traffic Control — PlanetScale Scaling Postgres connections with PgBouncer — PlanetScale Drizzle joins PlanetScale — PlanetScale Video Conferencing with Postgres — PlanetScale Faster PlanetScale Postgres connections with Cloudflare Hyperdrive — PlanetScale Introducing the PlanetScale MCP server — PlanetScale Database Transactions — PlanetScale Automating our changelog with Cursor commands — PlanetScale Postgres 18 is now available — PlanetScale Using MotherDuck with PlanetScale — PlanetScale $50 PlanetScale Metal is GA for Postgres — PlanetScale AI-Powered Postgres index suggestions — PlanetScale $5 PlanetScale is live — PlanetScale Announcing Vitess 23 — PlanetScale $50 PlanetScale Metal — PlanetScale Report on our investigation of the 2025-10-20 incident in AWS us-east-1 — PlanetScale $5 PlanetScale — PlanetScale Benchmarking Postgres 17 vs 18 — PlanetScale Larger than RAM Vector Indexes for Relational Databases — PlanetScale Partnering with Cloudflare to bring you the fastest globally distributed applications — PlanetScale Processes and Threads — PlanetScale PlanetScale for Postgres is now GA — PlanetScale Postgres High Availability with CDC — PlanetScale Announcing Neki — PlanetScale Caching — PlanetScale The principles of extreme fault tolerance — PlanetScale Announcing PlanetScale for Postgres — PlanetScale Benchmarking Postgres — PlanetScale Announcing Vitess 22 — PlanetScale The Real Failure Rate of EBS — PlanetScale IO devices and latency — PlanetScale Announcing PlanetScale Metal — PlanetScale PlanetScale Metal: There’s no replacement for displacement — PlanetScale Upgrading Query Insights to Metal — PlanetScale Automating cherry-picks between OSS and private forks — PlanetScale Database Sharding — PlanetScale Anatomy of a Throttler, part 3 — PlanetScale Introducing sharding on PlanetScale with workflows — PlanetScale Announcing Vitess 21 — PlanetScale Announcing the PlanetScale vectors public beta — PlanetScale Anatomy of a Throttler, part 2 — PlanetScale Instant deploy requests — PlanetScale Anatomy of a Throttler, part 1 — PlanetScale Increase IOPS and throughput with sharding — PlanetScale Tracking index usage with Insights — PlanetScale Faster backups with sharding — PlanetScale Building data pipelines with Vitess — PlanetScale The State of Online Schema Migrations in MySQL — PlanetScale Optimizing aggregation in the Vitess query planner — PlanetScale Dealing with large tables — PlanetScale Announcing Vitess 20 — PlanetScale Self-managed Vitess vs Managed Vitess with PlanetScale — PlanetScale Achieving data consistency with the consistent lookup Vindex — PlanetScale The MySQL adaptive hash index — PlanetScale Introducing global replica credentials — PlanetScale Profiling memory usage in MySQL — PlanetScale Summer 2023: Fuzzing Vitess at PlanetScale — PlanetScale How PlanetScale makes schema changes — PlanetScale Identifying and profiling problematic MySQL queries — PlanetScale The Problem with Using a UUID Primary Key in MySQL — PlanetScale Announcing Vitess 19 — PlanetScale PlanetScale forever — PlanetScale Introducing schema recommendations — PlanetScale Amazon Aurora Pricing: The many surprising costs of running an Aurora database — PlanetScale Three common MySQL database design mistakes — PlanetScale OAuth applications are now available to everyone — PlanetScale Deprecating the Scaler plan — PlanetScale PlanetScale branching vs. Amazon Aurora blue/green deployments — PlanetScale Databases at scale — PlanetScale Considerations for building a database disaster recovery plan — PlanetScale Working with Geospatial Features in MySQL — PlanetScale PlanetScale vs Amazon Aurora replication — PlanetScale Introducing the Vantage and PlanetScale integration — PlanetScale MySQL isolation levels and how they work — PlanetScale Introducing the schemadiff command line tool — PlanetScale $ pscale ping — PlanetScale Announcing foreign key constraints support — PlanetScale The challenges of supporting foreign key constraints — PlanetScale What is HTAP? — PlanetScale Introducing Insights Anomalies — PlanetScale Webhook security: a hands-on guide — PlanetScale MySQL replication: Best practices and considerations — PlanetScale A guide to HTML email with Ruby on Rails and Tailwind CSS — PlanetScale Sharding for cost-effective database management — PlanetScale PlanetScale ranks 188th in Deloitte’s top 500 fastest-growing companies — PlanetScale Announcing the Fivetran integration — PlanetScale Introducing webhooks — PlanetScale What is MySQL replication and when should you use it? — PlanetScale Sync user data between Clerk and a PlanetScale MySQL database — PlanetScale Introducing database reports — PlanetScale Distributed caching systems and MySQL — PlanetScale What is MySQL partitioning? — PlanetScale MySQL High Availability: Connection handling and concurrency — PlanetScale
What is a query planner? — PlanetScale
Andres Taylor · 2022-12-15 · via Blog — PlanetScale

Andres Taylor |

Anyone that has worked with large databases can testify how slow queries can get. This is often due to the necessary indexes not being there, or something in the query that stops the database system from using the index. Choosing the right indexes to use, and the right order to fetch data in, proves to be the difference between a 10ms and 5s query.

Choosing the indexes and join order is called query planning. The output of this process is a query plan that tells the database system how to answer a query from a user. For simple queries with a single table, it’s often trivial to find the optimal query plan. But for large queries with lots of tables and lots of indexes, the available options can quickly run into the thousands and even millions of alternatives. Most of these alternatives are really slow, so the planner's job is to find the best possible query plan among all possibilities.

How query planning works

Most people are more familiar with compilers than with query planners, so I thought I should compare the work of a query planner with the work of a compiler.

A compiler is a program that takes source code written in a programming language and translates it into machine code that can be executed by a computer's processor. A query planner does something similar. The input is code written in SQL (or some other database query language), and the output is a query plan that describes which indexes will be used, and in which order to access tables.

The typical phases of a compiler/planner are: lexing and parsing, semantic analysis, optimization, and code generation. Let’s look at each of these individually to understand the similarities and the differences between a compiler and a query planner.

Lexing and parsing

In the first phase, lexing and parsing, the source code is analyzed and divided into a sequence of tokens, which are basic units such as keywords, operators, and identifiers. The sequence of tokens generated by the lexical analyzer is analyzed and checked for correctness according to the rules of the programming language. This phase typically involves building a syntax tree, which is a hierarchical representation of the structure of the source code. The output of this step is an abstract syntax tree (AST). There is no interesting difference between a compiler and a planner here.

As an example, let’s look at the following query:

SELECT name, avg(salary) FROM employees JOIN salary_info ON id = empid

The AST would look something like this:

Diagram of the query above as abstract syntax tree

It’s the same query, but instead of a string, it’s now this tree data structure. All the unnecessary parts have been stripped away — the planner doesn’t care if the user wrote “SELECT” or “select”, or any whitespaces in the query.

Semantic analysis

The semantic analysis phase of compilation is where the compiler checks for semantic errors in the input source code. Semantic errors are errors that are not detected during the lexical analysis or syntax analysis phases, but which can only be detected by analyzing the meaning of the source code.

During semantic analysis, the compiler performs a variety of checks to ensure that the source code is semantically correct. For example, the compiler may check for type mismatches, in which a value of one type is used in a context where a value of a different type is expected. The compiler may also check for undefined variables or other entities, such as functions or classes, and may perform additional checks and transformations on the syntax tree generated during syntax analysis.

A query planner does almost exactly the same thing here. Instead of searching for classes and methods, it would bind to tables and columns, but the idea is the same.

After semantic analysis, the data structures representing the query will be enriched with information such as which table a column comes from, what types the columns and expressions in the query have, etc.

Optimization

During the optimization phase, the compiler will now take all the information gathered during parsing and semantic analysis and iteratively change it to a more optimal form. This is often done using an intermediate representation of the query. Instead of staying in a shape that is close to the input language, the intermediate representation is custom made to make optimisations easier and faster to do.

In this step, the query planner uses a variety of algorithms and techniques to determine the most efficient way to execute the query, considering factors such as the available indexes, the data distribution, and the overall structure of the database. This may involve selecting the most efficient algorithms for operations such as joins and sorting, and choosing the most appropriate indexes to use. It usually also does some of the optimizations that a compiler would perform, such as constant folding. These types of optimizations are about rewriting the input into an equivalent form that is easier for the planner to optimize.

An example of this is how the Vitess planner massages predicates into a shape that can be solved using an index. Given a predicate such as:

WHERE (id = 5 AND name = 'Toto') OR (id = 5 AND name = 'Mumin')

The OR in the middle here makes it hard for the planner to use an index on id to find the correct row. The optimizer will rewrite the predicate into something that is easier to optimize but still means the same thing.

WHERE id = 5 AND (name = 'Toto' OR name = 'Mumin')

Let us pause here and talk about why the order of table access is so important. Say we want to join three tables: A with B, and B with C. We could start by joining A with B, and take the output of that and join it with C. Or we can start from the other side — join B with C and then join that result with A. The intermediate state needed is where the big difference comes in. If AxB is very large, joining that with C will be very slow, compared to if we start with BxC that happens to be pretty small. It’s a path finding problem.

Here is a diagram of the tables used in the TPC-H query #8. The TPC-H is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent data modifications. It’s a well known dataset used to test the strength of database systems, in particular the query planner.

TPC-H query #8 diagram

Source: https://www.sqlite.org/queryplanner-ng.html

All the tables need to be visited, and the connections between tables have different costs. The planner can start at any (*) node. What is the path that touches all tables with the least cost? This is why the join order is so important.

Optimization in Vitess query planner

In Vitess, our query plans are partly executed on the SQL proxy layer, called VTGate, and partly on the individual shards. Probably the most important optimization we perform is to push down as much work as possible to MySQL. If we can perform a join or a filter in MySQL, that is always going to be faster than fetching all the individual rows and performing the same operation on the VTGate side. So, during query planning, we are searching for the query plan that has the least number of network calls.

When planning aggregations, our strategy is to do as much aggregation as possible in MySQL, and then aggregate the aggregates. The planner rewrites the aggregation that the user asked for into smaller aggregations and sends those to MySQL. The results of these queries are then used as inputs and summarized into the final aggregation result. You can read more about grouping and aggregations in an earlier blog post.

Code generation

In the code generation phase, the compiler generates machine code based on the input source code and the analysis performed in previous phases. This machine code can then be executed by the computer's processor.

The query planner generates a plan that specifies the exact steps that the database engine should take to execute the query. This plan may include operations such as index scans, join algorithms, and sorting algorithms, as well as other details such as the order in which the operations should be performed.

The importance of a good query planner

Query planners are an essential component of database management systems, and the work that query planner developers do plays a crucial role for database systems. The field of query planning is an active area of research, with new algorithms and techniques being developed all the time. A good query planner can have a direct impact on the performance and efficiency of databases, which can have real-world benefits for the organizations and users that rely on those databases.