SIGMOD25论文阅读：An Elephant Under The Microscope——Analyzing The Interaction Of Optimizer Components In PostgreSQL

https://dl.acm.org/doi/pdf/10.1145/3709659

论文相关源码（“This Repo is currently under contruction. Not all files/scripts are available, yet.”）：https://github.com/db-tu-dresden/SIGMOD25-PostgreEval

本文提到的工具

PostBOUND : https://github.com/rbergm/PostBOUND

PG_Lab :https://github.com/rbergm/pg_lab

1 Introduction

Although query optimization in relational database systems has been studied in decades of research [31, 35, 39, 53], there is still no consensus on how to do query optimization the “right” way and novel approaches are published each year.

要我说，是否存在Right Way都不一定😂毕竟这可是”No Free Lunch”的系统工程

As the system-under-test (SUT), we use PostgreSQL (PG) as a widely-used and prime representative of the traditional query optimizer architecture.

Nice，主要以PG作为研究对象

2 Preliminaries

2.1 Textbook Query Optimizer

While a large number of relational database systems with different features and for different use cases exists, most of them contain a query optimizer that is loosely based on the textbook optimizer architecture shown in Figure 1.

Most systems use either heuristic search algorithms (e.g. SQLite [56]), exhaustive algorithms (e.g. MySQL [14] or DuckDB [10]), or both (e.g. PG [49])

将Query Optimizer分为三类：

Heuristic Search -> SQLite

Exhaustive Algorithm -> MySQL, DuckDB

以及两者的混合 -> PG

Recently a plethora of approaches that utilize machine learning techniques for cardinality estimation have been proposed

使用机器学习进行cardinality estimation

The first category contains work such as MSCN [27] that learns a global neural network for an entire database schema, and LocalNN [67] that learns multiple, but more lightweight models. LPCE [63] also learns from labelled queries, but integrates the cardinalities in a re-optimization scheme.

即使是机器学习，方案也不止一种

这一小节基本就是报菜名，各类传统方法和机器学习都用上了，五花八门

2.3 Open Questions

Despite the large number of novel research projects to improve the query optimizer, most of these approaches focus on individual components and improve them in isolation

嗯，确实，一旦使用的Optimize方法多起来就不好测量

关于JOB数据集

3 Methodology and Setup

Therefore, we invested a lot of engineering effort to develop two novel frameworks to study the optimizer and the interaction of its components: pg_lab and PostBOUND

pg_lab focuses on the low-level details of the optimization process, e.g. by overwriting the optimizer’s control flow

PostBOUND is used to analyze query plans and to generate input data for the optimizer.

PostBOUND : https://github.com/rbergm/PostBOUND

PG_Lab :https://github.com/rbergm/pg_lab

3.1 Research Frameworks

pg_lab is a research-focused fork of PG that exposes the internal functionality of the query optimizer while keeping the general control-flow intact. This is achieved by adding a number of extension hooks to the optimization process.

呃，PG的分叉，但通过插件Hook住optimization process？感觉不太妙（软件分叉出去一般就不太好回）

但看了下pg_lab的仓库，文档还不错，有些工作必须要修改源码才能做到

pg_lab also ships an extension to embed optimizer decisions directly into the SQL query using a special hint syntax, similar to the widely-used pg_hint_plan extension …… Using hints as the primary means for steering the optimizer abstracts all the low-level details of the optimizer interaction away

在Low-Level层面添加Optimize Hint

As long as the hinting interface remains stable, optimizer prototypes can be tested across PG versions.

“prototypes can be tested across PG versions” 这个好

This framework（PostBOUND） is implemented in Python to ensure ease of programming. At its core, it provides an optimization pipeline that is used to generate input data for the actual database optimizer. This pipeline is composed of multiple interfaces that handle different parts of the optimization process, in particular join ordering, operator selection (scans and joins) and cardinality estimation.

Optimization PipeLine 这听起来很Nice

apart from PG, PostBOUND also supports MySQL

这么说，应该可移植性不错

The high-level interaction between PostBOUND and pg_lab is summarized in Figure 2: The user selects a target workload (e.g. JOB, Stats, or a custom workload) and configures an optimization pipeline by implementing select optimization stages with the desired strategies (e.g. by implementing a novel cardinality estimator). PostBOUND then generates hinted queries that encode the optimization decisions in each query.

3.2 Benchmarks

Specifically, we perform all analysis using JOB [30] (cf. Table 1). In addition, we use the more recent Stats benchmark [18] and for Section 6 the Stack benchmark [35]. Table 2 summarizes important properties of the different benchmarks to highlight similarities and differences.

3.3 Test Environment

没什么特殊的：X86_64芯片的大内存服务器，就是系统环境有调整

We optimize this server’s configuration using the PGTune tool [61], targeting an OLAP workload

使用PG Tune优化系统

we pre-warm the buffer with all relevant base table pages and index pages using the pg_prewarm

提前将数据库的Buffer预热了

4 Cardinality Distortion

4.1 Impact on Plan Quality

JOB数据集的情况：

JOB: Figure 3 shows the execution plans for selected JOB queries chosen by PG’s optimizer along with their runtimes as we vary . We observe three distinct reactions to the distortion factor:

(1) A small percentage of queries always use the same execution plan (0 out of 113 queries on the HDD server, 12 on the SSD system).

(2) The second group (26 queries in total on the HDD server, 79 queries on the SSD server) evolve gradually as d changes. Figure 3a shows an example query. This can be viewed as the desired behavior since the plan is adapted depending on the size of the intermediate results.

(3) The third group exhibits an unstable behavior in the sense that the optimizer switches back and forth between different candidate plans. This group amounts to 87 queries on the HDD server and just 22 queries on the SSD server.

More specifically, the SSD-based system has a lower random_page_cost and a higher effective_cache_size.

SSD与HDD在性能的差异主要来源于random_page_cost与effective_cache_size

Lesson learned: Small-scale improvements of cardinality estimates often do not lead to better execution plans in PG. Even slight estimation errors can result in major slowdowns of the execution time. Overall, there is no clear pattern when the PG optimizer switches to a new candidate plan and these switches have an arbitrary impact on the runtime.

“基数估算的小规模改进通常不会导致PG中更好的执行计划，但即使是轻微的估计错误也可能导致执行时间的重大放缓”，而后半句我没太懂，是在说PG优化器的切换为不同物理计划是随机时间，且这事情本身就会随机影响运行时间？有点怪

4.2 Impact of Cost Model and Plan Search Space

这个图我觉得挺棒的：Estimate-Function的估计模式（需要补充一下关系代数：这里的R是Relation，S是Schema）

Cost Model and Plan Search Space对JOB任务集的影响

这一段的“Lesson Learn”就不放了：我感觉就是没得出结论的意思

5 Analyzing Candidate Plans

那一页开头给了一个随机生成Join Orders的算法，使用的是图的写法

让GPT解释下：

关于Join Orders在JOB的差异

5.1 Correlation of Cost Estimates and Runtime

The authors found that a simplified cost model and executor (i.e. only supporting sequential scan, hash join, and index-nested-loop joins) focused on an in-memory setting can predict the execution time reasonably well as long as the true cardinalities are known

TUM他们发现，简单的Join成本估计中，只要确定内存这个变量就能很好的预测时间

At the same time, they observed a much weaker correlation as soon as the cost model and executor become more complex

即执行复杂时，Cost Model就不能很好起到作用

Similar to the setup of Leis et al., we perform a linear regression to predict the query runtime from the cost estimate.

Ok, fine

5.2 Importance of Base Join Selection

Since the best base join is often the one with the lowest cardinality, it might be interesting to explore optimizer architectures that pay special attention to the selection of the optimal base join in future work.

所以到底是重要还是不重要？

Lesson learned: A lot of fast candidate plans are built using the same base join, which oftentimes is the one with the lowest cardinality. The remainder of the query plan has a much lower impact on the overall execution time. Just as cardinality estimates are not equally important [41], this seems to apply to the plan generation as well

老实说，没看懂😫

6 Non-deterministic Components

Lesson learned: Non-deterministic components introduce random elements to the query optimization that can have a large impact on the query runtime

这一页的开头贴了这一张图，说明是有影响的咯

Lesson learned: Non-deterministic components introduce random elements to the query optimization that can have a large impact on the query runtime.

……“Have a large impact”，是否能理解为作者他们也没搞清楚为什么有大优化？

7 Beyond Textbook Optimizers

贴了一张Misestimation的表，里面提到的UES的优化方案是另外一篇论文的

Figure 12 shows the end-to-end runtime for selected queries depending on the current data shift. What immediately catches the eye are the sudden and sharp increases in query runtime – from now on referred to as jumps.

The main cause for the sudden jumps are thresholds in the data shift: after certain insert steps or before certain delete steps, new rows that match the query’s filter predicates might become available.

出现大变动的原因是：AP类操作导致Query的Filter预测失效，导致单个Operator需要处理的数据量增大

Lesson learned: The optimizer architecture influences how well misestimates can be compensated.

这回是看懂了，Query Optimizer的架构会影响错误估计的补偿效果（这是不是一句正确的废话😂）

8 Conclusion and Future Work

Developing a new optimizer architecture that exploits this fact might be an exciting opportunity for future work.

对的，是会有助于开发新的Optimizer，如果这套能接入Apache Calcite那是最好的

Our results indicate that this is certainly possible, but research is by no means complete. Therefore, future work should consider deviating from the traditional architecture instead of solely focusing on isolated improvements to individual components.

这个长句大伙们细品😂我不太想评价

个人评价

这篇文章数据很枯燥，不少研究没得出个定量的规律结果，个人观感觉得不是很好

那两个软件从介绍来看很有意思，有机会可以看看，CMU的Andy在今年的CMU-15799说没有个好的工具去判断Query-Optimize的效果，这两个软件也许能达到Andy想要的效果😋

推荐订阅源

Mox的笔记库