SIGMOD24阅读：ROME——Robust Query Optimization via Parallel Multi-Plan Execution

这篇文章也是选自CMU 15799的Lecture #22，从Introduction来看，也是通过在同一时间运行多种计划，从而找到最优计划的Query Optimize方案

文章的Github仓库：https://github.com/Ultra-Seven/ROME

CMU提供的论文下载地址：https://15799.courses.cs.cmu.edu/spring2025/papers/23-mongodb/wei-sigmod2024.pdf

Introduction

Computer systems feature an increasing degree of parallelism (due, in part, to fundamental limitations in increasing the clock speed of single CPUs). Database systems typically exploit this parallelism to execute more queries or to process more data in parallel.

对的（指某些数据库的Query还是单线程运行），目前还很难见到并行的Query Optimize

However, in practice, query optimization is hard as it relies on accurate cost predictions for alternative plan candidates. Cost estimation may fail for various reasons, including data skew and correlations that make it hard to predict sizes of intermediate results.

这可太对了，这也是Query Optimize即使从上个世纪末人们就开始研究，直到今天依然值得研究的原因。

If at least one of the corresponding plans is good, query optimization will terminate quite quickly.

如果至少一个相应的计划很好，则查询优化将很快终止 …… 有意思

A popular line of research explores the potential of machine learning to make cost estimation more reliable [24, 30, 31]. However, such methods suffer from a cold-start problem in the case of a new database or in case of a dynamic query workload.

机器学习算法的会遇到冷启动问题？Mark下

In summary, the original scientific contributions in this paper are the following:

• We propose a non-intrusive optimization framework, executing complementary query plans in parallel.

• We propose multiple utility models and multiple algorithms for selecting good sets of query plans.

• We analyze the problem of multi-plan selection and the proposed methods formally.

• We compare the proposed method to multiple baselines in experiments, using multiple benchmarks on real data.

OK, It’s good

ROME FRAMEWORK

Figure 1 shows an overview of our approach, executing complementary plans in parallel to reduce the impact of optimizer mistakes.

After selecting plans, ROME executes those plans in parallel (by issuing the same query multiple times via different sessions, varying the optimizer settings in each session to obtain the selected plan). Once the first plan finishes, ROME terminates all parallel executions.

框架结构很清晰：只要有一个计划最快执行完，其他计划就直接停止，并返回结果。可以，这很合理

However, as we will see in the experiments, this approach pays off if it helps to avoid the occasional (but very expensive) optimizer mistake, misleading the optimizer to select plans with an execution cost far above the optimum.

嗯，避免偶尔（但非常昂贵的）优化器错误

The implementation of ROME, evaluated in the experiments, is a Python framework on top of PostgreSQL.

啊，Python么，测试环境是PG，好吧😉

2.2讲与PG的交互细节（I like it😋）但确实没什么好评论的，就不写了

FORMAL MODEL

这一块列了几个定义

Definition 1 (Query Plan).

We model query plans as join trees in the following. For most of this paper, we implicitly assume binary join trees (i.e., the system uses binary join operators). However, the approach is not limited to this scenario and could support non-binary join trees (e.g., if the underlying system uses multiway joins) as well.

嗯，查询计划是树，且并不一定要是二叉连接树，OK

Definition 2 (Intermediate Result).

We associate inner nodes within join trees representing query plans with intermediate results. For a plan p, we denote by I (p) (to save space, we also use the notation Ip ) the set of intermediate results. Different plans may share intermediate results. If so, an estimation error for one intermediate result may lead to incorrect cost estimates for multiple plans.

中间结果也OK，优化器存在多个阶段必然有中间结果

Definition 3 (Size Estimates)

During part of the paper, we assume that the underlying database engine features a cost-based query optimizer. To perform cost-based query optimization, systems generally need to estimate the sizes of intermediate results (based on data statistics, typically). ROME obtains size estimates from the underlying optimizer. For an intermediate result i, we denote by M (i) the size estimate by the optimizer size model.

感觉这个Size Estimates基本可以当作Cardinality Estimate

Definition 4 (Multiplan Selection).

Given a set of plan candidates P, a maximal number n of plans to execute, and a utility function U : P ↦→ R, mapping plan sets to real-valued utility values, the goal of multiplan selection is to select a set P∗ ⊆ P of plans with |P∗| ≤ n that maximizes utility among all plan sets with at most n plans.

MAXIMIZING PLAN DIVERSITY

Definition 5 (Plan Diversity). Given a set P of alternative plans for the same query, we denote by U (P) = | ∪p∈P Ip | the (structural) plan diversity.

Emmm，怎么还是定义😅

Lemma 1. Each unique intermediate result appears in n · l/u plans in average.

……Hence, the average number of occurrences is the average number of plans in which the same result appears.

不管引理是啥，记住后面这句就行了

有意思，直观展现了为什么会有不同的Intermediate Result

Theorem 2. Plan diversity is non-negative and monotone.

查询计划肯定大于0，那可不就只能单调增长么😂

Consider Algorithm 1. This algorithm selects plans greedily for execution. The input is a set of plans, together with the maximal number of plans that can be executed concurrently. Starting from an empty set, the algorithm iteratively adds the plan that maximizes utility in each iteration (here, utility represents plan diversity). Iterations continue until the maximal number of plans has been selected (or no candidates remain to be selected). Also, the algorithm terminates if no plan increases utility. This termination criterion is reasonable. Due to submodularity, adding a plan to supersets of the currently selected plans cannot yield a higher increase in utility. If no plan achieves an improvement in utility, no plan can increase utility in later iterations either.

Emmm，通过贪心找到最优计划，看起来也OK

MINIMIZING EXPECTED COSTS

For base tables, we assume that cardinality is known (even after applying filter predicates). This reflects the fact that incorrect cardinality estimates are often due to correlations between different predicates [6, 27]

“cardinality is known”甚至不要求是对的……😂行吧

Definition 7 (Expected cost).

The expected (execution) cost of a plan set P is defined as the expected value of the minimal execution costs over all plans. More precisely, assume that the execution cost of plan p ∈ P is modeled as random variable Cp , we judge plan sets by the quantity E(minp ∈P Cp ) (where E denotes expected value). Note that cost variables of different plans are not necessarily independent as plans may share the same intermediate results (meaning that an estimation error in one intermediate result affects multiple plans).

有意思：预期是最低cost，但同一个计划集合可能会共享cost变量，所以并不能保证结果准确

Algorithm 2 shows the corresponding pseudo-code. As input, it obtains a set of plans to execute in parallel (P), as well as approximated size distributions for a subset of intermediate results. The output is the expected cost until the first plan in P finishes execution.

好长的算法😂占了快半页

Finally, Algorithm 2 calculates the expected value. To do so, it iterates over possible size combinations. The expression in Line 29 calculates the execution cost of each plan, given currently assumed sizes (S [i] denotes the size assigned by S to intermediate result i).

虽然没有完全看懂，但对算法的描述，这一块补充的很详细，这就很舒服

Definition 8 (Cost Savings).

Denote by Cost (P) the expected minimal plan execution cost for plans P, calculated by Algorithm 2. We extend the definition of that function by assigning Cost (∅) to a high constant (higher than the execution cost of any plan). Then we can define cost savings Save (P) of a plan set P as the improvement in execution cost, compared to the empty set: Save (P) = Cost (∅) − Cost (P).

哎，这定义好详细（虽然大家应该都明白Cost Saving是怎么一会事情）

Figure 4 illustrates the variables, as well as the constraints connecting them. For instance, plan variables m1 to m3 are connected by constraints on the number of selected plans. Variables y1 1 to y3 1 and y1 2 to y3 2 are connected by constraints, making sure that only a single plan can finish first in each scenario. Variables z1 and z2 are con

Note: 这里的ILP是指Instruction-Level Parallelism（指令级并行性），ROME 同时执行多个候选计划，就像现代 CPU 在同一时间执行多个指令一样（即利用 ILP）

COMPLEXITY ANALYSIS

Theorem 9. Selecting plans to maximize the number of intermediate results is NP-hard.

Emmm，前面用贪心算法，这里说是NP-Hard，也不意外

这一章就是数学论证，结论放下面，论证详情可在论文中找到

Note: b 应当指Input Bucket

Theorem 10. Selecting plans to minimize expected parallel execution costs is NP-hard.

Theorem 11. The ILP representing plan selection for maximizing diversity has a number of variables in O (xp + xi ) and constraints in O (xi · xp ).

Theorem 12. The ILP representing plan selection for minimal expected costs uses a number of variables and constraints in O (bk · xp ).

Theorem 13. Greedy plan selection via maximizing unique intermediate results has time complexity in O (xp · n · xl ).

Theorem 14. Associating all intermediate results with cardinality distributions requires time in O (b3 · xi ).

Theorem 15. Greedily selecting plans to minimize expected cost has time complexity in O (n2 · xp · bk · xl ).

EXPERIMENTAL EVALUATION

使用JOB数据集，以及一个现实数据集Stack

对于上面标识的补充：

Hence, we experiment with the

original query order (BAO),

the inverse order (BAO-I),

as well as a random query order (BAO-R).

PB is a re-implementation of the Plan Bouquets approach [13].

MPD (maximizing plan diversity)

PM (probabilistic model)

Figure 5 compares the performance of all baselines for JOB and Stack. Compared to PostgreSQL (PG), PB and PB-3 (which performs better among the two) improve performance significantly for Stack, reducing total execution time for both benchmarks by approximately factor 2.

Table 2 provides further details, reporting aggregate speedups for groups of queries, linked by the number of non-equality unary predicates (e.g., LIKE expressions) where selectivity estimation is particularly hard, especially considering correlations between predicates.

这个Table 2的SpeedUp没写单位，应该是指倍数吧

In summary, ILP can be preferable over greedy when executing particularly expensive queries. However, ILP should be combined with the second (more precise) model to leverage its potential for finding optimal solutions.

似乎是贪心更快，但ILP在有更精确的评估模型的情况下效果更好

Compared to an Oracle, selecting the optimal plan within our plan space for each query, ROME performs worse due to two factors.

First, ROME may fail to select the optimal plan for parallel execution.

Second, even if the optimal plan is selected, its performance may worsen due to executing multiple plans in parallel (as opposed to executing the optimal plan alone).

果然会遇到这样的问题——因为是在同一台机器上，并行运行的某些计划可能就不如直接运行单个计划，配图对应Figure 10

To illustrate the reason that our approach improves the selection of query plans, we consider Query 13a of JOB. In Figure 12, we present the query graph for two plans, chosen by our multiplan selection approach (the left plan is, at the same time, the default plan selected by the PostgreSQL optimizer)

The rectangles with numerical labels represent join nodes. Blue rectangles indicate that the intermediate result is shared between plans. Red and green ones represent intermediate results that are unique to the plan in which they appear.

右图是ROME的效果，看起来不错👍

倒是没提到具体哪些文章，更多的讲明ROME做了哪些工作，没做哪些工作

ROME does not aim at improving cost and size estimation. Instead, it relies on existing models and accounts for their inherent unreliability by selecting plans for parallel execution.

……

ROME does not require any prior training and no specialized hardware

……

ROME does not parallelize the execution of a single plan. Instead, it executes alternative plans for the same query in parallel.

Parallel Optimization大伙们想想都觉得是个好主意，但真要落实下去就会面临各种各样的问题：同一时间运行大量计划，在资源有限的情况下这不一定能找到最优计划，在资源消耗上要比传统的Cost Estimate和其方式要高

这篇论文我觉得最成功的点在于，通过理论和试验说明了这种方案的优势和劣势，是一篇很棒的踩坑文章😍

推荐订阅源

Mox的笔记库