惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

SecWiki News
SecWiki News
I
InfoQ
The Cloudflare Blog
人人都是产品经理
人人都是产品经理
博客园 - Franky
T
Tailwind CSS Blog
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
量子位
博客园_首页
罗磊的独立博客
V
V2EX
李成银的技术随笔
大猫的无限游戏
大猫的无限游戏
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
T
True Tiger Recordings
Vercel News
Vercel News
Cyberwarzone
Cyberwarzone
Cisco Talos Blog
Cisco Talos Blog
F
Fox-IT International blog
D
Darknet – Hacking Tools, Hacker News & Cyber Security
M
Microsoft Research Blog - Microsoft Research
Know Your Adversary
Know Your Adversary
爱范儿
爱范儿
The Register - Security
The Register - Security
G
Google Developers Blog
The Hacker News
The Hacker News
Malwarebytes
Malwarebytes
S
Securelist
博客园 - 三生石上(FineUI控件)
Jina AI
Jina AI
T
Threat Research - Cisco Blogs
T
The Exploit Database - CXSecurity.com
S
SegmentFault 最新的问题
博客园 - 叶小钗
F
Fortinet All Blogs
Apple Machine Learning Research
Apple Machine Learning Research
宝玉的分享
宝玉的分享
博客园 - 聂微东
T
Threatpost
博客园 - 【当耐特】
D
Docker
P
Privacy & Cybersecurity Law Blog
www.infosecurity-magazine.com
www.infosecurity-magazine.com
G
GRAHAM CLULEY
V
Visual Studio Blog
C
Cisco Blogs
IT之家
IT之家
S
Security Archives - TechRepublic
Latest news
Latest news
阮一峰的网络日志
阮一峰的网络日志

Mox的笔记库

细嗦下MLIR的环境搭建 | Mox的笔记库 博客重构:从Hexo到Astro | Mox的笔记库 2026PPoPP MLIR Tutorial学习 | Mox的笔记库 MacOS配置《明日方舟:终末地》 | Mox的笔记库 2025:向内生长 | Mox的笔记库 由mlir::ExecutionEngine引发的跨系统问题 | Mox的笔记库 WSL2配置Cuda-Tile环境记录(未完待续) | Mox的笔记库 Vibe Coding手搓项目记录 | Mox的笔记库 给Debian上包——以DuckDB为例 | Mox的笔记库 UCPD.sys事件存档 | Mox的笔记库 换新电脑之Mac mini M4从购买到配置 | Mox的笔记库 Mac配置MLX-C开发环境 | Mox的笔记库 RISC-V meets RDBMS——RISC-V架构上可运行数据库一览 | Mox的笔记库 DuckDB Sort实现调查 | Mox的笔记库 修复Redis在树莓派5上无法运行的问题 | Mox的笔记库 如何在MLIR中自定义类型并且输出运行 | Mox的笔记库 网站网络结构变更记录 | Mox的笔记库 EDBT25论文阅读:PhoebeDB——A Disk-Based RDBMS Kernel for High-Performance and Cost-Effective OLTP SIGMOD25论文阅读:BPF-DB:——A Kernel-Embedded Transactional Database Management System For eBPF Applications SIGMOD24文章阅读:Query Compilation Without Regrets | Mox的笔记库 论文阅读:Designing an Open Framework for Query Optimization and Compilation Apache Arrow Gandiva项目解析 | Mox的笔记库 VLDB24论文阅读:Cloud-Native Database Systems and Unikernels——Reimagining OS Abstractions for Modern Hardware NoisePage源码分析(未完待续) | Mox的笔记库 VLDB20论文阅读:Mainlining Databases——Supporting Fast Transactional Workloads on Universal Columnar Data File Formats VLDB17论文阅读:Relaxed Operator Fusion for In-Memory Databases:Making Compilation, Vectorization, and Prefetching Work Together At Last 论文阅读:How not to structure your database-backed web applications——a study of performance bugs in the wild 文章阅读:First Past the Post-Evaluating Query Optimization in MongoDB SIGMOD文章阅读:Apache Calcite——A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources VLDB23论文阅读:Analyzing the Impact of Cardinality Estimation on Execution Plans in Microsoft SQL Server SIGMOD22论文阅读:Efficient Massively Parallel Join Optimization for Large Queries VLDB论文阅读:Weaving Relations for Cache Performance VLDB22论文阅读:ConnectorX——Accelerating Data Loading From Databases to Dataframes 论文阅读:UniKraft-Fast, Specialized Unikernels the Easy Way 当DuckDB遇上RISC-V | Mox的笔记库 SIGMOD25论文阅读:An Elephant Under The Microscope——Analyzing The Interaction Of Optimizer Components In PostgreSQL 论文阅读:Compile-Time Analysis of Compiler Frameworks for Query Compilation VLDB23阅读:Bringing Compiling Databases to RISC Architectures LingoDB源码编译与分析 | Mox的笔记库 淦!MLIR输出Hello World不应该这么难! | Mox的笔记库 如何愉快的运行一个MLIR程序 | Mox的笔记库 2024:拥挤年代的想象与创造 | Mox的笔记库 如何给自己的博客添加MLIR和LLVM IR语法高亮 | Mox的笔记库 VLDB19-Parsing Gigabytes of JSON per Second论文阅读 CIDR25:Runtime-Extensible Parsers阅读 | Mox的笔记库 MLIR学习资料整理 | Mox的笔记库 SIGMOD24文章阅读:VeriTxn | Mox的笔记库 VLDB23文章阅读——Exploiting Cloud Object Storage for High-Performance Analytics VLDB24——OLAP on Modern Chiplet-Based Processors走马观花阅读 VLDB22:YeSQL文章阅读(已废弃) | Mox的笔记库 如何让数据库中的Python跑的更快-VLDB22-YeSQL文章阅读 | Mox的笔记库 你好,世界! | Mox的笔记库 让系统研究更有意义:HarmonyOS NEXT的教训和经验——讲座回顾 | Mox的笔记库 UNSW 24T3 COMP9336上课记录 | Mox的笔记库 Velox开发环境配置踩坑记录 | Mox的笔记库 MLIR Toy Tutorial实践记录 | Mox的笔记库 论文阅读:Declarative Sub-Operators for Universal Data Processing LLVM-Kaleidoscope实操踩坑记录 | Mox的笔记库 2024年7月RSSHub开发体验 | Mox的笔记库 澳洲大学计算机硕士比较 | Mox的笔记库 论文阅读——CDUL:CLIP-Driven Unsupervised Learning for Multi-Label Image Classification 论批量快速添加图片与视频水印的事 | Mox的笔记库 CVPR2023-CLIP算法调研 | Mox的笔记库 基于元信息写入的服务器压力测试 | Mox的笔记库 MjAyMw==,希望,前进与平庸之道 | Mox的笔记库 家庭组网IPv6+Mesh折腾 | Mox的笔记库 code-server初体验 | Mox的笔记库 从Nginx到Caddy | Mox的笔记库 Hexo部署安装全流程回顾 | Mox的笔记库 RMM观察与初探 | Mox的笔记库 计算机网络课设——UDP/TCP/TLS Socket实验 | Mox的笔记库 JQuery的XSS初探 | Mox的笔记库 生产实习记录 | Mox的笔记库 Fedora-CoreOS配置与试用(2023年) | Mox的笔记库 Electron学习笔记 | Mox的笔记库 ServerSentEvent学习 | Mox的笔记库 报告翻译:容器云的安全挑战 | Mox的笔记库 Arch Linux迁移计划 | Mox的笔记库 Vagrant配置Metarget靶场环境 | Mox的笔记库 OpenAI-whisper折腾 | Mox的笔记库 202202,困惑,混乱与未曾设想之路 | Mox的笔记库 2022年Hack the box:Tier1免费区全解 | Mox的笔记库 Navidrome部署记录 | Mox的笔记库 长安杯2021-snake复现 | Mox的笔记库 报告概要翻译:OBFUSCATING C++ PROGRAMS VIA CONTROL FLOW FLATTENING 从零开始的Django CVE-2022-28346复现 | Mox的笔记库 2022CISCN(西北区赛)-The shinning | Mox的笔记库 Docker+QEMU+Arm64(Ubuntu)+环境配置(2022版) | Mox的笔记库 Arch Linux运行树莓派系统(2022年) | Mox的笔记库 2022CISCN初赛-ez_usb-复盘WriteUp | Mox的笔记库 NodeMCU-MicroPython配置实录 | Mox的笔记库 Django事务使用 | Mox的笔记库 记录第一次EduSRC上报 | Mox的笔记库 Jetbrain问题应急处理 | Mox的笔记库 Celery5.2学习&配置 | Mox的笔记库 Waline部署记录 | Mox的笔记库 2021年12月 Vivo千镜杯回顾 | Mox的笔记库 Frida hook初次实战 | Mox的笔记库 Log4j2漏洞复现 | Mox的笔记库 Windows的WSL2+Docker初探 | Mox的笔记库
SIGMOD24阅读:ROME——Robust Query Optimization via Parallel Multi-Plan Execution
2025-05-06 · via Mox的笔记库

这篇文章也是选自CMU 15799的Lecture #22,从Introduction来看,也是通过在同一时间运行多种计划,从而找到最优计划的Query Optimize方案

文章的Github仓库:https://github.com/Ultra-Seven/ROME

CMU提供的论文下载地址:https://15799.courses.cs.cmu.edu/spring2025/papers/23-mongodb/wei-sigmod2024.pdf

Introduction

Computer systems feature an increasing degree of parallelism (due, in part, to fundamental limitations in increasing the clock speed of single CPUs). Database systems typically exploit this parallelism to execute more queries or to process more data in parallel.

对的(指某些数据库的Query还是单线程运行),目前还很难见到并行的Query Optimize

However, in practice, query optimization is hard as it relies on accurate cost predictions for alternative plan candidates. Cost estimation may fail for various reasons, including data skew and correlations that make it hard to predict sizes of intermediate results.

这可太对了,这也是Query Optimize即使从上个世纪末人们就开始研究,直到今天依然值得研究的原因。

If at least one of the corresponding plans is good, query optimization will terminate quite quickly.

如果至少一个相应的计划很好,则查询优化将很快终止 …… 有意思

A popular line of research explores the potential of machine learning to make cost estimation more reliable [24, 30, 31]. However, such methods suffer from a cold-start problem in the case of a new database or in case of a dynamic query workload.

机器学习算法的会遇到冷启动问题?Mark下

In summary, the original scientific contributions in this paper are the following:

• We propose a non-intrusive optimization framework, executing complementary query plans in parallel.

• We propose multiple utility models and multiple algorithms for selecting good sets of query plans.

• We analyze the problem of multi-plan selection and the proposed methods formally.

• We compare the proposed method to multiple baselines in experiments, using multiple benchmarks on real data.

OK, It’s good

ROME FRAMEWORK

Figure 1 shows an overview of our approach, executing complementary plans in parallel to reduce the impact of optimizer mistakes.

image-20250505210353456

After selecting plans, ROME executes those plans in parallel (by issuing the same query multiple times via different sessions, varying the optimizer settings in each session to obtain the selected plan). Once the first plan finishes, ROME terminates all parallel executions.

框架结构很清晰:只要有一个计划最快执行完,其他计划就直接停止,并返回结果。可以,这很合理

However, as we will see in the experiments, this approach pays off if it helps to avoid the occasional (but very expensive) optimizer mistake, misleading the optimizer to select plans with an execution cost far above the optimum.

嗯,避免偶尔(但非常昂贵的)优化器错误

The implementation of ROME, evaluated in the experiments, is a Python framework on top of PostgreSQL.

啊,Python么,测试环境是PG,好吧😉

2.2讲与PG的交互细节(I like it😋)但确实没什么好评论的,就不写了

FORMAL MODEL

这一块列了几个定义

Definition 1 (Query Plan).

We model query plans as join trees in the following. For most of this paper, we implicitly assume binary join trees (i.e., the system uses binary join operators). However, the approach is not limited to this scenario and could support non-binary join trees (e.g., if the underlying system uses multiway joins) as well.

嗯,查询计划是树,且并不一定要是二叉连接树,OK

Definition 2 (Intermediate Result).

We associate inner nodes within join trees representing query plans with intermediate results. For a plan p, we denote by I (p) (to save space, we also use the notation Ip ) the set of intermediate results. Different plans may share intermediate results. If so, an estimation error for one intermediate result may lead to incorrect cost estimates for multiple plans.

中间结果也OK,优化器存在多个阶段必然有中间结果

Definition 3 (Size Estimates)

During part of the paper, we assume that the underlying database engine features a cost-based query optimizer. To perform cost-based query optimization, systems generally need to estimate the sizes of intermediate results (based on data statistics, typically). ROME obtains size estimates from the underlying optimizer. For an intermediate result i, we denote by M (i) the size estimate by the optimizer size model.

感觉这个Size Estimates基本可以当作Cardinality Estimate

Definition 4 (Multiplan Selection).

Given a set of plan candidates P, a maximal number n of plans to execute, and a utility function U : P ↦→ R, mapping plan sets to real-valued utility values, the goal of multiplan selection is to select a set P∗ ⊆ P of plans with |P∗| ≤ n that maximizes utility among all plan sets with at most n plans.

MAXIMIZING PLAN DIVERSITY

Definition 5 (Plan Diversity). Given a set P of alternative plans for the same query, we denote by U (P) = | ∪p∈P Ip | the (structural) plan diversity.

Emmm,怎么还是定义😅

Lemma 1. Each unique intermediate result appears in n · l/u plans in average.

……Hence, the average number of occurrences is the average number of plans in which the same result appears.

不管引理是啥,记住后面这句就行了

image-20250505220700928

有意思,直观展现了为什么会有不同的Intermediate Result

Theorem 2. Plan diversity is non-negative and monotone.

查询计划肯定大于0,那可不就只能单调增长么😂

image-20250505221411939

Consider Algorithm 1. This algorithm selects plans greedily for execution. The input is a set of plans, together with the maximal number of plans that can be executed concurrently. Starting from an empty set, the algorithm iteratively adds the plan that maximizes utility in each iteration (here, utility represents plan diversity). Iterations continue until the maximal number of plans has been selected (or no candidates remain to be selected). Also, the algorithm terminates if no plan increases utility. This termination criterion is reasonable. Due to submodularity, adding a plan to supersets of the currently selected plans cannot yield a higher increase in utility. If no plan achieves an improvement in utility, no plan can increase utility in later iterations either.

Emmm,通过贪心找到最优计划,看起来也OK

MINIMIZING EXPECTED COSTS

For base tables, we assume that cardinality is known (even after applying filter predicates). This reflects the fact that incorrect cardinality estimates are often due to correlations between different predicates [6, 27]

“cardinality is known”甚至不要求是对的……😂行吧

Definition 7 (Expected cost).

The expected (execution) cost of a plan set P is defined as the expected value of the minimal execution costs over all plans. More precisely, assume that the execution cost of plan p ∈ P is modeled as random variable Cp , we judge plan sets by the quantity E(minp ∈P Cp ) (where E denotes expected value). Note that cost variables of different plans are not necessarily independent as plans may share the same intermediate results (meaning that an estimation error in one intermediate result affects multiple plans).

有意思:预期是最低cost,但同一个计划集合可能会共享cost变量,所以并不能保证结果准确

Algorithm 2 shows the corresponding pseudo-code. As input, it obtains a set of plans to execute in parallel (P), as well as approximated size distributions for a subset of intermediate results. The output is the expected cost until the first plan in P finishes execution.

image-20250505225057042

好长的算法😂占了快半页

Finally, Algorithm 2 calculates the expected value. To do so, it iterates over possible size combinations. The expression in Line 29 calculates the execution cost of each plan, given currently assumed sizes (S [i] denotes the size assigned by S to intermediate result i).

虽然没有完全看懂,但对算法的描述,这一块补充的很详细,这就很舒服

Definition 8 (Cost Savings).

Denote by Cost (P) the expected minimal plan execution cost for plans P, calculated by Algorithm 2. We extend the definition of that function by assigning Cost (∅) to a high constant (higher than the execution cost of any plan). Then we can define cost savings Save (P) of a plan set P as the improvement in execution cost, compared to the empty set: Save (P) = Cost (∅) − Cost (P).

哎,这定义好详细(虽然大家应该都明白Cost Saving是怎么一会事情)

image-20250506095726112

Figure 4 illustrates the variables, as well as the constraints connecting them. For instance, plan variables m1 to m3 are connected by constraints on the number of selected plans. Variables y1 1 to y3 1 and y1 2 to y3 2 are connected by constraints, making sure that only a single plan can finish first in each scenario. Variables z1 and z2 are con

Note: 这里的ILP是指Instruction-Level Parallelism(指令级并行性),ROME 同时执行多个候选计划,就像现代 CPU 在同一时间执行多个指令一样(即利用 ILP)

COMPLEXITY ANALYSIS

Theorem 9. Selecting plans to maximize the number of intermediate results is NP-hard.

Emmm,前面用贪心算法,这里说是NP-Hard,也不意外

这一章就是数学论证,结论放下面,论证详情可在论文中找到

Note: b 应当指Input Bucket

Theorem 10. Selecting plans to minimize expected parallel execution costs is NP-hard.

Theorem 11. The ILP representing plan selection for maximizing diversity has a number of variables in O (xp + xi ) and constraints in O (xi · xp ).

Theorem 12. The ILP representing plan selection for minimal expected costs uses a number of variables and constraints in O (bk · xp ).

Theorem 13. Greedy plan selection via maximizing unique intermediate results has time complexity in O (xp · n · xl ).

Theorem 14. Associating all intermediate results with cardinality distributions requires time in O (b3 · xi ).

Theorem 15. Greedily selecting plans to minimize expected cost has time complexity in O (n2 · xp · bk · xl ).

EXPERIMENTAL EVALUATION

使用JOB数据集,以及一个现实数据集Stack

image-20250506110518888

对于上面标识的补充:

Hence, we experiment with the

original query order (BAO),

the inverse order (BAO-I),

as well as a random query order (BAO-R).

PB is a re-implementation of the Plan Bouquets approach [13].

MPD (maximizing plan diversity)

PM (probabilistic model)

image-20250506111635061

Figure 5 compares the performance of all baselines for JOB and Stack. Compared to PostgreSQL (PG), PB and PB-3 (which performs better among the two) improve performance significantly for Stack, reducing total execution time for both benchmarks by approximately factor 2.

image-20250506111815281

Table 2 provides further details, reporting aggregate speedups for groups of queries, linked by the number of non-equality unary predicates (e.g., LIKE expressions) where selectivity estimation is particularly hard, especially considering correlations between predicates.

这个Table 2的SpeedUp没写单位,应该是指倍数吧

image-20250506112735352

In summary, ILP can be preferable over greedy when executing particularly expensive queries. However, ILP should be combined with the second (more precise) model to leverage its potential for finding optimal solutions.

似乎是贪心更快,但ILP在有更精确的评估模型的情况下效果更好

Compared to an Oracle, selecting the optimal plan within our plan space for each query, ROME performs worse due to two factors.

First, ROME may fail to select the optimal plan for parallel execution.

Second, even if the optimal plan is selected, its performance may worsen due to executing multiple plans in parallel (as opposed to executing the optimal plan alone).

果然会遇到这样的问题——因为是在同一台机器上,并行运行的某些计划可能就不如直接运行单个计划,配图对应Figure 10

image-20250506113419844

To illustrate the reason that our approach improves the selection of query plans, we consider Query 13a of JOB. In Figure 12, we present the query graph for two plans, chosen by our multiplan selection approach (the left plan is, at the same time, the default plan selected by the PostgreSQL optimizer)

The rectangles with numerical labels represent join nodes. Blue rectangles indicate that the intermediate result is shared between plans. Red and green ones represent intermediate results that are unique to the plan in which they appear.

image-20250506113850568

右图是ROME的效果,看起来不错👍

倒是没提到具体哪些文章,更多的讲明ROME做了哪些工作,没做哪些工作

ROME does not aim at improving cost and size estimation. Instead, it relies on existing models and accounts for their inherent unreliability by selecting plans for parallel execution.

……

ROME does not require any prior training and no specialized hardware

……

ROME does not parallelize the execution of a single plan. Instead, it executes alternative plans for the same query in parallel.

评论

Parallel Optimization大伙们想想都觉得是个好主意,但真要落实下去就会面临各种各样的问题:同一时间运行大量计划,在资源有限的情况下这不一定能找到最优计划,在资源消耗上要比传统的Cost Estimate和其方式要高

这篇论文我觉得最成功的点在于,通过理论和试验说明了这种方案的优势和劣势,是一篇很棒的踩坑文章😍