惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

SecWiki News
SecWiki News
I
InfoQ
The Cloudflare Blog
人人都是产品经理
人人都是产品经理
博客园 - Franky
T
Tailwind CSS Blog
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
量子位
博客园_首页
罗磊的独立博客
V
V2EX
李成银的技术随笔
大猫的无限游戏
大猫的无限游戏
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
T
True Tiger Recordings
Vercel News
Vercel News
Cyberwarzone
Cyberwarzone
Cisco Talos Blog
Cisco Talos Blog
F
Fox-IT International blog
D
Darknet – Hacking Tools, Hacker News & Cyber Security
M
Microsoft Research Blog - Microsoft Research
Know Your Adversary
Know Your Adversary
爱范儿
爱范儿
The Register - Security
The Register - Security
G
Google Developers Blog
The Hacker News
The Hacker News
Malwarebytes
Malwarebytes
S
Securelist
博客园 - 三生石上(FineUI控件)
Jina AI
Jina AI
T
Threat Research - Cisco Blogs
T
The Exploit Database - CXSecurity.com
S
SegmentFault 最新的问题
博客园 - 叶小钗
F
Fortinet All Blogs
Apple Machine Learning Research
Apple Machine Learning Research
宝玉的分享
宝玉的分享
博客园 - 聂微东
T
Threatpost
博客园 - 【当耐特】
D
Docker
P
Privacy & Cybersecurity Law Blog
www.infosecurity-magazine.com
www.infosecurity-magazine.com
G
GRAHAM CLULEY
V
Visual Studio Blog
C
Cisco Blogs
IT之家
IT之家
S
Security Archives - TechRepublic
Latest news
Latest news
阮一峰的网络日志
阮一峰的网络日志

Mox的笔记库

细嗦下MLIR的环境搭建 | Mox的笔记库 博客重构:从Hexo到Astro | Mox的笔记库 2026PPoPP MLIR Tutorial学习 | Mox的笔记库 MacOS配置《明日方舟:终末地》 | Mox的笔记库 2025:向内生长 | Mox的笔记库 由mlir::ExecutionEngine引发的跨系统问题 | Mox的笔记库 WSL2配置Cuda-Tile环境记录(未完待续) | Mox的笔记库 Vibe Coding手搓项目记录 | Mox的笔记库 给Debian上包——以DuckDB为例 | Mox的笔记库 UCPD.sys事件存档 | Mox的笔记库 换新电脑之Mac mini M4从购买到配置 | Mox的笔记库 Mac配置MLX-C开发环境 | Mox的笔记库 RISC-V meets RDBMS——RISC-V架构上可运行数据库一览 | Mox的笔记库 DuckDB Sort实现调查 | Mox的笔记库 修复Redis在树莓派5上无法运行的问题 | Mox的笔记库 如何在MLIR中自定义类型并且输出运行 | Mox的笔记库 网站网络结构变更记录 | Mox的笔记库 EDBT25论文阅读:PhoebeDB——A Disk-Based RDBMS Kernel for High-Performance and Cost-Effective OLTP SIGMOD25论文阅读:BPF-DB:——A Kernel-Embedded Transactional Database Management System For eBPF Applications SIGMOD24文章阅读:Query Compilation Without Regrets | Mox的笔记库 论文阅读:Designing an Open Framework for Query Optimization and Compilation Apache Arrow Gandiva项目解析 | Mox的笔记库 VLDB24论文阅读:Cloud-Native Database Systems and Unikernels——Reimagining OS Abstractions for Modern Hardware NoisePage源码分析(未完待续) | Mox的笔记库 VLDB20论文阅读:Mainlining Databases——Supporting Fast Transactional Workloads on Universal Columnar Data File Formats VLDB17论文阅读:Relaxed Operator Fusion for In-Memory Databases:Making Compilation, Vectorization, and Prefetching Work Together At Last 论文阅读:How not to structure your database-backed web applications——a study of performance bugs in the wild SIGMOD24阅读:ROME——Robust Query Optimization via Parallel Multi-Plan Execution 文章阅读:First Past the Post-Evaluating Query Optimization in MongoDB SIGMOD文章阅读:Apache Calcite——A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources VLDB23论文阅读:Analyzing the Impact of Cardinality Estimation on Execution Plans in Microsoft SQL Server SIGMOD22论文阅读:Efficient Massively Parallel Join Optimization for Large Queries VLDB22论文阅读:ConnectorX——Accelerating Data Loading From Databases to Dataframes 论文阅读:UniKraft-Fast, Specialized Unikernels the Easy Way 当DuckDB遇上RISC-V | Mox的笔记库 SIGMOD25论文阅读:An Elephant Under The Microscope——Analyzing The Interaction Of Optimizer Components In PostgreSQL 论文阅读:Compile-Time Analysis of Compiler Frameworks for Query Compilation VLDB23阅读:Bringing Compiling Databases to RISC Architectures LingoDB源码编译与分析 | Mox的笔记库 淦!MLIR输出Hello World不应该这么难! | Mox的笔记库 如何愉快的运行一个MLIR程序 | Mox的笔记库 2024:拥挤年代的想象与创造 | Mox的笔记库 如何给自己的博客添加MLIR和LLVM IR语法高亮 | Mox的笔记库 VLDB19-Parsing Gigabytes of JSON per Second论文阅读 CIDR25:Runtime-Extensible Parsers阅读 | Mox的笔记库 MLIR学习资料整理 | Mox的笔记库 SIGMOD24文章阅读:VeriTxn | Mox的笔记库 VLDB23文章阅读——Exploiting Cloud Object Storage for High-Performance Analytics VLDB24——OLAP on Modern Chiplet-Based Processors走马观花阅读 VLDB22:YeSQL文章阅读(已废弃) | Mox的笔记库 如何让数据库中的Python跑的更快-VLDB22-YeSQL文章阅读 | Mox的笔记库 你好,世界! | Mox的笔记库 让系统研究更有意义:HarmonyOS NEXT的教训和经验——讲座回顾 | Mox的笔记库 UNSW 24T3 COMP9336上课记录 | Mox的笔记库 Velox开发环境配置踩坑记录 | Mox的笔记库 MLIR Toy Tutorial实践记录 | Mox的笔记库 论文阅读:Declarative Sub-Operators for Universal Data Processing LLVM-Kaleidoscope实操踩坑记录 | Mox的笔记库 2024年7月RSSHub开发体验 | Mox的笔记库 澳洲大学计算机硕士比较 | Mox的笔记库 论文阅读——CDUL:CLIP-Driven Unsupervised Learning for Multi-Label Image Classification 论批量快速添加图片与视频水印的事 | Mox的笔记库 CVPR2023-CLIP算法调研 | Mox的笔记库 基于元信息写入的服务器压力测试 | Mox的笔记库 MjAyMw==,希望,前进与平庸之道 | Mox的笔记库 家庭组网IPv6+Mesh折腾 | Mox的笔记库 code-server初体验 | Mox的笔记库 从Nginx到Caddy | Mox的笔记库 Hexo部署安装全流程回顾 | Mox的笔记库 RMM观察与初探 | Mox的笔记库 计算机网络课设——UDP/TCP/TLS Socket实验 | Mox的笔记库 JQuery的XSS初探 | Mox的笔记库 生产实习记录 | Mox的笔记库 Fedora-CoreOS配置与试用(2023年) | Mox的笔记库 Electron学习笔记 | Mox的笔记库 ServerSentEvent学习 | Mox的笔记库 报告翻译:容器云的安全挑战 | Mox的笔记库 Arch Linux迁移计划 | Mox的笔记库 Vagrant配置Metarget靶场环境 | Mox的笔记库 OpenAI-whisper折腾 | Mox的笔记库 202202,困惑,混乱与未曾设想之路 | Mox的笔记库 2022年Hack the box:Tier1免费区全解 | Mox的笔记库 Navidrome部署记录 | Mox的笔记库 长安杯2021-snake复现 | Mox的笔记库 报告概要翻译:OBFUSCATING C++ PROGRAMS VIA CONTROL FLOW FLATTENING 从零开始的Django CVE-2022-28346复现 | Mox的笔记库 2022CISCN(西北区赛)-The shinning | Mox的笔记库 Docker+QEMU+Arm64(Ubuntu)+环境配置(2022版) | Mox的笔记库 Arch Linux运行树莓派系统(2022年) | Mox的笔记库 2022CISCN初赛-ez_usb-复盘WriteUp | Mox的笔记库 NodeMCU-MicroPython配置实录 | Mox的笔记库 Django事务使用 | Mox的笔记库 记录第一次EduSRC上报 | Mox的笔记库 Jetbrain问题应急处理 | Mox的笔记库 Celery5.2学习&配置 | Mox的笔记库 Waline部署记录 | Mox的笔记库 2021年12月 Vivo千镜杯回顾 | Mox的笔记库 Frida hook初次实战 | Mox的笔记库 Log4j2漏洞复现 | Mox的笔记库 Windows的WSL2+Docker初探 | Mox的笔记库
VLDB论文阅读:Weaving Relations for Cache Performance
2025-04-05 · via Mox的笔记库

这篇文章来自威斯康星麦迪逊的团队,应该也是最早提出PAX存储格式的论文。PAX的全称是Partition Attribute Across,关于PAX存储格式,Andy在上CMU-15445的时候有提到过,如果只是单纯想了解下PAX,那看这门课就够了(PAX相比较于NSM与DSM,是一种适用于HTAP的方案)

image-20250405085243430

另外,OceanBase的MiniOB也实现了一个PAX存储模式的方案,感兴趣想深入的可以看看

image-20250405090151457

论文地址:https://www.vldb.org/conf/2001/P169.pdf

Introduction

The communication between the CPU and the secondary storage (I/O) has been traditionally recognized as the major database performance bottleneck.

这应该也是数据库长期面对的问题——但内存的速度提升与价格降低缓解了这个问题

When running commercial database systems on a modern processor, data requests that miss in the cache hierarchy (i.e., requests for data that are not found in any of the caches and are transferred from main memory) are a key memory bottleneck

缓存无法命中是数据库慢的主要原因

Implementing PAX on a DBMS that uses NSM requires only changes to the page-level data manipulation code.

听起来很Nice,但似乎没人这么做?😂

Research in computer architecture, compilers, and database systems has focused on optimizing data placement for cache performance.

时至今日好像依旧如此?😂

Figure 1 depicts an NSM page after inserting four records of a relation R with three attributes: SSN, name, and age. Each record has a record header (RH) containing a null bitmap, offsets to the variable-length values, and other implementation-specific information.

image-20250405095322197

回顾了下NSM的历史,并且明确其在一部分情况下会造成缓存未命中

image-20250405100040529

回顾了下DSM的历史,这里把列成为sub-relation

Unfortunately, DSM’s performance deteriorates significantly for queries that involve multiple attributes from each participating relation.

这个情况在现在应该不是问题了:Parquet,ORC这些都会增加缓存应对这种情况

PAX

image-20250405100847196

PAX partitions each page into n minipages. It then stores values of the first attribute in the first minipage, values of the second attribute in the second minipage, and so on. The page header at the beginning of each page contains pointers to the beginning of each minipage.

image-20250405101228301

• Fixed-length attribute values are stored in F-minipages. At the end of each F-minipage there is a presence bit vector with one entry per record that denotes null values for nullable attributes.

• Variable-length attribute values are stored in Vminipages. V-minipages are slotted, with pointers to the end of each value. Null values are denoted by null pointers.

定长的与不定长的分开存储

Table 1 summarizes the characteristics of NSM, DSM, and PAX, demonstrating the tradeoff between interrecord spatial locality and record reconstruction cost.

image-20250405101834129

这里补充下Inter-record spatial locality和Low record reconstruction cost

前者是不同记录在物理存储上的间距,影响批量读取效率

后者是还原完整记录的成本,影响还原单条记录效率

System Implementation

In the current implementation, TPC-H tables stored using PAX need 8% less space than those stored using NSM.

要比TPC-H Table的空间占用,跟NSM比PAX肯定有优势,但肯定比不过DSM

Data Manipulation Algorithms

BULK-LOADING AND INSERTIONS

In the case of variable-length attributes, it initially uses a hint (either from the DBMS or the average length from the first few records) as an indication of the average value size, and then uses feedback from the actual size of inserted values to adjust the average value size.

相当于使用元数据对变长的类型进行调整

UPDATES

Updates on variable-length attributes may stretch or shrink the record; page reorganizations may be needed to accomodate the change and the slot table must be updated. …… If the new value is longer than the space available in the V-minipage, the V-minipage borrows space from a neighboring minipage

没太理解,似乎Update需要先读取出来在写回去?

DELETIONS

The NSM deletion algorithm uses the slot array to mark deleted records, and the free space can be filled upon future insertions.

使用Bitmap标记删除数据,并适当重组小型内容(reorganizes minipage contents)以减少碎片化

Query Operators

A scan operator that supports sargable predicates [28] was implemented on top of Shore.

支持Scan下推

The adaptive dynamic hash join algorithm [23], which is also used in DB2 [20], was implemented on top of Shore.

似乎用了和DB2差不多的Join方案

Analysis of the impact of data placement

Dell 6400 PII Xeon/MT system running Windows NT 4.0. This computer features a Pentium II Xeon processor running at 400MHz, 512MB of main memory, and a 100MHz system bus. The processor has split 16-KB first-level (L1) data and instruction caches and a unified 512-KB second-level (L2) cache.

好老的机子😂现在看起来有些不可思议

PAX primarily targets optimizing data cache behavior, and does not affect I/O performance in any way

这里的参数类型没有告知硬盘——在2001年还没有SSD

The graph shows that(Figure 6), while the performance of the NSM and PAX schemes are relatively insensitive to the changes, DSM’s performance is very sensitive to the number of attributes used in the query.

image-20250405105823403

NSM vs. PAX Impact on Cache Behavior

从数据来看效果不错,提高了内存的命中率,降低了CPU使用时间

image-20250405110555250

NSM/PAX Sensitivity Analysis

image-20250405111904214

In these experiments, DSM’s performance is about a factor of nine slower than NSM and PAX. As the number of attributes involved in the query increases, DSM must join the corresponding number of sub-relations.

这部分内容图中没画出来,也就是说,随着查询数量增加,DSM方式会慢上9倍,且主要原因是无法利用上缓存

Evaluation Using DSS Workloads

跑TPC-H,DSS这里是指Decision Support

We conducted experiments on the system described in Section 5.1, using a 128-MB buffer pool and a 64-MB hash join heap.

哎,这Buffer好小😂

image-20250405111954922

Figure 10 compares the elapsed time required to load a 100-MB, 200-MB, and 500-MB TPC-H database with each of the three storage organizations.

也就是SF0.1,SF0.2与SF0.5?emmm😅

Bulk-loading

没调整算法前与NSM相比有2%-10%的性能损失

image-20250405114636880

As an alternative, we implemented a smarter algorithm that reorganizes minipages only if the free space on the page is more than 5%

修改了reogranize minipage的算法后,性能损失可以下降到0.8%

image-20250405114838604

Queries

可以看到性能都有10%左右的提升,但似乎数据量越大性能提升越不明显(我猜应该是能使用的Cache占比变小了)

The speedups obtained, however, are not constant across the experiments due to a combination of differing amounts of I/O and interactions between the hardware and the algorithms being used.

image-20250405115131844

Since query 14 accesses fewer attributes and requires less computation than query 12, PAX outperforms NSM by only 6-32% when running this query

Q14相较于Q12由于访问的属性较少,所以性能提升并不明显

Updates

这我感觉不太好评价,Updates的性能会随着Attribute的增多而下降,而保底能达到10%的提升

image-20250405120052202

When executing updates PAX is always faster than NSM, providing a speedup of 10-16%. The speedup is a function of the fraction of the record accessed as well as the selectivity.

受益于Select加速,Update也能从中获得性能提升

总结

这篇论文发表于2001年,但在2025年的当下,主流的数据库大部分都是DSM,而非PAX,我猜原因可能是随着硬件性能的提升(SSD与DDR5内存),人们更看重存粹的OLAP性能,导致PAX叫好不叫座

而这篇文章更多的强调缓存Buffer对数据的影响。如果Buffer足够大,DSM的性能是否能与PAX持平?

而现代的存储,或多或少由也有收到PAX的影响(比如Apache Parquet),所以很难评价PAX这个方案到底是好还是不好

威斯康星麦迪逊还有另外一篇有关PAX的论文,如果后续有时间会考虑看下