惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

V
Vulnerabilities – Threatpost
P
Proofpoint News Feed
The Hacker News
The Hacker News
Know Your Adversary
Know Your Adversary
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
T
Tenable Blog
AWS News Blog
AWS News Blog
S
Securelist
T
Threatpost
C
Cybersecurity and Infrastructure Security Agency CISA
IT之家
IT之家
腾讯CDC
WordPress大学
WordPress大学
Spread Privacy
Spread Privacy
C
Check Point Blog
cs.CL updates on arXiv.org
cs.CL updates on arXiv.org
Engineering at Meta
Engineering at Meta
Latest news
Latest news
A
About on SuperTechFans
The Register - Security
The Register - Security
L
LINUX DO - 热门话题
T
The Exploit Database - CXSecurity.com
C
Cisco Blogs
T
Tailwind CSS Blog
Simon Willison's Weblog
Simon Willison's Weblog
阮一峰的网络日志
阮一峰的网络日志
MyScale Blog
MyScale Blog
大猫的无限游戏
大猫的无限游戏
T
Tor Project blog
L
Lohrmann on Cybersecurity
G
GRAHAM CLULEY
B
Blog RSS Feed
Scott Helme
Scott Helme
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
NISL@THU
NISL@THU
P
Privacy International News Feed
Security Latest
Security Latest
Recorded Future
Recorded Future
L
LangChain Blog
Cyberwarzone
Cyberwarzone
C
Cyber Attacks, Cyber Crime and Cyber Security
C
CXSECURITY Database RSS Feed - CXSecurity.com
博客园 - 聂微东
Google DeepMind News
Google DeepMind News
Last Week in AI
Last Week in AI
Apple Machine Learning Research
Apple Machine Learning Research
F
Fortinet All Blogs
O
OpenAI News
T
Threat Research - Cisco Blogs
Blog — PlanetScale
Blog — PlanetScale

blag

SQLite prefixes its temp files with `etilqs_` - blag Setsum - order agnostic, additive, subtractive checksum - blag Oldest recorded transaction - blag Replacing a cache service with a database - blag SQLite commits are not durable under default settings - blag PSA: SQLite WAL checksums fail silently and may lose data - blag Rickrolling Turso DB (SQLite rewrite in Rust) - blag Collection of insane and fun facts about SQLite - blag In search of a faster SQLite - blag Galloping Search - blag Building a distributed log using S3 (under 150 lines of Go) - blag Zero Disk Architecture - blag PSA: Most databases do not do checksums by default - blag PSA: SQLite does not do checksums - blag Disaggregated Storage - a brief introduction - blag Why does SQLite (in production) have such a bad rep? - blag SQLite Slaps - blag Now - blag Learning C - blag Snapshot Testing - blag Win: contribution to libSQL (SQLite) codebase - blag Errata in Hekaton MVCC paper - blag Internet is wholesome: MVCC edition - blag It is becoming difficult for me to be productive in Python - blag MongoDB secondary only index - blag Introducing CaskDB – a project to teach you writing a key-value store - blag Recurse Center: Winter Break - blag Recurse Center Day 24: Hacking Go compiler to add a new keyword - blag Recurse Center Day 20: Django v4 upgrade (from v1) - blag Recurse Center Day 19 - blag Recurse Center Day 18 - blag Recurse Center Day 17 - blag Recurse Center Day 16: Open Source - blag Recurse Center Day 15: B Tree Algorithms - blag Recurse Center Day 14: NoSQL Transactions - blag Recurse Center Day 13: Why 'Raft'? - blag Recurse Center Day 12: Isolation Anomalies - blag Recurse Center Day 11: B Tree Insertions - blag Recurse Center Day 10: Learning Distributed Systems - blag Recurse Center Day 9: Papers We Love - blag Recurse Center Day 8: B Tree Fill Factor (Part 2) - blag Recurse Center Day 7: Basics of ncurses - blag Recurse Center Day 6: B Tree Root - blag Recurse Center First Week - blag Recurse Center Day 5: Garbage Collection Algorithms - blag Recurse Center Day 4: B Tree fill factor - blag Recurse Center Day 3: Hammock Driven Development - blag Recurse Center Day 2: BTree Node - blag Recurse Center Day 1: init - blag What I want to do at Recurse Center - blag Accepted to the Recurse Center! - blag Towards Inserting One Billion Rows in SQLite Under A Minute - blag Marshaling Struct with Special Fields to JSON in Golang - blag I ended up adding duplicate records on a unique index in MongoDB - blag Setting up Github Actions for Hugo - blag Moving to Hugo - blag Catching SIGTERM in Python - blag Git/Github fork-pull request-update cycle - blag Using uWSGI with Python 3 - blag When is my Cake Day? - blag Staying Ahead of Amazon, in Amazon Treasure Hunt Contest - blag How I Am Maintaining Multiple Emails For Git On A Same Machine - blag An exploit on Gaana.com gave me access to their entire User Database - blag Flashing Asus-WRT Merlin by XVortex on NetGear NightHawk R7000 - blag Install Windows 8 UEFI on Legacy BIOS with Clover (and Dual boot with Yosemite) - blag Scraping Javascript page using Python - blag Installing Transmission (remote and CLI) client on Raspberry Pi - blag About - blag Projects - blag
How bloom filters made SQLite 10x faster - blag
2024-12-22 · via blag

This is the fascinating story of how researchers used Bloom filters cleverly to make SQLite 10x faster for analytical queries. These are my five-minute notes on the paper SQLite: Past, Present, and Future (2022). I’ll also explain some database internals and how databases implement joins.

Background

SQLite is a B-tree on disk, row-based storage. It internally uses a VM called VDBE to execute queries. It is cross-platform, single-threaded, and runs almost everywhere.

SQLite is a general-purpose database, but it excels at OLTP workloads. However, researchers at Buffalo University in 2015 found that most queries are simple key-value lookups and complicated OLAP queries. So, researchers at the University of Wisconsin-Madison set out to make it faster for analytical queries since the trend is changing. To set a baseline, they used DuckDB. They used the industry standard OLAP benchmark - Star Schema Benchmark (SSB).

SSB contains a bunch of analytical queries called Star Queries, where you have one large fact table and multiple smaller dimension tables. e.g., fact table = customer orders, dimension tables = customer, merchant info, delivery partners, etc.

Here is a sample query with the query plan. Like a typical analytical query, this one contains joins between four tables.

As expected, they found DuckDB to be 30-50x faster than SQLite. Note that they ran DuckDB in single-threaded mode.

Cause

Let’s figure out why SQLite was slow; then, we can determine how to make it faster. SQLite has a compile time option, VDBE_PROFILE, which measures the number of CPU cycles each instruction takes in VDBE. When they reran the benchmarks, they found two opcodes:

What do these opcodes do?

  • SeekRowID - given a rowId, probe the row in B-tree
  • Column - extract a column from a given record

Since the analytical queries mainly contain join operations, let’s understand how they are implemented.

Database Joins

Following are some of the ways databases implement joins:

  • Nested Loop Joins
  • Hash Joins
  • Sort-Merge Join

SQLite does Nested Loop join, which is the simplest of all three. Here is an animation (source) showing how Nested Loop join works:

Here is a sample Python-like pseudocode:

orders: {id, data} = [(1, obj...), (7, ...), (21, ...)] # fact table
customers: {order_id, data} = [(2, obj...), (7, ...), (14, ...)] # dimension table

selected = []
for order in orders:
    for customer in customers:
        # discards data when customer is 2 or 14
        if order.id == customer.order_id:
            selected.append(customer)

Assume you have two tables: orders and customers. The code here depicts joining both tables using the order_id column. The outer loop iterates over every item in the customers table, and for each item, it iterates over every item in the orders table.

For every id it matches, it appends to the result list. The loop operations are equivalent to probing in B-tree, which is very expensive.

Your goal is to reduce the B-tree probes. Now, stop for a minute and consider why this is not good. Can you come up with some ideas where you can make this better?

Join Order

Next, the order of tables in the join operation matters. Here is a simple illustration:

Consider there are three tables: Orders, Customers, Date. Our query matches 20 items in Customers and 10 items in Date. We probe the B-tree whenever a row matches.

  • O, C, D: 10000 * 20 * 200 = 4M
  • O, D, C: 10000 * 10 * 100 = 1M

Just by flipping the order, it reduced to 1M operations! But it is incredibly difficult to come up with an optimized query plan. It is an NP-Hard problem.

Optimization

How do we optimize the join? The other two join algorithms are better than Nested Loop join. However, authors argue that Hash Join takes significant memory, and SQLite mostly runs in memory-constrained environments. Second, adding one more join algorithm would complicate the query planner.

orders = [(1, obj...), (7, ...), (21, ...)]
customers = [(2, obj...), (7, ...), (14, ...)]

selected = []
cache = {2: True, 7: True, 14: True}
for order in orders:
    if order.id in cache:
        # we only check customers at cache hit
        for customer in customers:
            if order.id == customer.order_id:
                selected.append(customer)

Here is one way: before you run both loops, you first build the customer data cache. Then, in the inner loop, first, you check this cache. Only if there is a match do you iterate over the loop.

That’s what the researchers did! They used a Bloom filter, which is very space efficient and fits in a CPU cache line. It was also easy to implement.

They added two opcodes: Filter and FilterAdd. At the start of the join operation, we go over all the rows of dimension tables and set the bits in the Bloom filter which match the query predicate. The opcode is FilterAdd.

During the join operation, we first check if the row exists in the Bloom filter at each stage. If it does, then we do the B-tree probe. This is the Filter opcode.

Results

This is the optimized query plan.

This is the CPU cycle analysis post-optimization. You can see that the large blue bars are almost gone!

The result? SQLite became 7x-10x faster!

The results of this research have been applied to SQLite already and were released in v3.38.0.

tl;dr: Bloom filters were great because: minimal memory overhead, goes well with SQLite’s simple implementation, and worked within existing query engine.


1. I presented this paper at a local Papers We Love group. This blog post is a summary of the talk
2. Yes, there is a typo in the ‘Matters Order’ slide