惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

H
Heimdal Security Blog
小众软件
小众软件
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
罗磊的独立博客
Google DeepMind News
Google DeepMind News
大猫的无限游戏
大猫的无限游戏
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
Hugging Face - Blog
Hugging Face - Blog
阮一峰的网络日志
阮一峰的网络日志
A
About on SuperTechFans
宝玉的分享
宝玉的分享
博客园 - 聂微东
月光博客
月光博客
Cyberwarzone
Cyberwarzone
Microsoft Security Blog
Microsoft Security Blog
V
Visual Studio Blog
Project Zero
Project Zero
T
Tor Project blog
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
L
LINUX DO - 最新话题
博客园 - 叶小钗
Recent Commits to openclaw:main
Recent Commits to openclaw:main
Attack and Defense Labs
Attack and Defense Labs
Spread Privacy
Spread Privacy
Forbes - Security
Forbes - Security
Simon Willison's Weblog
Simon Willison's Weblog
N
Netflix TechBlog - Medium
P
Proofpoint News Feed
Engineering at Meta
Engineering at Meta
Hacker News: Ask HN
Hacker News: Ask HN
I
InfoQ
M
MIT News - Artificial intelligence
AI
AI
博客园 - 三生石上(FineUI控件)
W
WeLiveSecurity
C
Check Point Blog
The Hacker News
The Hacker News
C
Cyber Attacks, Cyber Crime and Cyber Security
Application and Cybersecurity Blog
Application and Cybersecurity Blog
T
Tenable Blog
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
The Cloudflare Blog
Blog — PlanetScale
Blog — PlanetScale
美团技术团队
D
Darknet – Hacking Tools, Hacker News & Cyber Security
GbyAI
GbyAI
Hacker News - Newest:
Hacker News - Newest: "LLM"
腾讯CDC
K
Kaspersky official blog

Blog — PlanetScale

Keeping a Postgres queue healthy — PlanetScale Patterns for Postgres Traffic Control — PlanetScale Graceful degradation in Postgres — PlanetScale High memory usage in Postgres is good, actually — PlanetScale Stripe Projects partnership: Provision PlanetScale Postgres and MySQL databases from the Stripe CLI — PlanetScale Enhanced tagging in Postgres Query Insights — PlanetScale Behind the scenes: How Database Traffic Control works — PlanetScale Introducing Database Traffic Control — PlanetScale Scaling Postgres connections with PgBouncer — PlanetScale Drizzle joins PlanetScale — PlanetScale Video Conferencing with Postgres — PlanetScale Faster PlanetScale Postgres connections with Cloudflare Hyperdrive — PlanetScale Introducing the PlanetScale MCP server — PlanetScale Database Transactions — PlanetScale Automating our changelog with Cursor commands — PlanetScale Postgres 18 is now available — PlanetScale Using MotherDuck with PlanetScale — PlanetScale $50 PlanetScale Metal is GA for Postgres — PlanetScale AI-Powered Postgres index suggestions — PlanetScale $5 PlanetScale is live — PlanetScale Announcing Vitess 23 — PlanetScale $50 PlanetScale Metal — PlanetScale Report on our investigation of the 2025-10-20 incident in AWS us-east-1 — PlanetScale $5 PlanetScale — PlanetScale Benchmarking Postgres 17 vs 18 — PlanetScale Larger than RAM Vector Indexes for Relational Databases — PlanetScale Partnering with Cloudflare to bring you the fastest globally distributed applications — PlanetScale Processes and Threads — PlanetScale PlanetScale for Postgres is now GA — PlanetScale Postgres High Availability with CDC — PlanetScale Announcing Neki — PlanetScale Caching — PlanetScale The principles of extreme fault tolerance — PlanetScale Announcing PlanetScale for Postgres — PlanetScale Benchmarking Postgres — PlanetScale Announcing Vitess 22 — PlanetScale The Real Failure Rate of EBS — PlanetScale IO devices and latency — PlanetScale Announcing PlanetScale Metal — PlanetScale PlanetScale Metal: There’s no replacement for displacement — PlanetScale Upgrading Query Insights to Metal — PlanetScale Automating cherry-picks between OSS and private forks — PlanetScale Database Sharding — PlanetScale Anatomy of a Throttler, part 3 — PlanetScale Introducing sharding on PlanetScale with workflows — PlanetScale Announcing Vitess 21 — PlanetScale Announcing the PlanetScale vectors public beta — PlanetScale Anatomy of a Throttler, part 2 — PlanetScale Instant deploy requests — PlanetScale Anatomy of a Throttler, part 1 — PlanetScale Increase IOPS and throughput with sharding — PlanetScale Tracking index usage with Insights — PlanetScale Faster backups with sharding — PlanetScale Building data pipelines with Vitess — PlanetScale The State of Online Schema Migrations in MySQL — PlanetScale Optimizing aggregation in the Vitess query planner — PlanetScale Dealing with large tables — PlanetScale Announcing Vitess 20 — PlanetScale Self-managed Vitess vs Managed Vitess with PlanetScale — PlanetScale Achieving data consistency with the consistent lookup Vindex — PlanetScale The MySQL adaptive hash index — PlanetScale Introducing global replica credentials — PlanetScale Profiling memory usage in MySQL — PlanetScale Summer 2023: Fuzzing Vitess at PlanetScale — PlanetScale How PlanetScale makes schema changes — PlanetScale Identifying and profiling problematic MySQL queries — PlanetScale The Problem with Using a UUID Primary Key in MySQL — PlanetScale Announcing Vitess 19 — PlanetScale PlanetScale forever — PlanetScale Introducing schema recommendations — PlanetScale Amazon Aurora Pricing: The many surprising costs of running an Aurora database — PlanetScale Three common MySQL database design mistakes — PlanetScale OAuth applications are now available to everyone — PlanetScale Deprecating the Scaler plan — PlanetScale PlanetScale branching vs. Amazon Aurora blue/green deployments — PlanetScale Databases at scale — PlanetScale Considerations for building a database disaster recovery plan — PlanetScale Working with Geospatial Features in MySQL — PlanetScale PlanetScale vs Amazon Aurora replication — PlanetScale Introducing the Vantage and PlanetScale integration — PlanetScale MySQL isolation levels and how they work — PlanetScale Introducing the schemadiff command line tool — PlanetScale $ pscale ping — PlanetScale Announcing foreign key constraints support — PlanetScale The challenges of supporting foreign key constraints — PlanetScale What is HTAP? — PlanetScale Introducing Insights Anomalies — PlanetScale Webhook security: a hands-on guide — PlanetScale MySQL replication: Best practices and considerations — PlanetScale A guide to HTML email with Ruby on Rails and Tailwind CSS — PlanetScale Sharding for cost-effective database management — PlanetScale PlanetScale ranks 188th in Deloitte’s top 500 fastest-growing companies — PlanetScale Announcing the Fivetran integration — PlanetScale Introducing webhooks — PlanetScale What is MySQL replication and when should you use it? — PlanetScale Sync user data between Clerk and a PlanetScale MySQL database — PlanetScale Introducing database reports — PlanetScale Distributed caching systems and MySQL — PlanetScale What is MySQL partitioning? — PlanetScale MySQL High Availability: Connection handling and concurrency — PlanetScale
Performant database tree traversal with Rails — PlanetScale
Mike Coutermarsh · 2023-07-13 · via Blog — PlanetScale

Mike Coutermarsh |

We recently solved an interesting performance problem in our Rails API when performing a tree traversal.

We have some code that figures out a 3-way merge for your database schemas. To do this, we need to calculate a merge base between two schemas (like git!).

Our Rails API keeps track of changes to each PlanetScale database schema. We call each of these changes a "schema snapshot," similar to a git commit that stores the state of the schema at a specific time. Each snapshot can have one or two parents. When merging branches, we perform a breadth-first search on the history of each change until we find the common ancestor between both branches. This is the merge base.

An image showing the merge base with two arrows coming out of it: one for alter table teams and another for alter table users. The alter table users has an arrow coming of it for create table payments.

Going one-by-one is slow

Finding the merge base involves checking each snapshot in the history of both branches.

Usually, this is fast because most databases only have a few changes.

Others, though, would have thousands. This is where we'd run into performance issues. Since we were doing a database query for every step as we traversed the tree. Even though each query was fast, the network time alone added up quickly.

select * from schema_snapshots where id = 20
select * from schema_snapshots where id = 21
select * from schema_snapshots where id = 22
select * from schema_snapshots where id = 23
select * from schema_snapshots where id = 24
// *thousands more queries*

Solving the N+1 problem

This is a classic "N+1" performance problem. For each step in the process, we trigger another query.

Usually, fixing these in Rails is quite simple. There is an includes method for preloading the data we need. Unfortunately, in this case, due to the data structure, the normal preloading techniques do not work. We had to invent something new.

In-memory cache

First, we started by creating an in-memory cache for storing the snapshots. This is a temporary cache that only exists to find the merge base. In-memory is important here because our primary performance issue was caused by the sheer number of network requests.

class SnapshotMergeBaseCache
  attr_reader :store, :hits, :misses

  def initialize
    @store = {}
    @misses = 0
    @hits = 0
  end

  def get(id)
    if @store[id]
      @hits += 1
      return @store[id]
    end

    @misses += 1

    add(SchemaSnapshot.find(id))
  end

  def add(node)
    @store[node.id] = node
  end
end

This gives us a place to store all of our snapshots in memory. It uses a simple hash where the id is the key, and the value is the snapshot data.

The add method puts a snapshot into the cache. The get method is for retrieving it. The get method also keeps stats on the hits/misses. We used these stats to understand how well it worked once in production.

Preloading the cache

Now that we have the cache, the next step is bulk-loading it with snapshots. Conveniently, the snapshot history is pretty predictable when finding the merge base. We could preload the X most recent snapshots for each branch and drastically reduce the number of trips back to the database.

cache = SnapshotMergeBaseCache.new

from_branch.recent_snapshots.limit(FROM_PRELOAD_COUNT).each do |snap|
  cache.add(snap)
end
into_branch.recent_snapshots.limit(INTO_PRELOAD_COUNT).each do |snap|
  cache.add(snap)
end

Now, when running our breadth-first search, we can use cache.get(id) to find the next node. It hits the cache in most cases, avoiding the network request and solving our performance problem.

Rolling out & testing

Making changes like this can be tricky. There is often a wide gap between what you expect to happen and the reality of production.

First, we needed to ensure that it was accurate. We ran a few tests where we calculated the merge base using both the old and new methods for thousands of databases. This made us confident the new code was returning the correct results.

We then used feature flags to test rolling out the new code path and recorded data on how it performed. The hits and misses data proved useful for fine-tuning the number of snapshots we preloaded. After a couple iterations, we released it to 100% of our customers.

Alternative solutions

Adding an in-memory cache is just one way of solving this problem. This worked out best for us due to the high number of snapshots we needed to traverse for some databases. It was also simple to layer this solution onto our existing code without many major changes. This reduced the risk when rolling it out.

Database recursive CTE

One option for solving this is letting the database do the work. This can be accomplished with a recursive common table expression. With this, the database could follow the pointer to each record until it finds the common merge base.

Materialized path

The materialized path technique is a way to represent a graph in a SQL database. It stores the relationship history in a single column, such as 20/19/15/10/5/3/1. By doing this, you can then look at the history of two nodes and find their common parent.

This is a great option that works well for tree structures with a known limit. In our case, storing thousands of relationships didn't make this feasible.