惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

H
Heimdal Security Blog
小众软件
小众软件
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
罗磊的独立博客
Google DeepMind News
Google DeepMind News
大猫的无限游戏
大猫的无限游戏
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
Hugging Face - Blog
Hugging Face - Blog
阮一峰的网络日志
阮一峰的网络日志
A
About on SuperTechFans
宝玉的分享
宝玉的分享
博客园 - 聂微东
月光博客
月光博客
Cyberwarzone
Cyberwarzone
Microsoft Security Blog
Microsoft Security Blog
V
Visual Studio Blog
Project Zero
Project Zero
T
Tor Project blog
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
L
LINUX DO - 最新话题
博客园 - 叶小钗
Recent Commits to openclaw:main
Recent Commits to openclaw:main
Attack and Defense Labs
Attack and Defense Labs
Spread Privacy
Spread Privacy
Forbes - Security
Forbes - Security
Simon Willison's Weblog
Simon Willison's Weblog
N
Netflix TechBlog - Medium
P
Proofpoint News Feed
Engineering at Meta
Engineering at Meta
Hacker News: Ask HN
Hacker News: Ask HN
I
InfoQ
M
MIT News - Artificial intelligence
AI
AI
博客园 - 三生石上(FineUI控件)
W
WeLiveSecurity
C
Check Point Blog
The Hacker News
The Hacker News
C
Cyber Attacks, Cyber Crime and Cyber Security
Application and Cybersecurity Blog
Application and Cybersecurity Blog
T
Tenable Blog
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
The Cloudflare Blog
Blog — PlanetScale
Blog — PlanetScale
美团技术团队
D
Darknet – Hacking Tools, Hacker News & Cyber Security
GbyAI
GbyAI
Hacker News - Newest:
Hacker News - Newest: "LLM"
腾讯CDC
K
Kaspersky official blog

Blog — PlanetScale

Keeping a Postgres queue healthy — PlanetScale Patterns for Postgres Traffic Control — PlanetScale Graceful degradation in Postgres — PlanetScale High memory usage in Postgres is good, actually — PlanetScale Stripe Projects partnership: Provision PlanetScale Postgres and MySQL databases from the Stripe CLI — PlanetScale Enhanced tagging in Postgres Query Insights — PlanetScale Behind the scenes: How Database Traffic Control works — PlanetScale Introducing Database Traffic Control — PlanetScale Scaling Postgres connections with PgBouncer — PlanetScale Drizzle joins PlanetScale — PlanetScale Video Conferencing with Postgres — PlanetScale Faster PlanetScale Postgres connections with Cloudflare Hyperdrive — PlanetScale Introducing the PlanetScale MCP server — PlanetScale Database Transactions — PlanetScale Automating our changelog with Cursor commands — PlanetScale Postgres 18 is now available — PlanetScale Using MotherDuck with PlanetScale — PlanetScale $50 PlanetScale Metal is GA for Postgres — PlanetScale AI-Powered Postgres index suggestions — PlanetScale $5 PlanetScale is live — PlanetScale Announcing Vitess 23 — PlanetScale $50 PlanetScale Metal — PlanetScale Report on our investigation of the 2025-10-20 incident in AWS us-east-1 — PlanetScale $5 PlanetScale — PlanetScale Benchmarking Postgres 17 vs 18 — PlanetScale Larger than RAM Vector Indexes for Relational Databases — PlanetScale Partnering with Cloudflare to bring you the fastest globally distributed applications — PlanetScale Processes and Threads — PlanetScale PlanetScale for Postgres is now GA — PlanetScale Postgres High Availability with CDC — PlanetScale Announcing Neki — PlanetScale Caching — PlanetScale The principles of extreme fault tolerance — PlanetScale Announcing PlanetScale for Postgres — PlanetScale Benchmarking Postgres — PlanetScale Announcing Vitess 22 — PlanetScale The Real Failure Rate of EBS — PlanetScale IO devices and latency — PlanetScale Announcing PlanetScale Metal — PlanetScale PlanetScale Metal: There’s no replacement for displacement — PlanetScale Upgrading Query Insights to Metal — PlanetScale Automating cherry-picks between OSS and private forks — PlanetScale Database Sharding — PlanetScale Anatomy of a Throttler, part 3 — PlanetScale Introducing sharding on PlanetScale with workflows — PlanetScale Announcing Vitess 21 — PlanetScale Announcing the PlanetScale vectors public beta — PlanetScale Anatomy of a Throttler, part 2 — PlanetScale Instant deploy requests — PlanetScale Anatomy of a Throttler, part 1 — PlanetScale Increase IOPS and throughput with sharding — PlanetScale Tracking index usage with Insights — PlanetScale Faster backups with sharding — PlanetScale Building data pipelines with Vitess — PlanetScale The State of Online Schema Migrations in MySQL — PlanetScale Optimizing aggregation in the Vitess query planner — PlanetScale Dealing with large tables — PlanetScale Announcing Vitess 20 — PlanetScale Self-managed Vitess vs Managed Vitess with PlanetScale — PlanetScale Achieving data consistency with the consistent lookup Vindex — PlanetScale The MySQL adaptive hash index — PlanetScale Introducing global replica credentials — PlanetScale Profiling memory usage in MySQL — PlanetScale Summer 2023: Fuzzing Vitess at PlanetScale — PlanetScale How PlanetScale makes schema changes — PlanetScale Identifying and profiling problematic MySQL queries — PlanetScale The Problem with Using a UUID Primary Key in MySQL — PlanetScale Announcing Vitess 19 — PlanetScale PlanetScale forever — PlanetScale Introducing schema recommendations — PlanetScale Amazon Aurora Pricing: The many surprising costs of running an Aurora database — PlanetScale Three common MySQL database design mistakes — PlanetScale OAuth applications are now available to everyone — PlanetScale Deprecating the Scaler plan — PlanetScale PlanetScale branching vs. Amazon Aurora blue/green deployments — PlanetScale Databases at scale — PlanetScale Considerations for building a database disaster recovery plan — PlanetScale Working with Geospatial Features in MySQL — PlanetScale PlanetScale vs Amazon Aurora replication — PlanetScale Introducing the Vantage and PlanetScale integration — PlanetScale MySQL isolation levels and how they work — PlanetScale Introducing the schemadiff command line tool — PlanetScale $ pscale ping — PlanetScale Announcing foreign key constraints support — PlanetScale The challenges of supporting foreign key constraints — PlanetScale What is HTAP? — PlanetScale Introducing Insights Anomalies — PlanetScale Webhook security: a hands-on guide — PlanetScale MySQL replication: Best practices and considerations — PlanetScale A guide to HTML email with Ruby on Rails and Tailwind CSS — PlanetScale Sharding for cost-effective database management — PlanetScale PlanetScale ranks 188th in Deloitte’s top 500 fastest-growing companies — PlanetScale Announcing the Fivetran integration — PlanetScale Introducing webhooks — PlanetScale What is MySQL replication and when should you use it? — PlanetScale Sync user data between Clerk and a PlanetScale MySQL database — PlanetScale Introducing database reports — PlanetScale Distributed caching systems and MySQL — PlanetScale What is MySQL partitioning? — PlanetScale MySQL High Availability: Connection handling and concurrency — PlanetScale
Ruby on Rails: 3 tips for deleting data at scale — PlanetScale
Mike Coutermarsh · 2022-08-01 · via Blog — PlanetScale

Mike Coutermarsh |

We’ve seen that as Rails applications grow, there are a few common issues that teams run into with deleting data.

In this post, you’ll learn a few strategies you can use to mitigate the risks of cleaning up data on a high scale Rails application.

How Rails deletes associated data

Rails applications at scale generally run into issues when deleting many records at once. This happens most commonly in models with many associations.

The standard way to delete associated data in Rails is to let ActiveRecord handle it via dependent: :destroy. In the following example, when the parent model (author) is deleted, all data in the dependent models will get deleted by ActiveRecord as well.

class Author < ApplicationRecord
  has_many :books, dependent: :destroy
end

class Book < ApplicationRecord
  belongs_to :author
end

The database schema looks like this:

ActiveRecord::Schema[7.1].define(version: 2022_06_06_171750) do
  create_table "authors", force: true do |t|
    t.string   "name"
    t.datetime "created_at"
    t.datetime "updated_at"
  end

  create_table "books", force: true do |t|
    t.string   "name"
    t.text     "description"
    t.bigint   "author_id", null: false
    t.datetime "created_at"
    t.datetime "updated_at"
    t.index    ["author_id"], name: "index_books_on_author_id"
  end
end

There is an indexed foreign key, but no foreign key constraint. ActiveRecord is responsible for deleting the data.

Now that we’ve covered the typical way to delete associated data, let’s look at some tips to improve this.

Tip #1: Use ActiveRecord’s destroy_async

As of Rails 6.1, dependent: :destroy_async was added to ActiveRecord. It works similarly to dependent: :destroy, except that it will run the deletion via a background job rather than happening in request.

This protects you from triggering a large number of deletes within a single transaction. As a Rails application grows, it can be very easy to unintentionally delete a parent record and trigger a cascade of thousands of deletions. Having all of this happen within a request can lead to timeouts and added strain on your database.

Replace any usage of Foreign Key ON DELETE CASCADE

Foreign key constraints are used in databases to manage referential integrity between tables. Specifically, developers will use ON DELETE CASCADE to delete all associated records when the parent record is deleted. This is an option some Rails applications will use rather than the standard dependent: :destroy.

This works well when the child data is limited. It becomes a problem when deleting a large number of child records. A simple delete can suddenly turn into a massive operation deleting thousands of records across multiple tables. This results in the users DELETE request taking several seconds to respond or timing out. In the database, this can lead to excessive locking, increase replication lag, and more issues that will have impact on other parts of the application.

We recommend replacing any usage of foreign key constraints with :destroy_async for safer deletes at scale.

Look out for failing validations

One issue to look out for with destroy_async is the risk of validations failing in a child model when deleting data. Since it’s happening asynchronously, the user will be unaware of any errors and the job will end up in your error queue. If any child records have validations on delete, we recommend running them from the parent model. This will stop the deletion from occurring and alert the user of the issue. This is an important area to add test coverage to protect from any regressions.

Tip #2: Understanding delete vs destroy

We have two primary methods for deleting data, delete and destroy, as well as their related delete_all and destroy_all on ActiveRecord relations.

  • destroy — Deletes the record while also triggering the models callbacks
  • delete — Skips the callbacks and deletes the record directly from the database

If you have callbacks setup, then you’ll generally want to always use destroy so that they are called. It’s important though to be aware of all the activity that could be caused by those callbacks, especially when destroying a large number of records. For example, a cron job for cleaning up old data would be better suited for using delete_all to skip callbacks.

Tip #3: Safely mass deleting old data

When there is no use for data anymore, it’s a common practice to archive or delete it.

For large busy tables, deleting a large number of records at once can lock the table and have unintended consequences to the rest of the application. The safe way is to continuously run deletes in small batches.

Here is an example Sidekiq job that can be scheduled by a cron to run once per hour:

# frozen_string_literal: true

class DeleteOldDataJob < BaseJob
  # We only want 1 instance of this job running at a time
  sidekiq_options unique_for: 1.hour, unique_until: :start, queue: :background, retry: false

  def perform(limit: 500)
    # Deletes 500 records
    deleted = Model.where("created_at < ?", 3.months.ago).limit(limit).delete_all

    # If more records to delete, requeue itself and run again
    if deleted == limit
      self.class.perform_async
    end
  end
end

This example is making use of Sidekiq's unique jobs. This protects us from having several of these jobs running concurrently (which could result in deadlocks). If you are using a job system without uniqueness, an alternative is setting up a queue with a concurrency of 1 and running the cleanup job there.

How to test it

This job is a good place to add test coverage to ensure you're deleting the correct data. Here’s an example pattern you can use.

# frozen_string_literal: true

require "test_helper"

class DeleteOldDataJobTest < ActiveJob::TestCase
  test "deletes data over 3 months old" do
    expired = create(:data, minute: 3.months.ago - 1.hour)
    retained = create(:data, minute: 3.months.ago + 1.hour)

    DeleteOldDataJob.new.perform

    assert Data.where(id: expired.id).empty?
    assert Data.where(id: retained.id).exists?
  end

  test "requeues if more to delete" do
    create(:data, minute: 3.months.ago - 1.hour)
    create(:data, minute: 3.months.ago - 1.hour)

    assert_enqueued_sidekiq_jobs(1, only: DeleteOldDataJob) do
      DeleteOldDataJob.new.perform(limit: 1)
    end
  end
end

Rails and PlanetScale

If you do make a mistake while deleting data or making schema changes, PlanetScale offers some solutions. From the dashboard, you have the option to instantly revert a bad schema change without losing any data. We also throw a warning in the dashboard if you're dropping a table that was recently queried, so you can hopefully catch any mistakes before they happen.

To get started with Rails and PlanetScale, check out the Rails quickstart.