Django bulk_update Memory Issue
Anže Pečar · 2025-10-12 · via Anže's Blog

Recently, I had to write a Django migration to update hundreds of thousands of database objects.

Loading the data

With some paper-napkin math I calculated that I can fit all the necessary data in memory, making the migration much simpler than it would have been otherwise.

First, I had to make sure to load only the necessary columns. Django’s only() queryset method came in very handy:

objs_to_update = TheObject.objects.only("id", "field1", "field2", "field3").all()

Because I generally don’t trust my paper-napkin math, I also logged how much memory each step was using:

import psutil
process = psutil.Process()
# Resident set size of the current process, in MiB
print(process.memory_info().rss // (1024 * 1024))

All the loaded objects took up less than 2GB, so everything seemed fine since the machine had 4GB available.
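
For per-step measurements like the ones described above, a small context manager can cut down on repeated print calls. The sketch below is an assumption-laden variant, not the code used in the migration: it swaps psutil for the standard library's tracemalloc, which only sees memory allocated by Python itself rather than the process RSS, and the log_memory name is made up for illustration.

```python
import tracemalloc
from contextlib import contextmanager


@contextmanager
def log_memory(label):
    """Print how much Python-level memory a step allocated and kept alive."""
    if not tracemalloc.is_tracing():
        tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    stats = {}
    try:
        yield stats
    finally:
        after, _ = tracemalloc.get_traced_memory()
        stats["delta_mib"] = (after - before) / (1024 * 1024)
        print(f"{label}: {stats['delta_mib']:+.1f} MiB")


# Hypothetical stand-in for loading objects from the database.
with log_memory("load") as load_stats:
    data = [str(i) * 10 for i in range(100_000)]
```

Because tracemalloc adds overhead of its own, the absolute numbers will differ from what process.memory_info().rss reports. The value here is in comparing steps against each other.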

Updating the objects

With all the objects loaded in memory, it was now time to calculate the new values:

for obj in objs_to_update:
    obj.field3 = compute_new_field3_value(obj)

I was also worried about memory during the update, but according to process.memory_info(), memory usage didn’t increase past 2GB, so it looked like I was on the home stretch.

Saving the results

Calling obj.save(update_fields=["field3"]) on each object would have been one option, but it would have taken too long. Luckily, Django has a bulk_update method:

TheObject.objects.bulk_update(objs=objs_to_update, fields=["field3"])

Running this statement as is would have generated a HUGE update statement that my database would not have enjoyed seeing. But luckily, bulk_update has a batch_size parameter that chunks the huge update into multiple smaller ones:

TheObject.objects.bulk_update(objs=objs_to_update, fields=["field3"], batch_size=250)

Unfortunately for me, the way bulk_update works wasn’t what I expected, and it killed my migration with a SIGTERM when I ran it in production. ☠️

Investigating bulk_update

Django first prepares a list of all update clauses, then creates the transaction, and finally executes the updates one by one in a loop.

I measured memory consumption from within bulk_update. After the for loop that builds the updates list, memory usage had increased to 4.8GB. The updates list alone took an extra 2.8GB, which is more than all the data I had loaded from the database and 800MB more than the machine had available. That explained the SIGTERM.
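
The difference between preparing everything up front and preparing one batch at a time can be illustrated without Django. The following is a simplified model under stated assumptions, not Django's actual implementation: build_clauses, eager_update, and lazy_update are hypothetical names, and tuples stand in for the real update clauses.

```python
def build_clauses(batch):
    # Stand-in for the per-object update clauses built for one batch.
    return [("WHEN", pk, value) for pk, value in batch]


def eager_update(rows, batch_size, execute):
    """Prepare the clauses for every batch first, then execute them one by one."""
    updates = []
    for start in range(0, len(rows), batch_size):
        updates.append(build_clauses(rows[start:start + batch_size]))
    # At this point the clauses for all rows are alive at the same time.
    peak = sum(len(clauses) for clauses in updates)
    for clauses in updates:
        execute(clauses)
    return peak


def lazy_update(rows, batch_size, execute):
    """Prepare and execute one batch at a time."""
    peak = 0
    for start in range(0, len(rows), batch_size):
        clauses = build_clauses(rows[start:start + batch_size])
        peak = max(peak, len(clauses))
        execute(clauses)
    return peak


rows = [(pk, pk * 2) for pk in range(1000)]
executed = []
eager_peak = eager_update(rows, 250, executed.append)
lazy_peak = lazy_update(rows, 250, executed.append)
print(eager_peak, lazy_peak)  # 1000 250
```

In this toy model, the eager version holds clauses for all 1000 rows before the first query runs, while the lazy version never holds more than one batch of 250.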

The solution

The solution was to implement my own batching instead of using Django’s batch_size:

from itertools import batched  # Python 3.12+
from django.db import transaction

with transaction.atomic():
    for batch in batched(objs_to_update, 250):
        TheObject.objects.bulk_update(objs=batch, fields=["field3"])

This makes sure that we only ever have the update clauses for at most 250 objects in memory at a time. I did some measurements again, and only 62MB of additional memory was used during all of this. With this change, my migration finished successfully! 🎉
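
The batched helper used in the loop is available as itertools.batched from Python 3.12 onwards. On older Python versions, a minimal stand-in could look like the sketch below. It is an illustration built on itertools.islice, not code from the original migration.

```python
from itertools import islice


def batched(iterable, n):
    """Yield successive tuples of at most n items from iterable."""
    if n < 1:
        raise ValueError("n must be at least one")
    iterator = iter(iterable)
    # islice pulls up to n items each round; an empty tuple ends the loop.
    while batch := tuple(islice(iterator, n)):
        yield batch


chunks = list(batched(range(5), 2))
print(chunks)  # [(0, 1), (2, 3), (4,)]
```

Each chunk is built only when the loop asks for it, so only one chunk of objects is handed to bulk_update at a time.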

Reporting the issue to Django

I reported this issue on the Django issue tracker: #36526 bulk_update uses more memory than expected. The ticket received a patch with a solution in a few hours, but unfortunately, the solution got rejected.

There was concern that preparing the update statements within the transaction would prolong it in typical cases and cause all sorts of unintended problems associated with long-running transactions. There is a separate ticket about the performance of building the update statement.

A safer solution was to document the memory usage, and that is what ended up closing my ticket. The bulk_update documentation now has the following warning:

When updating a large number of objects, be aware that bulk_update() prepares all of the WHEN clauses for every object across all batches before executing any queries. This can require more memory than expected.

Fin

To me personally the memory leak caused more pain than longer-running transactions ever would, but I understand there are Django projects where the fix would cause issues. There might even be someone out there who relies on extra memory usage to generate more heat for their workflow.