In a recent Postgres patch authored by Greg Sabino Mullane, Postgres has a new step forward for data integrity: data checksums are now enabled by default.
This appears in the release notes as a fairly minor change, but it significantly boosts the defense against one of the sneakiest problems in data management: silent data corruption.
Let’s dive into what this feature is, what the new default means for you, and how it impacts upgrades.
A data checksum is a simple but powerful technique to verify the integrity of data pages stored on disk. It's like a digital fingerprint for every 8KB block of data (a "page") in your database.
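When Postgres writes a page to disk, it calculates a checksum of the page contents and stores it in the page header. When it reads the page back, it recalculates the checksum and compares it with the stored value. If the two values do not match, it means the data page has been altered or corrupted since it was last written. This is important because data corruption can happen silently. By detecting a mismatch, Postgres can immediately raise an error and alert you to a potential problem. Checksums are also an integral part of pgBackRest, which uses them to verify backups.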
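The write-then-verify idea can be sketched in a few lines of shell. This is only an illustration: it uses the standard cksum utility as a stand-in fingerprint, whereas Postgres computes its own page checksum internally, and the temporary file here is a hypothetical substitute for a single 8KB page.

```shell
# Illustration only: cksum(1) stands in for Postgres' internal page checksum.
page=$(mktemp)                 # hypothetical stand-in for one 8KB data page
trap 'rm -f "$page"' EXIT

# "Write" the page: fill 8192 bytes and remember its fingerprint.
head -c 8192 /dev/zero > "$page"
stored=$(cksum < "$page" | cut -d' ' -f1)

# Simulate silent corruption: overwrite one byte in place.
printf 'X' | dd of="$page" bs=1 seek=100 conv=notrunc 2>/dev/null

# "Read" the page: recompute the fingerprint and compare it with the stored one.
current=$(cksum < "$page" | cut -d' ' -f1)
if [ "$stored" != "$current" ]; then
  result="checksum mismatch"
else
  result="page ok"
fi
echo "$result"
```

Without the stored fingerprint, nothing in the file itself would reveal that a byte had changed, which is the situation checksums are meant to catch.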
The initdb command is the utility that creates a new Postgres database cluster and initializes the data directory where Postgres stores all of its permanent data. When you run initdb, it does things like create the default databases template1 and postgres.
The syntax often looks something like this:
/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
As an end user who uses cloud managed Postgres or even a local tool like Postgres.app, you generally never see the initdb command because it is a one-time administrative setup task.
--data-checksums for initdb
In the past, database admins had to manually add the --data-checksums flag when running initdb to enable this feature. If you forgot or didn’t know about this feature, the new cluster was created without these built-in integrity checks.
The default behavior of initdb is now to enable data checksums every time a new cluster is initialized.
initdb -D /data/pg14
initdb -D /data/pg18
This is generally a win for Postgres best practices. Every new database cluster is now automatically equipped with this corruption defense, requiring no extra effort.
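To see which setting a given cluster actually has, one option is to read the "Data page checksum version" line that pg_controldata prints, or to run SHOW data_checksums in a session. The helper below is a minimal sketch: the function name checksum_state and the /data/pg18 path are assumptions for illustration, and the demo feeds it a sample line rather than a real cluster.

```shell
# Print "on" or "off" based on pg_controldata output read from stdin.
# A version greater than 0 means checksums are enabled. If the line is
# missing entirely, this sketch also reports "off".
checksum_state() {
  awk -F: '/^Data page checksum version/ { v = $2 + 0 }
           END { if (v > 0) print "on"; else print "off" }'
}

# Against a real cluster (the path is an assumption):
#   pg_controldata -D /data/pg18 | checksum_state
# Or from a live session:
#   psql -c "SHOW data_checksums;"

# Self-contained demo using a sample line in the pg_controldata format:
printf 'Data page checksum version:           1\n' | checksum_state
```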
--no-data-checksums
You might have a very specific reason to disable checksums, in which case you can explicitly opt out using the new flag:
initdb --no-data-checksums -D /data/pg18
pg_upgrade
While the new default is great, it may introduce a compatibility issue for those doing a major version upgrade using the pg_upgrade utility.
pg_upgrade works by connecting an old data directory to a new data directory, and a fundamental requirement is that both clusters have the same checksum setting: either both ON or both OFF.
If you are upgrading an older Postgres cluster that was created before this change, chances are it has checksums disabled, and pg_upgrade will fail because the settings do not match.
In an upgrade pinch, to upgrade a non-checksum-enabled cluster, you can use the new --no-data-checksums flag when initializing the new cluster to make the settings align.
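As a rough sketch of that alignment step, the script below picks the initdb flag from the old cluster's checksum version and prints the commands instead of running them. The function name initdb_checksum_flag, the data directories, and the binary directories are all assumptions for illustration; the checksum version would normally come from pg_controldata on the old cluster.

```shell
# Choose the initdb flag that matches the old cluster's checksum setting.
# $1 is the "Data page checksum version" value reported by pg_controldata
# for the old cluster (0 means checksums are off).
initdb_checksum_flag() {
  if [ "$1" -gt 0 ]; then
    echo "--data-checksums"
  else
    echo "--no-data-checksums"
  fi
}

# Hypothetical paths; adjust them to your own layout.
OLD_DATA=/data/pg14
NEW_DATA=/data/pg18
OLD_BIN=/usr/lib/postgresql/14/bin
NEW_BIN=/usr/lib/postgresql/18/bin

# Dry run: print the commands rather than executing them.
flag=$(initdb_checksum_flag 0)   # 0 = old cluster has no checksums
echo "initdb $flag -D $NEW_DATA"
echo "pg_upgrade --check -b $OLD_BIN -B $NEW_BIN -d $OLD_DATA -D $NEW_DATA"
```

Running pg_upgrade with --check first reports incompatibilities between the two clusters without changing any data.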
Instead of continuing forever without data checksums, the better long-term solution is to add checksums to your database before the next upgrade. Sadly, there’s really no way to do this without some downtime and a restart, and on a large database the process can be slow. The well-documented pg_checksums utility helps with this.
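The offline procedure might look roughly like the sketch below: stop the cluster cleanly, run pg_checksums with --enable, then start it again. The run and enable_checksums helpers and the data directory are assumptions for illustration, and the script only prints the commands unless DRY_RUN is set to 0.

```shell
# Print commands by default; set DRY_RUN=0 to execute them for real.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Enable checksums on an existing cluster. pg_checksums requires the
# cluster to be shut down cleanly, so each step runs only if the
# previous one succeeded.
enable_checksums() {
  datadir=$1
  run pg_ctl stop -D "$datadir" -m fast &&
  run pg_checksums --enable --progress -D "$datadir" &&
  run pg_ctl start -D "$datadir"
}

enable_checksums /data/pg14   # hypothetical data directory
```

The --progress flag is useful here because the rewrite touches every page, so the downtime grows with the size of the database.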
We have helped a few folks with this issue. For larger no-downtime environments, you can add the checksums on a replica machine and then fail over to that replica.
Postgres checksums are a great feature and will be the default in the future. If you haven’t used checksums in the past, you may want to start planning now for adding them, especially since a self-managed major version upgrade will require a bit of extra thinking.