Backup Postgres databases with Kubernetes CronJobs

A key part of operating any safe and reliable system is being able to recover deleted or lost data promptly and consistently. That starts with automatic backups that are recoverable and verifiable.

This is a quick and easy way to accomplish that goal using existing pieces of infrastructure that are common in production networks. There are countless ways to perform a backup; this is simply one of the “easiest”, provided these ingredients are available.

  ┌───────────┐        ┌─────────────┐   
  │           │        │             │   
  │ Postgres  │        │ Backup S3   │   
  │ Database  │        │ Bucket      │   
  │           │        │             │   
  └───┬───────┘        └──────────▲──┘   
      │                           │      
      │                           │      
      │                           │      
      │                           │      
      │                           │      
   ┌──┼───────────────────────────┼──┐   
   │  │     Kubernetes Cluster    │  │   
   │  │                           │  │   
   │  │  ┌────────────────────┐   │  │   
   │  │  │ Backup-Worker      │   │  │   
   │  └──► (CronJob)          ┼───┘  │   
   │     │                    │      │   
   │     └────────────────────┘      │   
   │                                 │   
   └─────────────────────────────────┘

Assumptions:

You have a Kubernetes cluster
You have a Postgres database
You have access to an S3-like storage repository
Most, or all, of this is in AWS

If these assumptions are not true, it’s still possible to get some value from this, though you’ll have to make some adjustments.

An important caveat: This should only really be done with small-ish databases, where having a complete export is particularly advantageous. If a full pg_dump takes longer than about 15 minutes to run, you should consider more advanced techniques to accomplish this.

Also, while there are backup & restore tools inside of RDS, Aurora, and other managed database solutions, I still find it massively valuable to have access to a true SQL dump at times. Your mileage may vary, of course.

To get this all working, it takes a few steps to wire everything up just right:

Build a container with the required tools for the backup
Provision the bucket and lifecycle policy for data retention
Set up service account & permissions to allow the Pod to upload backups
Generate an encryption key to encrypt backups
Configure the backup script
Set up the CronJob to execute the backup on a daily schedule

Security and data integrity will be the foremost priorities throughout this process, so principles of least privilege and encryption at rest are fundamentally important concepts.

Build a custom container image

When working with a container orchestration system like Kubernetes, it’s best to use small, purpose-built images whenever possible. Not only does this reduce image pull times, it also reduces the number of possible vulnerabilities, and therefore the number of times your “security guy” nags you. 😉

This toolbox image, backup-worker, has tools for PostgreSQL and Kafka, as well as AWS utilities.

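FROM debian:12-slim
# apt-get is the stable interface for scripts; drop the package lists to keep the image small
RUN apt-get update && \
    apt-get install -y curl gnupg openssl awscli postgresql-client-15 kcat && \
    rm -rf /var/lib/apt/lists/*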

A basic build task in GitHub Actions can build and push the image:

...
      - name: Docker Login - GitHub Container Repo
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Push image
        run: |
          IMAGE_ID=ghcr.io/${{ github.repository }}/backup-worker
          docker tag image $IMAGE_ID:latest
          docker push $IMAGE_ID:latest
          docker tag image $IMAGE_ID:$GITHUB_SHA
          docker push $IMAGE_ID:$GITHUB_SHA

A full config is out of scope for this post, but there are plenty of examples for building and uploading a Docker image to the registry of your choice.
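
One detail that tends to bite in the push step: container registries only accept lowercase repository names, while ${{ github.repository }} keeps whatever casing the owner and repo have. A minimal sketch of normalizing the name in shell, where image_id is a hypothetical helper and not part of the workflow above:

```shell
#!/bin/bash
# Hypothetical helper: build a lowercase GHCR image reference.
# $1 = "owner/repo" (as in github.repository), $2 = image name.
image_id() {
    local repository="$1" name="$2"
    echo "ghcr.io/${repository}/${name}" | tr '[:upper:]' '[:lower:]'
}

image_id "Example-Org/Infra" "backup-worker"   # prints ghcr.io/example-org/infra/backup-worker
```

In the workflow, the same tr call could be applied to IMAGE_ID before the docker tag lines.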

Create the required AWS Resources

When creating a simple backup like this, I like to configure the permissions such that the backup process itself cannot delete the backup files. Instead, S3’s lifecycle rules will rotate out old backups, in this case after 90 days.

The importance of this setup is that it prevents a malicious or accidental process within the Kubernetes cluster from deleting historic backups. Deleting backups has long been a key pillar of ransomware attacks, and it is not possible here by compromising the backup worker process. Keep in mind that s3:PutObject can still overwrite an existing object unless bucket versioning is enabled.

More sophisticated rules are also possible with this config, like adding prefix rules for daily/weekly/monthly retention. For simplicity, this will just delete any files older than 90 days:

resource "aws_s3_bucket" "db_backups" {
    bucket = "my-database-backups-bucket"
    tags = local.tags
}

resource "aws_s3_bucket_lifecycle_configuration" "db_backups_lifecycle" {
    bucket = aws_s3_bucket.db_backups.id
    rule {
        id = "1"
        filter {
          prefix = ""
        }
        status = "Enabled"
        expiration {
            days = 90
        }
    }
}
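
As a rough sketch of the prefix idea mentioned above: if the lifecycle configuration had separate rules for daily/, weekly/, and monthly/ prefixes (an assumption, since the Terraform here has only the single rule), the upload side would need to pick a prefix for each run. The retention_prefix helper and the first-of-month/Sunday convention are made up for illustration, and the -d flag assumes GNU date as shipped in the Debian image:

```shell
#!/bin/bash
# Hypothetical helper: pick a retention prefix for a date given as YYYY-MM-DD.
# First of the month -> monthly, Sunday -> weekly, anything else -> daily.
retention_prefix() {
    local day="$1" dom dow
    dom="$(date -u -d "$day" +%d)"   # day of month, 01..31
    dow="$(date -u -d "$day" +%u)"   # day of week, 1 = Monday .. 7 = Sunday
    if [ "$dom" = "01" ]; then
        echo "monthly"
    elif [ "$dow" = "7" ]; then
        echo "weekly"
    else
        echo "daily"
    fi
}

retention_prefix "2026-02-19"   # a Thursday, prints daily
```

The sync target in the backup script would then become something like s3://${S3_BUCKET}/$(retention_prefix "$(date -u +%F)")/, with each prefix getting its own expiration rule.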

Set up the AWS Permissions

This cluster has IAM Roles for Service Accounts (IRSA) enabled, so mapping individual Pod identities to AWS identities is very simple. In this case, the Pod’s ServiceAccount can only read and write objects in the bucket created in the previous step.

# IAM policy for S3 access - Allows read/write only
resource "aws_iam_policy" "s3_backup_service" {
  name = "s3_backup_service"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "s3:GetObject",
          "s3:PutObject"
        ]
        Resource = [
          "${aws_s3_bucket.db_backups.arn}/*"
        ]
      },
      {
        Effect = "Allow"
        Action = [
          "s3:ListBucket"
        ]
        Resource = [
          aws_s3_bucket.db_backups.arn
        ]
      }
    ]
  })
}

# IAM role for backup-service
resource "aws_iam_role" "s3_backup_service" {
  name        = "s3_backup_service"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Principal = {
          Federated = module.eks.oidc_provider_arn
        }
        Action = "sts:AssumeRoleWithWebIdentity"
        Condition = {
          StringEquals = {
            "${module.eks.oidc_provider}:aud" = "sts.amazonaws.com",
            "${module.eks.oidc_provider}:sub" = "system:serviceaccount:default:backup-service"
          }
        }
      },
    ]
  })
}

# Attach the S3 policy to the role
resource "aws_iam_role_policy_attachment" "s3_backup_service" {
  role       = aws_iam_role.s3_backup_service.name
  policy_arn = aws_iam_policy.s3_backup_service.arn
}

Note that this references some specific pre-existing pieces of infrastructure. If you have not built your cluster using Terraform, replace module.eks.oidc_provider_arn and module.eks.oidc_provider with the correct values.

The ServiceAccount is also created by Terraform:

# The Service Account in Kubernetes
resource "kubernetes_service_account" "backup-service" {
  metadata {
    name = "backup-service"
    namespace = "default"
    annotations = {
      "eks.amazonaws.com/role-arn" = aws_iam_role.s3_backup_service.arn
    }
  }
}
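
The sub condition in the trust policy has to match the ServiceAccount’s namespace and name exactly, and a typo there only shows up as a failed role assumption at runtime. As a small sanity check, here is a sketch that builds the expected value; irsa_subject is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: build the OIDC "sub" claim that IRSA presents
# for a given namespace and ServiceAccount name.
irsa_subject() {
    local namespace="$1" service_account="$2"
    echo "system:serviceaccount:${namespace}:${service_account}"
}

irsa_subject "default" "backup-service"   # prints system:serviceaccount:default:backup-service
```

From inside a running Pod, aws sts get-caller-identity should then report an assumed-role ARN for s3_backup_service rather than the node’s instance role.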

Secrets

For good security hygiene, secrets should always be kept safe and out of cleartext.

For simplicity, we’ll put all of the configuration in this Secret. In a real cluster, these would likely be pulled from multiple Secrets, and ideally managed by an external secrets provider!

---
apiVersion: v1
kind: Secret
metadata:
  name: backup-secrets
stringData:
    S3_BUCKET:  "my-database-backups-bucket"
    AES_KEY:    "super-secure-aes-key"
    PGHOST:     "example.123456789012.ca-central-1.rds.amazonaws.com"
    PGPORT:     "5432"
    PGUSER:     "backup-role"
    PGPASSWORD: "super-secure-password"
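
The step list mentions generating an encryption key, so here is one way to produce a value for AES_KEY. This is a sketch that assumes 32 random bytes, base64-encoded, is an acceptable passphrase format for your setup; generate_backup_key is a hypothetical helper:

```shell
#!/bin/bash
# Hypothetical helper: print a random 256-bit passphrase, base64-encoded.
generate_backup_key() {
    head -c 32 /dev/urandom | base64 | tr -d '\n'
}

key="$(generate_backup_key)"
# Report only the length so the key itself never lands in a log.
echo "Generated a ${#key}-character key"
```

Running openssl rand -base64 32 with the openssl binary already in the image gives an equivalent result. Whatever you use, keep a copy of the key somewhere outside the cluster, since the backups are unreadable without it.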

Database Backup script

Note that this script will back up multiple databases on the same server or RDS cluster. Simply replace database1 database2 ... with your database name(s).

As an additional security measure, all backup files are symmetrically encrypted with AES before being uploaded to the bucket.

This script takes two variables as inputs:

S3_BUCKET
AES_KEY

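Both are passed in by Secrets via the Pod’s configuration. The connection settings (PGHOST, PGPORT, PGUSER, PGPASSWORD) come from the same Secret and are read directly by pg_dump as standard libpq environment variables.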

#!/bin/bash
set -euo pipefail
trap 'echo "[!] Error executing backup!"' ERR 

AES_KEY_HASH=$(echo -n "${AES_KEY}" | sha256sum)
echo "[+] Encryption key sha256: ${AES_KEY_HASH}"

NOW=$(date +"%Y-%m-%d-%H-%M-%S")
mkdir -p /backups

echo "[+] Backing up data from server: ${PGHOST}" 

for db in database1 database2 database3; do
    backup_file_path="/backups/${db}_${NOW}.enc"
    echo "[+] Backing up database ${db}..."
    pg_dump "${db}" | gzip | \
    gpg --batch --pinentry-mode loopback -c --passphrase "${AES_KEY}" - > "${backup_file_path}"
    if [ ! -s "$backup_file_path" ]; then
        echo "[!] Error backing up. File is 0kb or does not exist"
        exit 1
    fi
    echo "[+] Done."
done

echo "[+] List of backups:"
ls -lash /backups

echo "[+] Uploading backup files to S3 bucket: ${S3_BUCKET}"
aws s3 sync /backups/ "s3://${S3_BUCKET}/"

The script is embedded inline in a ConfigMap manifest named backup-database:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: backup-database
data:
  backup.sh: |
    #Script in-line here!

The CronJob and Pod config

The backup-worker image from earlier can be used here to execute the backup job using a Kubernetes CronJob:

---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: database-backup
spec:
  schedule: "0 0 * * *"   #Midnight UTC
  successfulJobsHistoryLimit: 2
  failedJobsHistoryLimit: 2
  concurrencyPolicy: "Forbid"
  jobTemplate:
    spec:
      backoffLimit: 3
      template:
        spec:
          nodeSelector:
            type: "private"
          containers:
          - name: backup
            image: backup-worker:latest
            command: ["/bin/bash", "/scripts/backup.sh"]
            envFrom:
              - secretRef:
                  name: "backup-secrets"
            volumeMounts:
            - name: backup-database
              mountPath: /scripts
          restartPolicy: Never
          serviceAccountName: "backup-service"
          volumes:
          - name: backup-database
            configMap:
              defaultMode: 0644
              name: backup-database

The backup in action

Simply wait until midnight UTC, and the CronJob timer will execute the backup.

Okay, fine, I’m impatient too - Let’s trigger it manually.

kubectl create job --from=cronjob/database-backup database-backup-123456-abcd

Once started, you can follow along with the backup task:

kubectl logs -f job/database-backup-123456-abcd

The output should look something like this:

[+] Encryption key sha256: 2fb2cfa7bac33a8b02a4b0ce8f85c46feb90f4ea20697c52fb84a855baf9202b  -
[+] Backing up data from server: example.123456789012.ca-central-1.rds.amazonaws.com
[+] Backing up database example...
[+] Done.
[+] List of backups:
total 123M
 16K drwxr-xr-x. 2 root root  16K Feb 19 00:00 .
   0 drwxr-xr-x. 1 root root   70 Feb 19 00:00 ..
123M -rw-r--r--. 1 root root 123M Feb 19 00:00 example_2026-02-19-00-00-01.enc
[+] Uploading backup files to S3 bucket: my-database-backups-bucket
upload: backups/example_2026-02-19-00-00-01.enc to s3://my-database-backups-bucket/example_2026-02-19-00-00-01.enc

Likewise, the file will be present in the S3 bucket when listed with the AWS CLI:

$ aws s3 ls s3://my-database-backups-bucket
...
2026-02-19 19:00:36   12345678 example_2026-02-19-00-00-01.enc
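
Since the listing has a predictable shape, it’s easy to turn into a freshness check. A minimal sketch, assuming the default aws s3 ls column layout (date, time, size, key) and keys without spaces; newest_backup is a hypothetical helper:

```shell
#!/bin/bash
# Hypothetical helper: read `aws s3 ls` output on stdin and print
# "<size> <key>" for the most recently modified object whose key
# starts with "<db>_". Prints nothing if no object matches.
newest_backup() {
    local db="$1"
    awk -v prefix="${db}_" 'index($4, prefix) == 1 { print $1, $2, $3, $4 }' \
        | sort \
        | tail -n 1 \
        | awk '{ print $3, $4 }'
}
```

Usage would look like aws s3 ls "s3://${S3_BUCKET}/" | newest_backup example, and a monitoring job could alert when the output is empty or the size looks suspiciously small.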

And just like that, future backups will work exactly the same, on a daily schedule.
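
A backup is only as good as the last restore you tested. Here is a sketch of the reverse path, assuming the same AES_KEY and the standard PG* environment variables are set, and that the target database already exists. The function names are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: derive the database name from a backup file name,
# e.g. "database1_2026-02-19-00-00-01.enc" -> "database1".
db_from_backup_name() {
    local file
    file="$(basename "$1")"
    # Strip the trailing "_YYYY-MM-DD-HH-MM-SS.enc" suffix.
    echo "${file%_[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[0-9][0-9].enc}"
}

# Decrypt and decompress a backup file, writing plain SQL to stdout.
decrypt_backup() {
    gpg --batch --quiet --pinentry-mode loopback \
        --passphrase "${AES_KEY}" --decrypt "$1" | gunzip
}

# Load a backup into an existing database, stopping at the first SQL error.
restore_backup() {
    local file="$1" target_db="$2"
    decrypt_backup "$file" | psql --set ON_ERROR_STOP=1 --dbname "$target_db"
}
```

After downloading a file with aws s3 cp, something like restore_backup /tmp/database1_2026-02-19-00-00-01.enc database1_restore_test would load it into a scratch database for inspection.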

Bonus: Backup Kafka topic

Using the building blocks laid out above, it’s also quite simple to set up a backup for small Kafka topics. The example script takes these input variables:

S3_BUCKET
KAFKA_BROKERS
AES_KEY

#!/bin/bash
set -euo pipefail
trap 'echo "[!] Error executing backup!"' ERR 

NOW=$(date +"%Y-%m-%d-%H-%M-%S")
mkdir -p /backups

echo "[+] Backing up topics..." 

TOPICS=(
    "_schemas"
    "Foo.bar.v1"
)

for topic in "${TOPICS[@]}"; do
    echo "[+] Backing up topic ${topic}..."
    backup_file_path="/backups/${topic}_${NOW}.enc"
    kcat -C -b "${KAFKA_BROKERS}" -t "${topic}" -e | \
    gpg --batch --pinentry-mode loopback -c --passphrase "${AES_KEY}" - > "${backup_file_path}"

    if [ ! -s "$backup_file_path" ]; then
        echo "[!] Error backing up. File is 0kb or does not exist"
        exit 1
    fi
done

echo "[+] List of backups:"
ls -lash /backups

echo "[+] Uploading backup files to S3 bucket: ${S3_BUCKET}"
aws s3 sync /backups/ "s3://${S3_BUCKET}/"

Be careful with Kafka: only back up topics this way if they are relatively small and contain precious data.

What happens when it goes wrong?

There are of course many potential failure modes for any backup system. Fortunately, this has some advantages over some traditional ways of executing a backup.

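First, the CronJob controller creates the Job once per scheduled time, no matter how many nodes are running, and concurrencyPolicy: Forbid keeps runs from overlapping. Kubernetes documents this as approximately once rather than a strict guarantee, which is fine for a daily backup. And if the Pod is interrupted or fails, the Job simply starts a fresh one (up to backoffLimit retries), reducing the chances of a partial backup run.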

Another advantage is that the way failures are reported is consistent with the rest of the Kubernetes metrics. Any issue during backup will cause a non-zero exit code, marking the Job as Failed, and hopefully sending an alert through your monitoring system! Instead of relying on email or Slack webhooks, this ties in wonderfully with existing and trusted monitoring tech.
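
The set -o pipefail line in the script is what makes this reliable. Without it, a failed pg_dump at the head of the pipeline would go unnoticed, and the size check would not help either, because gzip still emits a small valid archive for empty input. A self-contained sketch, using false as a stand-in for a failing pg_dump:

```shell
#!/bin/bash
# Run `false | gzip` (false stands in for a failing pg_dump) and print
# "<pipeline exit status> <empty|non-empty>" for the resulting file.
simulate_failed_dump() {
    local mode="$1" out status
    out="$(mktemp)"
    # The command substitution is a subshell, so option changes stay inside it.
    status="$(
        set +e
        if [ "$mode" = "pipefail" ]; then
            set -o pipefail
        else
            set +o pipefail
        fi
        false | gzip > "$out"
        echo "$?"
    )"
    if [ -s "$out" ]; then
        echo "${status} non-empty"
    else
        echo "${status} empty"
    fi
    rm -f "$out"
}

simulate_failed_dump plain      # prints "0 non-empty"
simulate_failed_dump pipefail   # prints "1 non-empty"
```

In both cases the output file is non-empty, so only the pipefail run surfaces the failure as an exit code the Job can act on.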

Logging is also a nice bonus - since everything goes through stdout, it will all get picked up by your log monitor and make discovery and troubleshooting a breeze.

All in all, it’s a quick and easy way to get a level of backups working without the fuss of a 3rd party service. All the pieces (probably) already exist, so adding this onto a cluster is basically free. And we love those types of easy wins, don’t we?
