Postgres on Kubernetes in 2026: production setup
Muskan · 2026-06-17 · via DEV Community

Quick take

Bitnami's free Postgres Helm chart is effectively dead in 2026. Broadcom moved tagged images behind a paywall in September 2025. The production-grade replacement is CloudNativePG, a CNCF Postgres operator with streaming replication, point-in-time recovery, and automatic failover. Here is the setup that actually runs in prod, every block explained.

If you only have 90 seconds, this is the shape:

  • Bitnami is over for new free deployments. Migrate or pay.
  • CloudNativePG (CNPG) is the new default and is genuinely operator-grade.
  • The four things you must get right: storage class, backup target, replica count, and pooler.

Why 2026 changed the Postgres-on-K8s playbook

Three things shifted in the last twelve months, and almost every existing tutorial is now wrong.

Bitnami went paywalled. Broadcom took ownership of Bitnami after the VMware acquisition and moved all tagged Postgres images behind a Tanzu subscription in September 2025. The Helm chart still exists, but the images it points to are no longer free. Teams running bitnami/postgresql:15.4.0-debian-12-r0 on a fresh cluster get a registry auth error.

CloudNativePG joined the CNCF. CNPG entered the CNCF Sandbox in early 2025 and is now the de facto Postgres operator on Kubernetes. It handles streaming replication, automatic failover, backup, and point-in-time recovery without you stitching scripts together.

FinOps practice, and the FOCUS billing-data specification, changed the cost story. Postgres on K8s used to lose on price to managed RDS for small workloads. With Karpenter packing CNPG pods onto right-sized nodes and gp3 storage at $0.08/GB-month, the break-even shifted. For most teams above ~$300/month of RDS spend, CNPG is cheaper.

The upshot is that the question is no longer "Bitnami or Crunchy" but "operator or managed service."

Operator versus chart: pick one before you start

The wrong choice here costs a week of rework. Here is the rule I use.

| Use case | Pick |
|---|---|
| Dev or CI throwaway database | Plain Helm chart, no operator |
| Single-AZ staging, no HA needed | Plain Helm chart with PVC |
| Production with HA, backup, PITR | CloudNativePG operator |
| Multi-tenant SaaS, many DBs | CloudNativePG with per-tenant Cluster CRs |
| Regulated workload, point-in-time recovery audit | Managed (RDS, Cloud SQL, Aiven) |

If your environment crosses the line from "throwaway" to "actually serves users," the operator path is the only one that ages well. The rest of this post assumes CNPG.

A production CNPG setup at a glance

Here is the Cluster spec that deploys cleanly on a fresh cluster today.

| Block | Setting | Production value |
|---|---|---|
| Cluster | instances | 3 (one primary, two standbys) |
| Image | imageName | ghcr.io/cloudnative-pg/postgresql:16.4 |
| Storage | storage.size, storage.storageClass | 100Gi; gp3 on AWS or pd-ssd on GCP |
| WAL storage | walStorage.size | Separate PVC, 20Gi minimum |
| Resources | requests / limits | 1 CPU / 4Gi request, 4 CPU / 16Gi limit |
| Backup | backup.barmanObjectStore | S3 or GCS bucket, 7-day retention |
| Monitoring | monitoring.enablePodMonitor | true |
| Pooler | Pooler CR | PgBouncer, transaction mode, default pool size 25 |
| Security | runAsNonRoot, fsGroup | true, 26 (Postgres UID) |

That is the production minimum. Nine settings, each load-bearing.
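To make the table concrete, here is a minimal sketch that builds the Cluster manifest as a Python dict and prints it as JSON, which kubectl apply -f accepts just like YAML. The cluster name, bucket path, StorageClass name, and the Secret backup-creds (with keys ACCESS_KEY_ID and ACCESS_SECRET_KEY) are hypothetical placeholders. The Pooler is a separate CR, so it is not part of this manifest.

```python
import json


def production_cluster(name: str, bucket_path: str, storage_class: str = "gp3") -> dict:
    """Build a CloudNativePG Cluster manifest mirroring the production table.

    Hypothetical assumptions: S3 credentials live in a Secret called
    "backup-creds" with keys ACCESS_KEY_ID and ACCESS_SECRET_KEY, and
    storage_class names a StorageClass that already exists in the cluster.
    """
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            "instances": 3,  # one primary, two standbys
            "imageName": "ghcr.io/cloudnative-pg/postgresql:16.4",  # pinned version
            "storage": {"size": "100Gi", "storageClass": storage_class},
            # Separate PVC so a WAL burst cannot fill the data volume
            "walStorage": {"size": "20Gi", "storageClass": storage_class},
            "resources": {
                "requests": {"cpu": "1", "memory": "4Gi"},
                "limits": {"cpu": "4", "memory": "16Gi"},
            },
            # Creates a PodMonitor, so the Prometheus Operator CRDs must exist
            "monitoring": {"enablePodMonitor": True},
            # CNPG already runs Postgres as non-root UID/GID 26; stated explicitly
            "postgresUID": 26,
            "postgresGID": 26,
            "backup": {
                "retentionPolicy": "7d",
                "barmanObjectStore": {
                    "destinationPath": bucket_path,
                    "s3Credentials": {
                        "accessKeyId": {"name": "backup-creds", "key": "ACCESS_KEY_ID"},
                        "secretAccessKey": {"name": "backup-creds", "key": "ACCESS_SECRET_KEY"},
                    },
                },
            },
        },
    }


manifest = production_cluster("pg-prod", "s3://example-bucket/pg-prod")
print(json.dumps(manifest, indent=2))  # pipe into: kubectl apply -f -
```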

Replication and high availability

CNPG runs streaming replication out of the box. Three instances is the sweet spot: one primary and two standbys. With three, you survive the loss of any single node and still have a standby ready for the next failover.

Synchronous versus async replication

  • Synchronous (postgresql.synchronous with method: any and number: 1) waits for at least one standby to acknowledge each commit. Slight latency cost, zero data loss on failover.
  • Asynchronous is the default and is fine for most teams. Failover can lose the last 1 to 5 seconds of writes.

The trade-off is real. I use synchronous for payment, ledger, and audit databases. Async for everything else.
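That rule is small enough to encode next to your manifests. A minimal sketch, assuming CNPG 1.24 or later, where spec.postgresql.synchronous is available; the database-class labels are hypothetical, and the function returns the fragment to merge under spec.postgresql.

```python
def replication_settings(database_class: str) -> dict:
    """Return the spec.postgresql fragment for a (hypothetical) database class.

    Payment, ledger, and audit databases get quorum-style synchronous
    replication: each commit waits for at least one standby to confirm.
    Everything else gets an empty fragment, which leaves CNPG on its
    default asynchronous streaming replication.
    """
    critical = {"payment", "ledger", "audit"}
    if database_class.lower() in critical:
        return {"synchronous": {"method": "any", "number": 1}}
    return {}


cluster_spec = {"instances": 3}
cluster_spec.setdefault("postgresql", {}).update(replication_settings("ledger"))
print(cluster_spec)
```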

Failover behavior

CNPG promotes a standby in 5 to 30 seconds when the primary fails. The application sees a brief connection error and needs to reconnect. If your client library does not retry on connection loss, you will see user-visible errors during failover. The Pooler resource fronts this with PgBouncer, covered below.
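If your driver does not retry on its own, a small wrapper can cover the promotion window. This is a generic sketch, not any driver's built-in API: ConnectionError stands in for your driver's connection-lost exception (with psycopg 3 that would typically be psycopg.OperationalError), and the default delays are assumptions that give a total retry budget of roughly 24 to 48 seconds.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_reconnect(
    op: Callable[[], T],
    attempts: int = 10,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op, retrying on connection loss with capped exponential backoff.

    op should open its own connection through the cluster's -rw Service or
    the Pooler, so each retry lands on whichever instance is now primary.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return op()
        except ConnectionError:  # swap in your driver's exception class
            if attempt == attempts - 1:
                raise  # failover outlasted the retry budget
            delay = min(max_delay, base_delay * 2**attempt)
            # Jitter spreads reconnects so pods do not stampede the new primary
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```

Only wrap idempotent work, or work whose commit you can verify afterwards: a connection that drops mid-commit leaves the outcome unknown, and a blind retry could apply it twice.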

Backups and point-in-time recovery

The number-one mistake I see is teams running CNPG without a configured backup target. The operator does not back up by default: you configure a backup target on the Cluster and then create ScheduledBackup (or on-demand Backup) resources.

What to configure

  • Object storage target (S3, GCS, Azure Blob) with a barmanObjectStore block.
  • Backup schedule running daily at low-traffic hours, retention of 7 to 30 days.
  • WAL archiving enabled, which is required for point-in-time recovery to work.
  • Retention policy that survives accidental cluster deletion. Use object storage lifecycle rules, not just CNPG retention.
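Here is a minimal sketch of the daily schedule as a ScheduledBackup manifest, again as a Python dict. It assumes the Cluster already has backup.barmanObjectStore configured, which is also what turns on continuous WAL archiving. CNPG's schedule field takes a six-field cron expression with a leading seconds field, not the five-field Unix crontab. The -daily name suffix is a hypothetical convention.

```python
def scheduled_backup(cluster_name: str, hour: int = 3) -> dict:
    """Daily base backup for cluster_name at hour:00:00."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "ScheduledBackup",
        "metadata": {"name": f"{cluster_name}-daily"},
        "spec": {
            # Fields: second minute hour day-of-month month day-of-week
            "schedule": f"0 0 {hour} * * *",
            "backupOwnerReference": "self",
            "cluster": {"name": cluster_name},
        },
    }


print(scheduled_backup("pg-prod")["spec"]["schedule"])
```

Retention itself comes from backup.retentionPolicy on the Cluster; the bucket's lifecycle rules are the second line of defense.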

Test the restore. An untested backup is a wish, not a backup. A quarterly restore drill to a sandbox cluster is the bar.
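For the drill itself, here is a minimal sketch of a sandbox Cluster that bootstraps from the object store and stops replay at a chosen timestamp. The cluster names, bucket path, and the Secret backup-creds are hypothetical. One detail matters: CNPG looks for backups under serverName, which defaults to the external cluster entry's own name, so it is set to the source cluster's name here.

```python
def restore_drill(source_cluster: str, bucket_path: str, target_time: str) -> dict:
    """Sandbox Cluster that replays source_cluster's backups up to target_time.

    target_time is a PostgreSQL timestamp with time zone,
    for example "2026-06-01 09:30:00+00".
    """
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": f"{source_cluster}-restore-drill"},
        "spec": {
            "instances": 1,  # a drill only has to prove the data comes back
            # Same major version as the source: physical restores cannot cross majors
            "imageName": "ghcr.io/cloudnative-pg/postgresql:16.4",
            "storage": {"size": "100Gi"},
            "bootstrap": {
                "recovery": {
                    "source": "origin",  # matches externalClusters[0]["name"]
                    "recoveryTarget": {"targetTime": target_time},
                }
            },
            "externalClusters": [
                {
                    "name": "origin",
                    "barmanObjectStore": {
                        "destinationPath": bucket_path,
                        "serverName": source_cluster,  # backups live under this name
                        "s3Credentials": {
                            "accessKeyId": {"name": "backup-creds", "key": "ACCESS_KEY_ID"},
                            "secretAccessKey": {"name": "backup-creds", "key": "ACCESS_SECRET_KEY"},
                        },
                    },
                }
            ],
        },
    }
```

Run a few known queries against the drill cluster, note how long recovery took (that is your real recovery time), then delete it.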

Observability and connection pooling

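CNPG ships a Postgres metrics exporter inside every instance pod. Set monitoring.enablePodMonitor: true and, provided the Prometheus Operator is installed, Prometheus automatically scrapes instance health, replication lag, and transaction counts. Slow-query metrics need custom queries on top, for example over pg_stat_statements.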

The connection pooler is more subtle. Use PgBouncer in transaction mode. CNPG provisions it as a separate Pooler CR pointing at your Cluster. Set default_pool_size: 25 for most workloads, higher only if you have measured contention. More is not better here.

Without pooling, every Lambda or pod opens a fresh Postgres connection. Connection storms during traffic spikes are the second most common Postgres-on-K8s incident I get pulled into.
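Here is a minimal sketch of that Pooler as a Python dict, using the 25-plus-10 sizing from the pitfalls below. Two PgBouncer instances and max_client_conn of 1000 are assumptions; adjust them to your client count.

```python
def pgbouncer_pooler(cluster_name: str, pool_size: int = 25, reserve: int = 10) -> dict:
    """PgBouncer Pooler in transaction mode in front of cluster_name's primary."""
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Pooler",
        "metadata": {"name": f"{cluster_name}-pooler-rw"},
        "spec": {
            "cluster": {"name": cluster_name},
            "instances": 2,  # assumption: two PgBouncer pods so one can restart safely
            "type": "rw",  # route to the current primary, including after failover
            "pgbouncer": {
                "poolMode": "transaction",
                # CNPG expects PgBouncer parameters as strings
                "parameters": {
                    "default_pool_size": str(pool_size),
                    "reserve_pool_size": str(reserve),
                    "max_client_conn": "1000",  # assumption: size to your client count
                },
            },
        },
    }
```

Applications then connect to the Service CNPG creates for the Pooler instead of the cluster's -rw Service. In transaction mode, session state such as SET or session-level advisory locks does not survive across transactions, so check your ORM before switching.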

The four pitfalls that wreck a fresh CNPG install

Every incident I have helped debug has been one of these.

1. The "we will configure backups later" trap

CNPG looks healthy without a backup config. The dashboards stay green. Then someone runs a bad DELETE, and the team learns that "later" never happened. Configure the backup target on day one.

2. The missing WAL storage

Putting WAL on the main data PVC means a write burst can fill the disk and stop the primary. Always create a separate walStorage PVC of at least 20Gi. The operator skips this by default, which makes it the operator's only real footgun.

3. The under-sized pooler

Defaulting PgBouncer to 10 connections looks safe, but under load it leaves clients queueing for a server connection (watch cl_waiting in SHOW POOLS). A default_pool_size of 25 plus a reserve_pool_size of 10 is the floor I use for any production cluster.

4. The mismatched Postgres version on restore

Restoring a physical backup taken on Postgres 15 into a cluster running 16 fails: PostgreSQL refuses to start on a data directory created by a different major version, so the restore you were counting on never comes up. Pin major versions and only upgrade via CNPG's documented major-version path.
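All four pitfalls are mechanical enough to check before kubectl apply. A minimal sketch, assuming you generate manifests as Python dicts; the function name and messages are hypothetical, and the fallback of 20 is PgBouncer's own default_pool_size when none is set.

```python
from typing import Optional


def lint_cnpg(cluster: dict, pooler: Optional[dict] = None) -> list:
    """Return findings for the four pitfalls; an empty list means clean."""
    spec = cluster.get("spec", {})
    findings = []
    # 1. Backups "later"
    if "barmanObjectStore" not in spec.get("backup", {}):
        findings.append("no backup target: set spec.backup.barmanObjectStore")
    # 2. WAL on the data PVC
    if "walStorage" not in spec:
        findings.append("no walStorage: WAL shares the data volume")
    # 3. Missing or under-sized pooler
    if pooler is None:
        findings.append("no Pooler in front of the cluster")
    else:
        params = pooler.get("spec", {}).get("pgbouncer", {}).get("parameters", {})
        if int(params.get("default_pool_size", "20")) < 25:
            findings.append("default_pool_size below 25")
    # 4. Unpinned major version (e.g. ":latest" or no tag at all)
    tag = spec.get("imageName", "").rsplit(":", 1)[-1]
    if not tag.split(".")[0].isdigit():
        findings.append("imageName does not pin a Postgres major version")
    return findings
```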

Where this setup still falls short

This is the honest part.

Cross-region disaster recovery is not built in. CNPG can run a "distributed" cluster across regions, but the network and storage costs make it expensive. Most teams ship WAL archives to a second region and restore on demand.

Logical replication for zero-downtime upgrades still needs manual orchestration. CNPG 1.24 added some helpers, but anything more complex than a minor version bump is a project.

Connection multiplexing stops at a single pooler layer. For 10,000+ concurrent clients you need another layer in front of the pooler, and nothing in the CNPG ecosystem is as mature as RDS Proxy.

Frequently asked questions

Can I migrate from Bitnami to CNPG without downtime?
Mostly yes. Use logical replication from the Bitnami primary to a new CNPG cluster, then cut over with a brief read-only window. The CNPG docs have a migration guide.
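The logical-replication path comes down to a handful of SQL statements. Here is a minimal sketch that lists them in order, along with where each one runs; the publication name and DSN are hypothetical. It assumes the Bitnami primary runs with wal_level = logical and that the schema is already loaded into the CNPG cluster (for example with pg_dump --schema-only), because logical replication copies rows but not DDL. It does not carry sequence values either, so reset sequences during the read-only window.

```python
def migration_steps(publication: str, source_dsn: str) -> list:
    """Ordered (where to run, SQL) pairs for a Bitnami to CNPG cutover."""
    sub = f"{publication}_sub"
    return [
        # Publish every table on the old primary
        ("bitnami primary", f"CREATE PUBLICATION {publication} FOR ALL TABLES;"),
        # Subscribe from CNPG: copies existing rows, then streams new changes
        ("cnpg primary",
         f"CREATE SUBSCRIPTION {sub} CONNECTION '{source_dsn}' PUBLICATION {publication};"),
        # Read-only window: stop writes, wait for lag to reach zero, sync sequences
        ("cnpg primary", f"DROP SUBSCRIPTION {sub};"),
    ]


for where, sql in migration_steps("bitnami_to_cnpg", "host=old-pg dbname=app user=replicator"):
    print(f"[{where}] {sql}")
```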

Does CNPG support pgvector?
Yes. Use a CNPG-compatible image that includes the extension, such as ghcr.io/cloudnative-pg/postgresql:16.4-pgvector. The operator does not care about extensions, only about the underlying Postgres binary.
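On a new cluster, the operator can run the CREATE EXTENSION for you at bootstrap through spec.bootstrap.initdb.postInitApplicationSQL. Here is a minimal sketch that patches a Cluster manifest dict. It assumes the image already ships pgvector's files (CNPG runs the SQL, it does not install binaries), and it only affects clusters created with initdb, not existing or recovered ones.

```python
import copy


def with_pgvector(cluster: dict) -> dict:
    """Return a copy of a Cluster manifest that enables pgvector in the app database."""
    patched = copy.deepcopy(cluster)  # leave the caller's manifest untouched
    initdb = patched.setdefault("spec", {}).setdefault("bootstrap", {}).setdefault("initdb", {})
    statements = initdb.setdefault("postInitApplicationSQL", [])
    stmt = "CREATE EXTENSION IF NOT EXISTS vector;"  # pgvector's extension name is "vector"
    if stmt not in statements:
        statements.append(stmt)
    return patched
```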

How much memory should a production primary get?
Rule of thumb: shared_buffers at 25% of memory, and work_mem at 4 to 16MB (allocated per sort or hash operation, so one connection can use it several times). A 4Gi pod fits a small workload; anything serving real traffic should get 8Gi minimum.
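The rule of thumb turns into simple arithmetic. Here is a minimal sketch, assuming pod memory in MiB and Postgres's default max_connections of 100. The headroom figure is a rough floor, because work_mem can be allocated several times per query (once per sort or hash step), not once per connection.

```python
def memory_params(pod_memory_mib: int, work_mem_mb: int = 8, max_connections: int = 100) -> dict:
    """Apply the 25% shared_buffers rule and report what is left over."""
    if not 4 <= work_mem_mb <= 16:
        raise ValueError("rule of thumb: work_mem between 4 and 16MB")
    shared_buffers = pod_memory_mib // 4  # 25% of pod memory
    one_work_mem_each = max_connections * work_mem_mb  # one allocation per connection
    return {
        # Postgres "MB" means 1024 kB, so MiB values map directly
        "shared_buffers": f"{shared_buffers}MB",
        "work_mem": f"{work_mem_mb}MB",
        "headroom_mib": pod_memory_mib - shared_buffers - one_work_mem_each,
    }


print(memory_params(8 * 1024))  # {'shared_buffers': '2048MB', 'work_mem': '8MB', 'headroom_mib': 5344}
```

The two string values go under spec.postgresql.parameters; the headroom is what remains for the OS page cache, maintenance work, and everything else in the pod.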

Is CNPG ready for OLTP at scale?
Yes. There are public references of CNPG running multi-TB clusters at 50k QPS. The bottleneck in 2026 is almost always your storage class throughput, not the operator.

Should I run Patroni instead?
Patroni works and is mature, but you operate it yourself. CNPG hides the same primitives behind a friendlier abstraction. Unless you have an existing Patroni investment, CNPG is the lower-effort choice.

What is your current Postgres setup?

If you are still on Bitnami in 2026, the question is whether you are paying for Tanzu or paying in incidents. If you have moved to CNPG, what tripped you up first? Drop your stack in the comments. I read every one and reply with what I would have set differently.