Change Data Capture with PostgreSQL: Sync Your Search, Cache, and Analytics Without the Double-Write Tax

The Practical Developer

The Libuv Thread Pool Trap: Why Node.js Async APIs Stall Under Load Postgres Covering Indexes with INCLUDE: Eliminate Heap Fetches on Read-Heavy Workloads Postgres DISTINCT ON: The Fastest Way to Get the Latest Row Per Group Postgres Transaction Isolation: The Anomalies Your App Actually Faces in Production Linux TCP Tuning for Node.js Microservices: The Kernel Settings That Stop Silent Connection Drops Under Load Postgres HOT Updates and Fillfactor: Why Not All Writes Are Created Equal Database Connection Pool Leaks: Finding the Promise That Never Returns Its Seat Linux OOM Killer in Production: Why Your Node.js Containers Die Without a Stack Trace Postgres Materialized Views: Refresh Strategies That Do Not Lock Your Dashboards API Dependency Health Checks: Why /health Is Not Enough Authorization with Zanzibar Tuples: How Google Manages Permissions and How To Build the Same Check in Node.js Postgres Advisory Locks: The 20-Character Primitive That Replaces Redis for Coordination Dead Letter Queues: The Message Queue Pattern That Saves You at 2 a.m. File Descriptor Exhaustion: The Kernel Limit That Silently Drops Node.js Connections Graceful Degradation: The Pattern That Turns Total Outages into Partial Success PostgreSQL Full-Text Search: Dropping Elasticsearch for 90% of Use Cases S3 Presigned Multipart Uploads: Stop Your API Server from Being a File Upload Bottleneck MessagePack vs JSON: The Binary Serialization Switch That Cut Our Internal RPC Overhead by 40% DNS Caching in Node.js: The Silent Cause of Production Latency Spikes Reliable Cron Jobs: The Pattern That Stops Double Runs, Missed Executions, And The 2 AM Page GraphQL Query Complexity: Stop the OOM Query Before It Reaches Your Resolver Node.js Event Loop Lag: The Hidden Metric Behind Random Latency Spikes API Request Validation with Zod: The Schema That Catches Bad Input Before It Corrupts Your Database Load Shedding in Node.js: How to Reject Traffic Before You Drown Request Hedging: Cut Tail Latency In Half Without Overprovisioning Git Bisect: The Automated Binary Search That Finds Breaking Commits in Minutes Node.js Garbage Collection Tuning: Stop Letting V8 Pause Your Event Loop Node.js Server Timeouts: The Settings That Stop Slow Clients from Holding Sockets Hostage Postgres BRIN Indexes: The Time-Series Secret That Shrinks Indexes by 99% Event Sourcing with PostgreSQL: The Pragmatic 80% Solution Node.js Cluster Mode: Scaling the Event Loop Across CPU Cores Postgres Partial Indexes: Stopping Soft Deletes from Ruining Your Query Performance Request Coalescing with the Singleflight Pattern: Stop Drowning Your Database on Every Cache Miss The Bulkhead Pattern: Why One Slow Endpoint Should Not Drown Your Whole Service Node.js AsyncLocalStorage: End-to-End Request Context Without the Propagation Hell Postgres Deadlocks: Logging the Victim, Reproducing the Race, and Fixing the Lock Order Your Node.js HTTP Client Is the Bottleneck: Connection Pool Tuning That Works Optimistic Locking in Postgres: Stop Losing Data to Race Conditions Postgres Read Replicas: Stop Serving Stale Data to Your Users Cursor Pagination: Why Offset Queries Explode at Scale and How to Fix Them Node.js Worker Threads: 60 Lines That Stop a CSV Upload from Timing Out Every Other Request Reliable Webhook Delivery: Architecture for Outbound HTTP You Can Trust Request Timeouts and Deadline Propagation: Stop the Chain of Slowness Advanced Security Practices in Node.js Graceful Shutdown in Node.js: The 40 Lines That Stop 502s During Deploys Finding Node.js Memory Leaks with Heap Snapshots Idempotency Keys in 30 Lines: Stop Your Webhook From Charging Customers Twice Backpressure In Node.js: The Fix For Slow-Motion Queue Meltdowns Retries Done Right: Jitter, Budgets, and the Stampede You Did Not See Coming The Cache Stampede: Why Your "Just Add Redis" Layer Crashes Postgres at 3 a.m. Postgres SKIP LOCKED: An 80-Line Job Queue You Can Run Without Redis Stop Doing Work Nobody Wants: AbortController in Node.js, Done Right The N+1 Query Problem: We Found 23 In One Codebase And Killed Every One I Tried 5 AI Coding Tools for a Month. Here Is What I Actually Use CI/CD From Zero to Production in 30 Minutes With GitHub Actions Node.js vs Bun vs Deno: Which Runtime Should You Pick in 2025? Kubernetes Resource Requests And Limits: The Numbers That Decide If Your Cluster Is Stable The Three Pillars of Observability Are A Myth: What Actually Matters In Production pnpm Vs npm Vs yarn Vs Bun For Monorepos: Which One Earns The Migration In 2024 JSONB Indexing In Postgres: GIN Vs Expression Indexes, And When Each Is The Right Choice A Code Review Checklist That Ends The Same Three Arguments Every Sprint gRPC Vs REST In 2024: When The Switch Pays For Itself React Suspense For Data Fetching: The Pattern That Replaces Half Your Loading State Code The Five-Stage Rollout: How To Ship A Risky Change Without Holding Your Breath GitHub Actions In A Monorepo: Caching, Path Filters, And Secret Boundaries That Actually Work The Blameless Postmortem That Actually Improves Things: A Template And Six Hard-Won Rules Recursive CTEs In Postgres: How To Query A Tree Without N Round Trips Node.js Streams: When They Actually Help, And When They Just Add Complexity Playwright Vs Cypress In 2024: The Honest Comparison Of Which One Earns The Test Time React Server Components: The Mental Model That Makes The "use client" Boundary Obvious Pod Disruption Budgets: The K8s Object That Keeps Your Service Up During Cluster Maintenance Postgres LISTEN/NOTIFY: The Pub/Sub You Already Have And Are Not Using Chaos Engineering Starter Kit: The Five Drills That Don't Need Netflix-Scale Spec-Driven API Development With OpenAPI: How To Stop Drifting From Your Docs Kubernetes Autoscaling Beyond CPU: The Custom-Metric HPA Pattern That Actually Works Postgres Partitioning For Time-Series: The Boring Setup That Saves Your Database Distributed Locks With Redis: An Honest Look At Redlock And When You Don't Need It HTTP/2 vs HTTP/3: What Actually Changes For Your App, And What Doesn't Image Optimization For The Web In 2023: srcset, AVIF, And The Lighthouse Score You Actually Want Kafka vs RabbitMQ: A Decision Tree That Doesn't Hate You UUID vs Bigint Primary Keys In Postgres: The Index Math That Decides For You Flame Graphs: How To Find The Slow Function In 30 Seconds Without Profiling Theatre Postgres Streaming Vs. Logical Replication: Which One Solves Your Actual Problem ESLint Rules That Earn Their Keep: The Twelve I Enable On Every Project Pre-Commit Hooks That Pay For Themselves: Husky, lint-staged, And The Five Rules That Stick Zero-Downtime Database Migrations: The Six-Step Pattern That Rules Them All Circuit Breakers In Node.js: 50 Lines That Stop A Failing Dependency From Taking Down Your Service Postgres VACUUM Is Not Magic: How Your Hot Table Bloats To 80GB And How To Fix It Kubernetes Liveness And Readiness Probes: The Difference That Causes Half Your Outages Rate Limiting In Production: A Token Bucket In 30 Lines Of Redis The Outbox Pattern: How To Stop Losing Events When Postgres And Kafka Disagree Load Testing With k6: The Three Scenarios That Find Real Bugs (Not Synthetic Numbers) Postgres Row-Level Security For Multi-Tenant Apps: The Pattern That Stops You From Leaking Data Rebase vs. Merge: The Team Policy That Ends The Argument Forever OpenTelemetry in Node.js: Distributed Tracing That Actually Helps During an Incident Feature Flags That Pay Rent: The 4 Flag Types And When To Delete Each ETag, Last-Modified, and the Caching Headers Most APIs Get Wrong Connection Pooling Without the Cargo Cult: pgbouncer in 100 Lines of Config JSONB Is Not a Schema: When To Reach For It in Postgres, And When To Stop Bash Strict Mode: The Three Lines That Stop Your Deploy Script From Lying To You

The Practica · 2026-06-14 · via The Practical Developer

Your order service writes a new row to the orders table. Then it calls the search service to index the order so the customer can find it in their history. Then it publishes a message to the analytics pipeline. Then it invalidates the cached order count on the dashboard. Somewhere in that chain, the search call times out. The order is in Postgres but not in search. The customer refreshes the page, sees nothing, and opens a support ticket.

You have a dual-write problem. Every piece of data you care about lives in Postgres, but five other systems need a copy of it, and keeping those copies consistent is a distributed systems problem you did not sign up for. The application code that writes the order is now also responsible for orchestrating a fragile multi-step sync to Elasticsearch, Redis, BigQuery, and whoever else needs the data.

The fix is Change Data Capture (CDC): stream the write-ahead log (WAL) from Postgres and let downstream consumers read the changes directly, in order, without the application layer orchestrating anything.

This post shows what CDC looks like in practice, how to set it up with Postgres logical replication and the pg driver, the consumers you actually need to write, and the trap that makes most CDC pipelines silently lose data on schema changes.

What CDC actually is

Change Data Capture is a pattern that observes every insert, update, and delete on a database table and streams those changes to external systems. Instead of your application code calling multiple services after every write, the database emits a record of the change, and a consumer picks it up asynchronously.

There are two main approaches:

Polling-based CDC. Your consumer runs SELECT * FROM table WHERE updated_at > $last_checked every N seconds. Simple to build. Falls apart with deletes (the row is gone), renames (the user changes their name back and forth), and high-volume tables (the query gets slower as the table grows, and you miss rows between poll intervals unless you use FOR UPDATE SKIP LOCKED).

Log-based CDC. Your consumer connects to Postgres’s replication protocol and reads the write-ahead log directly. Every change is captured at the storage level: inserts, updates, deletes, even schema changes. No polling, no updated_at columns, no missed rows. This is the production-grade approach.

Postgres has supported logical replication since version 9.4 (released 2014). The feature is mature, well-documented, and does not require any extensions for basic use. What it gives you is a stream of changes decoded into a readable format (JSON, protobuf, or a relational representation) that you can consume from any language that speaks the Postgres wire protocol.

Why dual-writes are the default and why they break

Dual-writes are the path of least resistance. A controller creates a record, then fires off side effects:

async function createOrder(req: Request, res: Response) {
  const order = await orderRepo.insert(req.body);

  // Side effect 1: index for search
  await searchClient.index('orders', order.id, order);

  // Side effect 2: invalidate cache
  await cache.del(`order-count:${order.userId}`);

  // Side effect 3: send to analytics
  await analytics.emit('order.created', order);

  res.status(201).json(order);
}

Three side effects, three points of failure. If the search index call times out after 5 seconds, the whole request fails. The order is in Postgres but not in search. If the cache invalidation throws an exception, the next request sees a stale count. If analytics is down, the request handler crashes unless every side effect is wrapped in its own try/catch, and even then, you have to decide whether to swallow the error or retry.

Adding a fourth consumer means editing the controller. Adding a fifth means more surface area for failure. The application layer becomes an orchestration layer for data synchronization, and that is the wrong place for it.

CDC inverts the architecture. The controller writes one thing (the order to Postgres) and returns. The CDC pipeline picks up the change and delivers it to every consumer asynchronously. If Elasticsearch is down, the pipeline retries. If Redis is down, the pipeline retries. If the analytics pipeline is slow, it gets its own stream position and does not block anything else.

async function createOrder(req: Request, res: Response) {
  const order = await orderRepo.insert(req.body);
  res.status(201).json(order);
  // That is it. The CDC pipeline handles the rest.
}

That is the architectural win. Not the specifics of the tooling, but the decoupling.

Setting up Postgres logical replication

Logical replication requires two Postgres configuration changes. Both can be set via ALTER SYSTEM or in postgresql.conf:

wal_level = logical
max_replication_slots = 5  # at least 1 per consumer

Set wal_level to logical and restart Postgres. Then create a publication for the tables you want to replicate:

CREATE PUBLICATION order_events
  FOR TABLE orders, order_items, order_status_history;

A publication defines the set of tables whose changes you want to stream. You can publish all tables (FOR ALL TABLES) but scoping to specific tables makes the pipeline easier to reason about and reduces WAL decode overhead.

Each consumer creates a replication slot that tracks its position in the WAL:

SELECT pg_create_logical_replication_slot(
  'order_consumer',
  'pgoutput'
);

The slot holds the WAL position. If the consumer goes offline for an hour, the slot retains the position. When the consumer reconnects, it resumes from where it left off. The trade-off is that the WAL cannot be recycled past the oldest slot position, so a dead consumer that never reconnects causes WAL bloat and eventually fills the disk. Monitor replication slot lag as a first-class alert.

The Node.js consumer

The pg driver supports the replication protocol natively through Client#replication. Here is a consumer that connects to the logical replication slot and processes changes as they arrive.

import { Client } from 'pg';
import { Transform, Writable } from 'node:stream';

interface ChangeEvent {
  action: 'insert' | 'update' | 'delete';
  table: string;
  schema: string;
  timestamp: bigint;
  old?: Record<string, unknown>;
  new?: Record<string, unknown>;
}

async function startConsumer(slotName: string) {
  const client = new Client({
    connectionString: process.env.DATABASE_URL,
    // replication mode is required for logical replication
    connection: { database: process.env.PGDATABASE },
  });
  await client.connect();

  const stream = await client.replication.startLogical({
    slotName,
    plugin: 'pgoutput',
    startLsn: '0/0', // '0/0' starts from the oldest available WAL
  });

  const parser = new Transform({
    objectMode: true,
    transform(raw: Buffer, _encoding, callback) {
      const messages = parsePgoutput(raw);
      for (const msg of messages) {
        this.push(msg);
      }
      callback();
    },
  });

  const consumer = new Writable({
    objectMode: true,
    highWaterMark: 64,
    async write(event: ChangeEvent, _encoding, callback) {
      try {
        await handleChange(event);
        callback();
      } catch (err) {
        // Log the error and acknowledge the LSN anyway.
        // See the "error handling" section below.
        console.error(`Failed to process event:`, err);
        callback();
      }
    },
  });

  stream.pipe(parser).pipe(consumer);

  return { client, stream, consumer };
}

The pgoutput plugin decodes the WAL into protocol buffer messages. The parser converts those into ChangeEvent objects your application can work with. Each event includes the table name, the operation type, and the before/after column values.

The consumer handlers

Each downstream system gets its own handler function, registered for the appropriate table and operation:

async function handleChange(event: ChangeEvent): Promise<void> {
  switch (event.table) {
    case 'orders':
      if (event.action === 'insert' || event.action === 'update') {
        await Promise.all([
          searchClient.index('orders', event.new!.id, event.new),
          cache.del(`order-count:${event.new!.user_id}`),
        ]);
      } else if (event.action === 'delete') {
        await Promise.all([
          searchClient.delete('orders', event.old!.id),
          cache.del(`order-count:${event.old!.user_id}`),
        ]);
      }
      break;

    case 'order_items':
      if (event.action === 'insert') {
        await analytics.emit('item_purchased', event.new);
      }
      break;

    case 'order_status_history':
      await webhookClient.send('order.status_changed', event.new);
      break;
  }
}

Notice that handleChange does not orchestrate anything. It looks at the event and dispatches to the appropriate downstream systems. If the search index call fails, the error is logged but the pipeline moves on. The change is not lost; it is still in the WAL, and a separate retry mechanism (or a dead-letter queue) can reprocess it later.

The schema change trap

CDC pipelines have one sharp edge that bites every team eventually: schema changes. When you run ALTER TABLE orders ADD COLUMN discount numeric(10,2), the publication still publishes changes for the orders table. But the consumer’s ChangeEvent shape no longer matches. The new column appears in the event, and the handler that destructures event.new might silently ignore it or throw depending on how you access it.

The fix is to version your consumer schema or use a schema registry. The simplest production approach is to store the schema version alongside each event and let the consumer handle migration:

interface ChangeEvent {
  action: 'insert' | 'update' | 'delete';
  table: string;
  schema: string;
  schemaVersion: number;  // bumped on every ALTER TABLE
  timestamp: bigint;
  old?: Record<string, unknown>;
  new?: Record<string, unknown>;
}

When schemaVersion changes, the consumer pauses and runs a migration function that updates its internal column mapping. During the migration window, events are buffered in memory or written to a staging table. Once the migration completes, the consumer resumes from the buffered position.

For teams that cannot tolerate even a short pause, the more robust approach is to use a schema registry like Apicurio or Confluent Schema Registry alongside Protobuf-encoded WAL output. The schema registry maps each schema version to a numeric ID included in the message. Consumers fetch the latest schema from the registry when the ID changes and adapt on the fly. This is the Debezium approach, and it works well at scale, though it adds operational complexity.

Idempotency and ordering

CDC pipelines deliver changes in the order they were committed to the WAL, but downstream systems might not preserve that order naturally. If you update order 42’s status from “pending” to “shipped” and then to “delivered” within a second, both updates hit the search index in the right order on the wire. But if the search index client batches writes internally, the “delivered” update could land before the “shipped” update.

The fix is to include an LSN or commit timestamp in every event and let the consumer use it for ordering:

async function updateSearchIndex(event: ChangeEvent): Promise<void> {
  const existing = await searchClient.get('orders', event.new!.id);
  if (existing && existing._lsn >= event.lsn) {
    // Already processed a newer or identical change
    return;
  }
  await searchClient.index('orders', event.new!.id, {
    ...event.new,
    _lsn: event.lsn,
  });
}

The LSN is a monotonic position in the WAL. No two commits share the same LSN. By storing the last-processed LSN in the target document, the consumer becomes idempotent: it can safely retry or receive the same event twice.

When to add Debezium and Kafka

The Node.js-native approach above works well for teams with fewer than ten tables and moderate throughput (under 1,000 changes per second). Beyond that, three problems emerge:

Connection management. Each consumer opens its own replication connection. Ten consumers means ten replication slots and ten WAL decoder processes on the Postgres server. Postgres handles this fine until it does not, and the failure mode is WAL bloat that cascades to an outage.
Full snapshots. When you add a new consumer that needs to start from the beginning, you have to snapshot the entire table, then catch up on WAL changes that arrived during the snapshot. The two-phase snapshot (lock table, copy data, release lock, replay WAL) is tricky to get right, and a naive implementation holds locks longer than production can tolerate.
Schema evolution at scale. Mapping column renames and type changes across multiple consumers without a schema registry is tedious and error-prone.

Debezium solves all three. It runs as a set of Kafka Connect connectors, one per database. Each connector manages a single replication slot and publishes changes to a Kafka topic per table. Consumers read from Kafka topics instead of connecting to Postgres directly. The architecture looks like this:

Postgres WAL -> Debezium connector -> Kafka topic `dbserver1.public.orders` -> Consumer

Debezium handles full snapshots automatically (with snapshot.mode controlling whether it takes an initial snapshot, resumes from the last offset, or recovers from a schema-only state). It integrates with the Confluent Schema Registry for Protobuf or Avro schemas. It tracks schema changes as separate messages so consumers can adapt.

The cost is operational: you need Kafka and Kafka Connect in your infrastructure. For teams already running Kafka as a message bus, adding Debezium is a small operational increment. For teams that are not, the Node.js-native approach above is the right starting point.

The checklist that prevents silent data loss

Before you put a CDC pipeline in production, verify each of these:

wal_level = logical is set on the primary. If you use a replica for CDC, ensure hot_standby_feedback = on so the replica’s WAL is not recycled prematurely.
Replication slots are monitored. Alert when slot lag exceeds 10 minutes. A disconnected consumer inflates the WAL and fills the disk.
Every consumer stores its last-processed LSN in a durable store and resumes from it on restart.
Schema changes are deployed with a consumer migration plan. Either version the consumer schema or use ALTER TABLE ... ADD COLUMN IF NOT EXISTS to make column additions backward-compatible with old WAL events.
The consumer writes failed events to a dead-letter table before acknowledging them. Logging and moving on silently loses data.
You have tested the pipeline with a schema change. Run ALTER TABLE ... ADD COLUMN ... while the pipeline is running and verify the consumer adapts.
The consumer is idempotent. Replaying the same WAL position produces the same state in downstream systems.

Takeaway

CDC eliminates the dual-write pattern that makes application code brittle and downstream systems inconsistent. Instead of orchestrating five side effects per write, your controller writes once and the WAL stream handles distribution. The Postgres-native approach using pgoutput is production-ready for moderate throughput and straightforward to implement in Node.js. Teams that outgrow it graduate to Debezium and Kafka for schema registry integration, snapshot management, and operational isolation.

Dual-writes are the default because they are the obvious thing to type. CDC is the alternative because it is the correct architectural boundary between your database and everyone who needs its data.

A note from Yojji

Building a data pipeline that reliably streams production changes to search, cache, and analytics systems without losing events or corrupting state is the kind of infrastructure work that separates a prototype from a platform. It requires deep familiarity with Postgres replication internals, careful consumer design for idempotency, and operational monitoring to catch slot lag before it fills a disk. Yojji’s teams build this kind of data infrastructure regularly, from CDC pipelines and event-driven architectures to full-cycle product engineering across the JavaScript ecosystem and cloud platforms on AWS, Azure, and Google Cloud.

Yojji is an international custom software development company founded in 2016, with offices in Europe, the US, and the UK. Their senior engineering teams specialize in the JavaScript stack (React, Node.js, TypeScript), cloud-native deployments, and the kind of production-hardened backend architecture that makes data flow reliably between systems without the heroics.

此内容由惯性聚合(RSS阅读器)自动聚合整理，仅供阅读参考。原文来自 — 版权归原作者所有。

推荐订阅源

The Practical Developer

What CDC actually is

Why dual-writes are the default and why they break

Setting up Postgres logical replication

The Node.js consumer

The consumer handlers

The schema change trap

Idempotency and ordering

When to add Debezium and Kafka

The checklist that prevents silent data loss

Takeaway

A note from Yojji