Redis Isn't PostgreSQL: Building a Hybrid Change Data Capture Runtime in Ruby
Ken C. Demanawa · 2026-06-27 · via DEV Community

I Built Commercial Redis CDC Source Drivers for Ruby — Here's What I Learned

For the past couple of years I've been building a Change Data Capture (CDC) ecosystem for Ruby.

Like many CDC projects, it started with PostgreSQL. PostgreSQL's Write-Ahead Log (WAL) is an excellent source of truth: durable, ordered, replayable, and well understood. It provides exactly the properties you want when you're building reliable event pipelines.

But the deeper I went into distributed systems, the more I realized something important.

Many systems don't observe change from PostgreSQL first.

They observe it from Redis.

Redis often sits at the front of modern architectures:

  • Redis Streams carry application events.
  • Pub/Sub distributes transient state changes.
  • Keyspace notifications react to cache invalidation and key expiry.
  • Redis Cluster routes events across multiple primaries.

In many systems, Redis sees a change before PostgreSQL ever commits it.

That raised an interesting question:

Can Redis become a first-class Change Data Capture source?

The obvious answer is "yes."

The interesting answer is "yes—but not in the same way PostgreSQL does."

That distinction eventually became cdc-redis-pro, a commercial Redis source driver for the Ruby CDC ecosystem.

This article isn't a product announcement.

It's an engineering write-up about the architectural decisions behind the project, the tradeoffs Redis forces you to make, and the execution model that ultimately emerged.


Redis Doesn't Have One CDC Interface

One misconception I frequently encounter is the assumption that Redis has an equivalent of PostgreSQL's WAL.

It doesn't.

Instead, Redis exposes several completely different mechanisms for observing change.

Source                   Delivery        Replay
Streams                  At-least-once   Yes
Pub/Sub                  At-most-once    No
Sharded Pub/Sub          At-most-once    No
Keyspace Notifications   At-most-once    No

At first glance they all look like "events."

Operationally they're completely different systems.

Streams are durable.

Pub/Sub isn't.

Keyspace notifications exist primarily as operational signals.

Sharded Pub/Sub introduces routing constraints that don't exist elsewhere.

Treating them all as the same abstraction inevitably hides important guarantees—and hidden guarantees eventually become production incidents.

Instead of pretending every Redis source behaves identically, I wanted the API to expose those differences explicitly.

If a source cannot replay missed messages, the API should say so.

If a reconnect creates a loss window, operators should know exactly when it happened.

Infrastructure software shouldn't hide reality.

It should make reality easier to reason about.
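
One way to keep those guarantees visible is to attach them to each source as plain data. The sketch below is illustrative only: `SourceSemantics`, `SOURCES`, and `recovery_plan` are hypothetical names, not the cdc-redis-pro API. The delivery and replay values are the ones listed earlier in this section.

```ruby
# Hypothetical descriptor; not the actual cdc-redis-pro API.
SourceSemantics = Struct.new(:name, :delivery, :replayable, keyword_init: true)

SOURCES = [
  SourceSemantics.new(name: :streams,        delivery: :at_least_once, replayable: true),
  SourceSemantics.new(name: :pubsub,         delivery: :at_most_once,  replayable: false),
  SourceSemantics.new(name: :sharded_pubsub, delivery: :at_most_once,  replayable: false),
  SourceSemantics.new(name: :keyspace,       delivery: :at_most_once,  replayable: false)
].each(&:freeze).freeze

# Callers ask the source what it can do instead of assuming WAL-like behavior.
def recovery_plan(source)
  if source.replayable
    "#{source.name}: resume from last checkpoint"
  else
    "#{source.name}: cannot replay, report a loss window"
  end
end

SOURCES.each { |source| puts recovery_plan(source) }
```

Because the descriptor is data rather than documentation, an operator dashboard or a startup check could read the same values the code path uses.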


Redis and PostgreSQL Solve Different Problems

A common question is:

"If Redis can generate change events, why not replace PostgreSQL CDC entirely?"

Because they solve different problems.

PostgreSQL's WAL is the durable history of your system.

Redis is often the earliest signal that something is happening.

One tells you what committed.

The other tells you what is happening right now.

They're complementary.

Not competing.

Conceptually, I think about them like this:

                    PostgreSQL WAL
                          │
                          ▼
                 Durable Record of Truth

Redis Streams / PubSub / Keyspace
              │
              ▼
        Fast Operational Signal

The goal isn't choosing one over the other.

The goal is allowing both to participate in the same downstream processing pipeline.

That required another architectural boundary.


A Common Language for Change Events

One of the design goals of the broader CDC ecosystem is that downstream processors shouldn't care where an event originated.

Whether a change comes from PostgreSQL logical replication or Redis Streams, the downstream processing model should remain identical.

That boundary is CDC::Core::ChangeEvent.

Instead of exposing PostgreSQL-specific or Redis-specific payloads to processors, each source is normalized into a common event model.

Conceptually the pipeline looks like this:

                PostgreSQL WAL
                     │                     
                pgoutput-client
                     │
                     ▼
                 ChangeEvent
                       ▲
                       │
                 cdc-redis-pro
                       │
        Streams / PubSub / Keyspace

Everything downstream consumes the same normalized event.
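
A minimal sketch of that normalization step follows. The article does not show the fields of CDC::Core::ChangeEvent, so `ChangeEventSketch` and its attributes are assumptions made for illustration, as are the two adapter functions and the shape of the decoded WAL change.

```ruby
# Hypothetical event shape; the real CDC::Core::ChangeEvent fields are not shown in this article.
ChangeEventSketch = Struct.new(:source, :entity, :operation, :payload, :position, keyword_init: true)

# Normalizes a Redis Streams entry: a stream name, an entry ID, and a field hash.
def from_stream_entry(stream, id, fields)
  ChangeEventSketch.new(
    source: :redis_stream, entity: stream, operation: :append,
    payload: fields.dup.freeze, position: id
  ).freeze
end

# Normalizes a decoded logical-replication row change (hash shape assumed for illustration).
def from_wal_change(change)
  ChangeEventSketch.new(
    source: :postgresql, entity: change.fetch(:table), operation: change.fetch(:op),
    payload: change.fetch(:row).dup.freeze, position: change.fetch(:lsn)
  ).freeze
end

# A processor written against the common shape never branches on the source.
def describe(event)
  "#{event.operation} on #{event.entity} at #{event.position}"
end

redis_event = from_stream_entry("orders", "1700000000000-0", { "order_id" => "42" })
wal_event = from_wal_change({ table: "orders", op: :insert, row: { "id" => 42 }, lsn: "0/16B3748" })

[redis_event, wal_event].each { |event| puts describe(event) }
```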

A webhook processor doesn't need to know whether the event came from WAL or Redis.

A search indexing pipeline doesn't care.

An audit sink doesn't care.

Even the execution runtime doesn't care.

That separation between source acquisition and event processing became one of the defining architectural decisions of the ecosystem.

As the project grew, it became clear that acquiring events efficiently and processing them efficiently are two different problems—and they scale independently.

That realization eventually led to a separate execution engine: cdc-orchestrator-pro.

We'll come back to that shortly.

First, let's look at what makes each Redis source fundamentally different.

Redis Isn't One Event System. It's Four.

The first surprise when building a Redis CDC source is that there isn't a single Redis change stream.

There are four.

Each has different delivery guarantees.

Each behaves differently during failures.

Each recovers differently after reconnects.

And each answers a different operational question.

Treating them as interchangeable would have made the implementation simpler—but it also would have hidden the exact information operators need during production incidents.

Instead, cdc-redis-pro embraces those differences.


Redis Streams: The Durable Path

Redis Streams is the closest thing Redis has to a traditional CDC source.

Messages are persisted.

Consumers maintain checkpoints.

Consumer groups coordinate work.

Failed consumers leave pending entries behind for recovery.

In many ways, Streams feels familiar to anyone coming from Kafka or PostgreSQL logical replication.

That made it the natural foundation for the recoverable side of the driver.

The Streams implementation supports:

  • XREAD
  • XREADGROUP
  • Consumer Groups
  • Pending-entry inspection
  • XAUTOCLAIM
  • Duplicate suppression
  • Optional dead-letter streams

Operationally, Streams is the only Redis source that provides genuine replay.

If a downstream worker crashes halfway through a batch, processing resumes from the last committed checkpoint rather than silently dropping work.
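
The recovery behavior described above can be modeled without a Redis server. The class below is an in-memory imitation of the bookkeeping behind XREADGROUP, XACK, and XAUTOCLAIM; it does not talk to Redis, and `PendingModel` plus the integer timestamps are assumptions made for the sketch.

```ruby
# In-memory model of a consumer group's pending-entry bookkeeping.
# It imitates the semantics of XREADGROUP, XACK, and XAUTOCLAIM; it is not a Redis client.
class PendingModel
  def initialize(entries)
    @backlog = entries.dup   # entries not yet delivered to any consumer
    @pending = {}            # id => { consumer:, delivered_at: }
  end

  # Like XREADGROUP with ">": deliver new entries and remember them as pending.
  def read(consumer, count, now)
    @backlog.shift(count).each do |id|
      @pending[id] = { consumer: consumer, delivered_at: now }
    end
  end

  # Like XACK: only acknowledged entries leave the pending list.
  def ack(id)
    !@pending.delete(id).nil?
  end

  # Like XAUTOCLAIM: transfer entries idle for at least min_idle to another consumer.
  def autoclaim(consumer, min_idle, now)
    stale = @pending.select { |_, meta| now - meta[:delivered_at] >= min_idle }
    stale.each_key { |id| @pending[id] = { consumer: consumer, delivered_at: now } }
    stale.keys
  end

  def pending_for(consumer)
    @pending.select { |_, meta| meta[:consumer] == consumer }.keys
  end
end

model = PendingModel.new(%w[1-0 2-0 3-0 4-0])
batch = model.read("worker-a", 3, 0)
model.ack(batch.first)                          # worker-a finishes one entry, then crashes
reclaimed = model.autoclaim("worker-b", 30, 60) # 60 time units later, worker-b takes over
puts "reclaimed by worker-b: #{reclaimed.inspect}"
```

The unacknowledged entries are neither lost nor silently skipped: they stay pending until some consumer claims and acknowledges them.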

Conceptually, it looks like this:


             Producer
                │
                ▼
          Redis Stream
                │
          Consumer Group
                │
                ▼
          cdc-redis-pro
                │
           ChangeEvent
                │
                ▼
         Downstream Runtime

This is the strongest consistency story Redis offers.

It isn't PostgreSQL's WAL—but it isn't trying to be.

It's a durable event log designed for application-level workflows.


Pub/Sub: Fast, But Ephemeral

Pub/Sub solves a completely different problem.

Messages exist only while subscribers are connected.

Disconnect for five seconds.

Those five seconds are gone forever.

That isn't a bug.

It's the contract.

Many libraries attempt to hide this by automatically reconnecting.

The problem is that reconnecting doesn't recover missed messages.

It only resumes receiving future ones.

Pretending otherwise creates false confidence.

Instead, cdc-redis-pro treats Pub/Sub as an explicitly at-most-once source.

Reconnects are measured.

Loss windows are reported.

Operators can immediately see:

  • when the disconnect occurred,
  • how long the subscriber was offline,
  • and exactly where message loss became possible.
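
A loss window is cheap to record. The sketch below shows one possible shape for that bookkeeping; `LossWindow` and `LossWindowTracker` are hypothetical names, not the driver's API, and the clock is injected so the example is deterministic.

```ruby
# Hypothetical tracker; names are illustrative, not the driver's API.
LossWindow = Struct.new(:started_at, :ended_at) do
  def duration
    ended_at - started_at
  end
end

class LossWindowTracker
  attr_reader :windows

  # The clock defaults to a monotonic timer so wall-clock adjustments cannot skew durations.
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @windows = []
    @disconnected_at = nil
  end

  # Records the first moment the subscriber went offline.
  def disconnected
    @disconnected_at ||= @clock.call
  end

  # Reconnecting closes the window; it does not recover what was published inside it.
  def reconnected
    return nil unless @disconnected_at

    window = LossWindow.new(@disconnected_at, @clock.call)
    @disconnected_at = nil
    @windows << window
    window
  end
end

ticks = [100.0, 105.0].each
tracker = LossWindowTracker.new(clock: -> { ticks.next })
tracker.disconnected
window = tracker.reconnected
puts "messages may have been lost for #{window.duration}s starting at t=#{window.started_at}"
```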

That distinction matters.

Infrastructure software shouldn't promise guarantees the underlying system doesn't provide.


Sharded Pub/Sub Changes the Topology

Redis Cluster introduces another variation.

Sharded Pub/Sub distributes channels across multiple primaries.

That improves scalability, but it also means subscriptions become topology-aware.

A reconnect isn't always reconnecting to the same node.

During resharding, ownership of a channel may move entirely.

Handling that correctly requires continuously tracking cluster topology rather than assuming a fixed server layout.

The driver automatically discovers topology through CLUSTER SHARDS and transparently rebinds subscriptions as ownership changes.

To downstream processors, events continue arriving normally.

To operators, topology changes remain observable.
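
Sharded channels are assigned to hash slots with the same algorithm Redis Cluster uses for keys: CRC16 (XMODEM variant) modulo 16384, with hash-tag support. The sketch below computes that slot and, given two slot-to-node lookups, lists the channels that would need a new subscription after resharding. `channels_to_rebind` and the owner lambdas are hypothetical; a real driver would derive the lookups from CLUSTER SHARDS.

```ruby
# CRC16/XMODEM: polynomial 0x1021, initial value 0, no reflection.
def crc16(data)
  crc = 0
  data.each_byte do |byte|
    crc ^= byte << 8
    8.times do
      crc = (crc & 0x8000).zero? ? (crc << 1) : ((crc << 1) ^ 0x1021)
      crc &= 0xFFFF
    end
  end
  crc
end

# If the name contains a non-empty {tag}, only the tag is hashed.
def hash_slot(name)
  open = name.index("{")
  if open
    close = name.index("}", open + 1)
    name = name[(open + 1)...close] if close && close > open + 1
  end
  crc16(name) % 16_384
end

# Hypothetical helper: channels whose owning node differs between two topologies.
def channels_to_rebind(channels, old_owner, new_owner)
  channels.select do |channel|
    slot = hash_slot(channel)
    old_owner.call(slot) != new_owner.call(slot)
  end
end

channels = %w[orders payments {tenant1}.events {tenant1}.alerts]
before = ->(slot) { slot < 8192 ? "node-a" : "node-b" }
after  = ->(slot) { slot < 4096 ? "node-a" : "node-b" }

channels.each { |channel| puts "#{channel} -> slot #{hash_slot(channel)}" }
puts "rebind: #{channels_to_rebind(channels, before, after).inspect}"
```

Channels that share a hash tag always land in the same slot, so they move together during resharding.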


Keyspace Notifications Aren't Really CDC

Keyspace notifications are probably the easiest Redis feature to misunderstand.

They're incredibly useful.

They're also incredibly easy to misuse.

Keyspace notifications exist to announce that Redis itself performed an operation:

  • a key expired,
  • a value changed,
  • a key was deleted,
  • a hash was updated.

They're operational signals.

They're not durable history.

They're not replayable.

And by the time you receive an expiration notification, the value may already be gone.

That's simply how Redis works.

Rather than pretending every notification contains complete information, the driver offers optional best-effort value enrichment whenever the value still exists.

If it doesn't, the event still proceeds.

The guarantee remains explicit.
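
Redis publishes these notifications on channels named `__keyspace@<db>__:<key>` (the message is the event name) and `__keyevent@<db>__:<event>` (the message is the key). The sketch below parses both forms and applies best-effort enrichment. `KeyNotification`, `parse_notification`, and `enrich` are hypothetical names, and the `lookup` lambda stands in for a GET against Redis.

```ruby
KeyNotification = Struct.new(:db, :key, :event, :value, keyword_init: true)

NOTIFICATION_CHANNEL = /\A__(keyspace|keyevent)@(\d+)__:(.*)\z/m

# Returns nil for channels that are not keyspace or keyevent notifications.
def parse_notification(channel, message)
  match = NOTIFICATION_CHANNEL.match(channel)
  return nil unless match

  kind, db, rest = match.captures
  key, event = kind == "keyspace" ? [rest, message] : [message, rest]
  KeyNotification.new(db: db.to_i, key: key, event: event, value: nil)
end

# Best-effort enrichment: for expired or deleted keys the value is usually gone,
# so the notification proceeds with value left as nil.
def enrich(notification, lookup)
  notification.value = lookup.call(notification.key)
  notification
end

store = { "session:1" => "alice" }   # stand-in for live Redis data
lookup = ->(key) { store[key] }

set_event = enrich(parse_notification("__keyspace@0__:session:1", "set"), lookup)
expired_event = enrich(parse_notification("__keyevent@0__:expired", "session:2"), lookup)

[set_event, expired_event].each do |n|
  puts "#{n.event} #{n.key} value=#{n.value.inspect}"
end
```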


Delivery Guarantees Should Stay Visible

One design principle shaped almost every API in the project.

I didn't want to normalize away delivery semantics.

Instead, I wanted them to remain visible all the way to the operator.

Think of it like a database transaction.

You wouldn't want a library to silently convert an eventually-consistent operation into something that merely looks transactional.

The same idea applies here.

Different Redis sources have different operational characteristics.

The API should preserve them.

That philosophy can be summarized like this:

Source                   Replay   Delivery        Typical Use
Streams                  Yes      At-least-once   Durable workflows
Pub/Sub                  No       At-most-once    Live events
Sharded Pub/Sub          No       At-most-once    Cluster-scale broadcasts
Keyspace Notifications   No       At-most-once    Operational signals

None of these are "better."

They're simply optimized for different workloads.


Topology Matters More Than Features

Supporting Redis isn't just about supporting commands.

It's about supporting deployments.

A surprising amount of complexity came not from Streams or Pub/Sub themselves, but from the environments they run in.

The driver currently supports:

  • Standalone Redis
  • Redis Sentinel
  • Redis Cluster
  • TLS
  • ACL authentication

Cluster support turned out to be particularly interesting.

Every stream in a multi-stream read must live in the same hash slot.

Cross-slot reads fail with a CROSSSLOT error.

Pub/Sub subscriptions migrate during resharding.

Connections disappear during primary failover.

Those aren't edge cases.

They're normal operating conditions in production.

Every supported topology is continuously exercised using Docker-based integration tests covering failover, node restarts, resharding, authentication, and TLS.

I wanted the implementation to reflect how Redis is actually deployed—not just how it behaves on a laptop.


Acquiring Events Is Only Half the Problem

By this point, the source layer was capable of reliably acquiring events from every major Redis deployment model.

The next question became much harder.

How do you process them efficiently?

One worker?

Ten workers?

Hundreds?

How do you preserve ordering where it's required while still exploiting modern Ruby's parallelism?

It turned out that reading events from Redis wasn't the difficult part.

Scheduling what happened after they were read became the real engineering challenge.

That challenge eventually became HybridRuntime, the execution engine inside cdc-orchestrator-pro.

And surprisingly, the solution wasn't built around threads.

It was built around ownership.

The Architecture I'm Most Proud Of

As noted above, scheduling what happened after events arrived proved harder than reading them from Redis.

Modern Ruby gives us two powerful concurrency primitives:

  • Ractors for parallel CPU execution
  • Fibers for concurrent I/O

Most systems choose one.

I wanted both.

HybridRuntime is the result.

Its job isn't tied to Redis.

Redis simply happened to be the workload that exposed the problem first.


Event Acquisition and Event Processing Are Different Problems

One architectural realization changed the direction of the project.

Reading events from a source and processing those events are two completely different concerns.

They're limited by different bottlenecks.

They scale independently.

A PostgreSQL logical replication connection is fundamentally serial.

A Redis Stream consumer is similarly constrained.

But once an event has been acquired and normalized into a CDC::Core::ChangeEvent, downstream processing becomes embarrassingly parallel.

That naturally separates the pipeline into two halves.

                    Source Layer
                         │
         PostgreSQL WAL / Redis Streams
                         │
                         ▼
                CDC::Core::ChangeEvent
                         │
                         ▼
                  Execution Layer

Once an event reaches the execution layer, its origin no longer matters.

Redis.

PostgreSQL.

A future Kafka adapter.

A future S3 replay.

The runtime simply processes ChangeEvent.

That separation turned out to be one of the most valuable architectural decisions in the ecosystem.


HybridRuntime

HybridRuntime combines two existing execution engines from the CDC ecosystem.

  • cdc-parallel provides pools of prewarmed Ractors for true CPU parallelism.
  • cdc-concurrent provides asynchronous Fiber pools for overlapping I/O within each Ractor.

Together they form a nested execution model.

                 HybridRuntime
                        │
        ┌───────────────┴───────────────┐
        ▼                               ▼
  Ractor Pool                    Ractor Pool
        │                               │
        ▼                               ▼
   Fiber Pool                     Fiber Pool
        │                               │
        ▼                               ▼
Redis Connections              Redis Connections

The interesting observation is that parallelism and concurrency solve different problems.

Ractors increase throughput by executing work simultaneously.

Fibers increase throughput by avoiding idle time while waiting for I/O.

The runtime deliberately uses both.


The Inception Pool

As the architecture evolved, I noticed something amusing.

Every layer owned another pool.

The runtime owns a pool of Ractors.

Each Ractor owns a LocalResourcePool.

Each LocalResourcePool owns a pool of Fibers.

Each Fiber owns a live Redis connection.

It looked like this:

HybridRuntime
     │
     ▼
Prewarmed Ractor Pool
     │
     ▼
LocalResourcePool
     │
     ▼
Fiber Pool
     │
     ▼
Redis Connections

Internally I started calling it the Inception Pool.

A pool containing pools containing pools.

The name stuck.


Ownership Instead of Synchronization

Most concurrent systems solve shared state by protecting it.

Threads
  │
  ▼
Mutex
  │
  ▼
Shared Connection Pool

The more workers you add, the more frequently they compete for the same resources.

Locks become unavoidable.

HybridRuntime takes a different approach.

Instead of synchronizing ownership...

...it avoids sharing ownership entirely.

Every Redis client is created inside the Ractor that will use it.

It never leaves that Ractor.

Nothing is borrowed.

Nothing is shared.

Nothing requires a mutex.

Conceptually it looks like this.

Ractor 1
   │
   ├── Redis Connection A
   ├── Redis Connection B
   └── Fiber Scheduler

Ractor 2
   │
   ├── Redis Connection A
   ├── Redis Connection B
   └── Fiber Scheduler

The only thing that crosses a Ractor boundary is an immutable ChangeEvent.

Everything else remains local.

This aligns naturally with Ruby's ownership model.

Mutable state belongs somewhere.

Rather than fighting that constraint, the runtime embraces it.
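
Ruby can verify the "only immutable events cross the boundary" rule directly. `Ractor.shareable?` and `Ractor.make_shareable` are part of core Ruby (3.0 and later); the `EventSketch` struct is a hypothetical stand-in for the real event class.

```ruby
# Hypothetical event shape for illustration.
EventSketch = Struct.new(:entity, :payload, keyword_init: true)

event = EventSketch.new(entity: "orders", payload: { "id" => 42, "tags" => ["new"] })
puts Ractor.shareable?(event)   # a mutable struct is not shareable

# Deep-freezes the struct and everything reachable from it.
frozen_event = Ractor.make_shareable(event)
puts Ractor.shareable?(frozen_event)

begin
  frozen_event.payload["id"] = 43
rescue FrozenError => e
  puts "mutation rejected: #{e.class}"
end
```

A shareable object is passed between Ractors by reference rather than copied, which is what makes handing events to workers cheap.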


Why LocalResourcePool Exists

That ownership model eventually led to another component:
CDC::Orchestrator::Pro::LocalResourcePool.

Unlike traditional connection pools, a LocalResourcePool never shares live resources across Ractors.

The pool object itself is shared, but only as an immutable coordinator. The live resources are not.

Instead, every Ractor lazily creates and owns its own resource pool the first time it needs one.

             LocalResourcePool
                    │
      ┌─────────────┴─────────────┐
      ▼                           ▼
  Ractor A                   Ractor B
      │                           │
 Resource Pool               Resource Pool
      │                           │
 Redis Connections          Redis Connections

Each Ractor owns its resources for their entire lifetime.

No resource crosses a Ractor boundary.

Nothing requires synchronization.

The work moves.

The connections don't.

This turns out to be a natural fit for long-lived resources such as:

  • Redis clients
  • PostgreSQL connections
  • HTTP clients
  • Elasticsearch clients
  • S3 clients

Every Ractor operates independently using resources it owns locally.

Rather than coordinating access to a shared pool, the runtime coordinates immutable ChangeEvents while leaving the underlying resources exactly where they were created.

The result is a simpler ownership model, reduced contention, and an execution architecture that scales naturally with additional Ractors.
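
The idea can be sketched with Ractor-local storage from core Ruby. This is not the implementation of CDC::Orchestrator::Pro::LocalResourcePool: `LocalPoolSketch` and `FakeConnection` are hypothetical, the fake connection stands in for a real client, and the small helper at the end papers over the difference between `Ractor#take` (Ruby 3.0 to 3.4) and `Ractor#value` (newer releases).

```ruby
# Stand-in for a real client (Redis, PostgreSQL, HTTP). Hypothetical.
class FakeConnection
  attr_reader :owner_id

  def initialize(url)
    @url = url
    @owner_id = Ractor.current.object_id   # remembers which Ractor created it
  end
end

# Immutable coordinator: holds only configuration, never live connections.
class LocalPoolSketch
  def initialize(url:, size:)
    @url = url.dup.freeze
    @size = size
    freeze
  end

  # Lazily builds the connections inside whichever Ractor calls this,
  # and keeps them in that Ractor's local storage.
  def connections
    Ractor.current[:local_pool_sketch] ||= Array.new(@size) { FakeConnection.new(@url) }
  end
end

POOL = Ractor.make_shareable(LocalPoolSketch.new(url: "redis://localhost:6379", size: 2))

# Ractor#take was replaced by Ractor#value in newer Ruby releases.
def ractor_result(ractor)
  ractor.respond_to?(:value) ? ractor.value : ractor.take
end

workers = 2.times.map do
  Ractor.new(POOL) do |pool|
    first = pool.connections
    second = pool.connections
    reused = first.map(&:object_id) == second.map(&:object_id)
    [reused, first.map(&:owner_id).uniq]
  end
end

results = workers.map { |worker| ractor_result(worker) }
results.each { |reused, owners| puts "reused=#{reused} owners=#{owners.size}" }
```

Each worker reuses its own connections on the second call, and no connection object is ever handed to another Ractor.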


Two Independent Scaling Axes

Another consequence of this architecture is that acquisition and processing no longer have to scale together.

Suppose a Redis deployment only needs three acquisition workers.

That says nothing about how many processing workers you need.

You might run:

Acquisition:  3 Ractors, 5 Fibers each
      │
      ▼
Processing:   7 Ractors, 20 Fibers each

Each side can be tuned independently.

Adding more downstream workers doesn't require opening additional Redis Streams.

Adding more source readers doesn't require changing the execution topology.

The two halves of the pipeline evolve independently.

That separation proved invaluable during benchmarking because it exposed where the real bottlenecks actually lived.
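
The two axes can be demonstrated with nothing but the standard library. In the sketch below, plain threads stand in for the Ractor and Fiber pools, and a bounded queue is the handoff between the two stages. `run_pipeline` and its parameters are hypothetical; only `Queue`, `SizedQueue`, and `Thread` are real Ruby APIs.

```ruby
# Threads stand in for Ractors/Fibers here; the point is the independent worker counts.
def run_pipeline(events, readers:, processors:)
  source = Queue.new
  events.each { |event| source << event }
  source.close                     # pop returns nil once the queue is drained

  handoff = SizedQueue.new(8)      # bounded, so slow processors apply backpressure
  results = Queue.new

  reader_threads = readers.times.map do
    Thread.new do
      while (event = source.pop)
        handoff << event
      end
    end
  end

  processor_threads = processors.times.map do
    Thread.new do
      while (event = handoff.pop)
        results << event * 2       # placeholder for real processing
      end
    end
  end

  reader_threads.each(&:join)
  handoff.close                    # lets processors finish after the last event
  processor_threads.each(&:join)

  output = []
  output << results.pop until results.empty?
  output.sort
end

puts run_pipeline((1..20).to_a, readers: 3, processors: 7).inspect
```

Changing `readers` or `processors` alters throughput characteristics but not the result, which is the property that makes each side tunable on its own.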


Beyond Redis

One realization surprised me.

HybridRuntime wasn't solving a Redis problem.

It was solving an event-processing problem.

Redis happened to be the first source.

The same execution model works for:

  • PostgreSQL logical replication
  • Redis Streams
  • Webhook delivery
  • Search indexing
  • Object storage sinks
  • Future Kafka adapters
  • Future message brokers

Anything capable of producing a CDC::Core::ChangeEvent automatically inherits the same execution engine.

That ultimately justified extracting the runtime into its own commercial component: cdc-orchestrator-pro.

Originally it lived inside another project.

Eventually it became obvious that it wasn't a Redis runtime.

It wasn't a Sidekiq runtime.

It wasn't even a PostgreSQL runtime.

It was an execution fabric for normalized change events.

Redis simply happened to be the benchmark that inspired it.


Parallelism Isn't Free

One thing the benchmarks made very clear is that parallelism isn't magic.

Adding more Ractors doesn't produce linear speedups.

It introduces coordination costs.

Partition routing.

Mailbox communication.

Ordering constraints.

Preserving correctness means accepting those costs.

Understanding where those tradeoffs appear became just as interesting as the throughput numbers themselves.

Let's look at what those benchmarks actually measured.


Where This Actually Fits

After spending so much time discussing architecture, it's worth asking a simple question.

Who actually needs this?

The honest answer is:

Not every Rails application.

If Redis is simply a cache sitting beside your database, this project is probably unnecessary.

Likewise, if every important state transition already commits to PostgreSQL before anything else happens, PostgreSQL logical replication alone may be all the CDC infrastructure you need.

cdc-redis-pro exists for a much narrower class of systems.

Systems where Redis is part of the application's event architecture rather than merely its cache.


Redis Streams as an Event Bus

This is probably the most natural fit.

Many distributed systems already use Redis Streams as their internal event bus.

Order Service
    │
    ▼
Redis Stream
    │
    ▼
Consumers

Once Redis becomes the place where work is coordinated, durability suddenly matters.

Consumers crash.

Deployments restart.

Networks partition.

A consumer needs to know where to resume.

Redis Streams already provides those building blocks.

Consumer Groups.

Pending Entries.

Checkpoint IDs.

XAUTOCLAIM.

The job of cdc-redis-pro isn't replacing those mechanisms.

It's integrating them into a larger event-processing pipeline while preserving their semantics.
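
Checkpoint IDs deserve one caution: a stream ID has the form `<milliseconds>-<sequence>`, and both parts compare numerically, not as strings. The sketch below shows a checkpoint that suppresses redelivered entries on that basis. `Checkpoint` is a hypothetical helper, and it assumes entries from a single stream are processed in order.

```ruby
# Splits "1700000000000-1" into [1700000000000, 1] so IDs compare numerically.
def parse_stream_id(id)
  ms, seq = id.split("-", 2)
  [Integer(ms), Integer(seq)]
end

# Hypothetical checkpoint: accepts only entries newer than the last committed ID.
class Checkpoint
  attr_reader :last_id

  def initialize(last_id = "0-0")
    @last_id = last_id
  end

  def new?(id)
    (parse_stream_id(id) <=> parse_stream_id(@last_id)) == 1
  end

  # Returns the committed ID, or nil if the entry was already covered.
  def commit(id)
    @last_id = id if new?(id)
  end
end

checkpoint = Checkpoint.new
delivered = %w[
  1700000000000-0 1700000000000-1 1700000000000-1
  1700000000001-0 1700000000000-0
]
processed = delivered.select { |id| checkpoint.commit(id) }
puts "processed: #{processed.inspect}"
puts "checkpoint: #{checkpoint.last_id}"
```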


Fast Signals Before Durable State

Many systems generate transient events before anything reaches PostgreSQL.

Examples include:

  • inventory availability
  • market data
  • IoT telemetry
  • collaborative editing
  • multiplayer game state

These events often exist for milliseconds.

Some are never intended to become permanent records.

Waiting for a database commit before reacting introduces unnecessary latency.

Redis already has the signal.

The application simply needs a reliable way to observe it.

That's exactly where Redis becomes a valuable CDC source.

Not because it replaces the database.

Because it observes change sooner.


Redis and PostgreSQL Together

The architecture becomes much more interesting when both sources exist simultaneously.

Imagine an order-processing pipeline.

Customer clicks Buy
       │
       ▼
Redis Stream
       │
 Immediate downstream processing
        │
 PostgreSQL Transaction
        │
        ▼
 Logical Replication

Redis carries the operational signal.

PostgreSQL records the durable history.

Eventually both become the same normalized object.

Redis Streams
        │
        ▼
   ChangeEvent
        ▲
        │
PostgreSQL WAL

Once normalized, downstream processing becomes identical.

That separation allows each technology to do what it does best.

Redis optimizes for responsiveness.

PostgreSQL optimizes for durability.

Neither replaces the other.


Event Processing Shouldn't Care About the Source

One of the design goals of the CDC ecosystem is that processors shouldn't know—or care—where an event originated.

A webhook dispatcher shouldn't behave differently because the event came from Redis instead of PostgreSQL.

Neither should:

  • search indexing
  • audit sinks
  • analytics
  • cache invalidation
  • AI pipelines
  • object storage
  • future message brokers

Every processor consumes exactly the same event model.

Redis
   │
   ▼
 ChangeEvent
       ▲
       │
 PostgreSQL
      │
      ▼
 Processor
        │
 ┌──────┼────────┬────────┬────────┐
 ▼      ▼        ▼        ▼        ▼
Webhook Search   Audit    Redis    Future...

That separation is what allows the runtime to remain completely source-agnostic.
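
A minimal sketch of that idea, assuming a processor is simply any object that responds to `call(event)`. The `Dispatcher` class and the `Event` struct are hypothetical names used for illustration, not part of the actual runtime.

```ruby
# Hypothetical minimal event; only the fields these processors need.
Event = Struct.new(:source, :operation, :key, keyword_init: true)

# Fans one event out to every registered processor.
class Dispatcher
  def initialize
    @processors = {}
  end

  def register(name, processor)
    @processors[name] = processor
    self
  end

  # Delivers the same event to each processor. Nothing here inspects
  # event.source, so Redis and PostgreSQL events take the same path.
  def dispatch(event)
    @processors.transform_values { |processor| processor.call(event) }
  end
end

audit_log = []
dispatcher = Dispatcher.new
dispatcher.register(:audit, ->(event) { audit_log << "#{event.operation} #{event.key}" })
dispatcher.register(:cache, ->(event) { "invalidate #{event.key}" })

from_redis = dispatcher.dispatch(Event.new(source: :redis, operation: :update, key: "orders:42"))
from_postgres = dispatcher.dispatch(Event.new(source: :postgresql, operation: :update, key: "orders:42"))
```

The two dispatch calls produce identical processor results even though the events came from different sources.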


Ordered Workloads

Not every workload benefits equally from parallelism.

Suppose an application updates customer balances.

+100
-20
+15

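Applied out of order, those deltas produce the wrong intermediate balances, and any step that depends on the current balance (an overdraft check, for example) can end up in the wrong final state.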

Ordering matters.

Other workloads don't have that constraint.

Search indexing.

Webhook fan-out.

Telemetry aggregation.

Independent cache updates.

Those can often execute concurrently.

One of the runtime's responsibilities is recognizing that not every processor requires the same ordering guarantees.

Correctness always comes first.

Throughput comes second.
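
One common way to get both properties is to route events by key, so everything for one customer stays in a single ordered queue while different customers proceed in parallel. The sketch below assumes that approach; `PartitionRouter` is a hypothetical name and this is not necessarily how HybridRuntime routes events.

```ruby
require "zlib"

# Routes each event to a partition chosen from its key, so all events for one
# key land in the same queue in arrival order.
class PartitionRouter
  attr_reader :partitions

  def initialize(count)
    @partitions = Array.new(count) { [] }
  end

  def route(key, event)
    # CRC32 is stable across processes, unlike String#hash.
    index = Zlib.crc32(key) % @partitions.size
    @partitions[index] << [key, event]
    index
  end
end

router = PartitionRouter.new(3)
incoming = [["customer:1", 100], ["customer:2", 50], ["customer:1", -20], ["customer:1", 15]]
incoming.each { |key, delta| router.route(key, delta) }

# Each partition can be drained by its own worker. Within its partition the
# deltas for customer:1 are still in arrival order.
customer_one = router.partitions
                     .flatten(1)
                     .select { |key, _| key == "customer:1" }
                     .map(&:last)
```

Workloads without an ordering constraint could skip the key and spread events across partitions freely.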


Why Not Just Use Sidekiq?

This is probably the question Ruby developers ask most often.

After all, Sidekiq already provides a robust distributed job system.

The answer is that jobs and change streams solve different scheduling problems.

A job queue answers:

"What work should execute next?"

A CDC runtime answers:

"How should related events flow through the system while preserving their correctness?"

Those are similar questions.

They're not the same question.

Jobs are independent.

Change events frequently aren't.

Ordering.

Checkpoints.

Replay.

Transaction boundaries.

Partition routing.

Those become first-class concerns in CDC systems.

Rather than replacing Sidekiq, the runtime sits at a different layer.

Sidekiq remains an excellent execution engine for background jobs.

HybridRuntime focuses on ordered event pipelines.

The two complement one another rather than compete.


Lessons Learned

Building cdc-redis-pro changed how I think about event-driven systems.

A few observations kept appearing throughout development.

  • Redis isn't PostgreSQL. Trying to force Redis into a WAL-shaped abstraction usually hides important operational behavior.
  • Delivery guarantees matter more than APIs. Two systems exposing similar methods may have completely different recovery characteristics.
  • Ownership scales better than synchronization. Keeping mutable resources inside a single Ractor proved simpler than sharing them across many workers.
  • Acquisition and processing are independent problems. The bottleneck for reading events is rarely the same bottleneck for processing them. Treating those concerns separately made both architectures significantly cleaner.

Most importantly...

Infrastructure shouldn't hide tradeoffs.

It should make them explicit.

That's the philosophy behind the entire project.

The benchmark results ended up reflecting exactly those design decisions.

What the Benchmarks Actually Mean

Benchmark numbers are easy to misunderstand.

They're also surprisingly easy to exaggerate.

I wanted to avoid both.

Rather than publishing a single headline number, I built a benchmark matrix that explored how the runtime behaves under different execution strategies.

The goal wasn't to find the biggest number.

The goal was to understand where the architecture stops scaling—and why.


Measuring Different Parts of the Pipeline

Not every benchmark measures the same thing.

Some benchmarks measure source acquisition.

Others measure downstream execution.

Others measure the orchestration layer itself.

Treating those numbers as interchangeable would be misleading.

I ended up thinking about the benchmarks as three different phases.

Redis Source
      │
      ▼
ChangeEvent Acquisition
      │
      ▼
HybridRuntime
      │
      ▼
Downstream Sink

Each phase has different bottlenecks.

Acquisition is constrained by Redis.

Processing is constrained by CPU, I/O latency, ordering requirements, and scheduling overhead.

Understanding which phase you're measuring is more important than the final throughput number.
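
To make the distinction concrete, here is a small sketch that times two phases separately instead of reporting one combined number. The helper name and both workloads are stand-ins chosen for illustration; they are not the project's benchmark harness.

```ruby
# Returns events per second for one phase, timed with a monotonic clock.
def events_per_second(count)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  count / elapsed
end

events = Array.new(10_000) { |i| { id: i } }

# Phase 1 stand-in: "acquire" events (here, copying them out of an array).
acquired = []
acquisition_rate = events_per_second(events.size) do
  events.each { |event| acquired << event }
end

# Phase 2 stand-in: "process" events with a little CPU work.
processed = []
processing_rate = events_per_second(acquired.size) do
  acquired.each { |event| processed << event[:id].to_s.reverse }
end
```

The two rates measure different work, so averaging or comparing them directly would say little about either phase.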


The Synthetic Benchmark

The largest number observed was approximately 54,500 events per second.

That's intentionally not presented as an end-to-end Redis benchmark.

It measures the execution capacity of the orchestration layer after events have already been acquired.

In other words:

ChangeEvent
      │
      ▼
HybridRuntime
      │
      ▼
Processor

This benchmark answers a very specific question:

"How quickly can the runtime schedule and execute already-available work?"

That's useful.

It just isn't the same as measuring an entire Redis pipeline.


End-to-End Pipelines

Real systems spend time doing real work.

Reading from Redis.

Writing to PostgreSQL.

Calling HTTP services.

Updating search indexes.

Those operations introduce latency that no scheduler can eliminate.

When measured end-to-end, the results naturally become lower.

Current peak observations include:

  • Redis Streams → Runtime: approximately 17,600 events/sec
  • PostgreSQL WAL → Redis: approximately 20,000 events/sec

Those numbers include actual I/O rather than isolated scheduling.

Personally, I find them more interesting than the synthetic benchmark because they reflect complete pipelines.


Scaling Isn't Linear

One result immediately stood out.

Adding more Ractors did not produce proportional speedups.

That's exactly what I expected.

Parallelism always introduces coordination costs.

Events must be routed.

Partitions must remain consistent.

Workers communicate through Ractor mailboxes.

Ordering constraints occasionally delay otherwise-complete work.

The runtime spends part of its time doing useful work...

...and part of its time coordinating that work.

That coordination isn't overhead to eliminate.

It's the cost of preserving correctness.

The benchmark matrix made those tradeoffs visible.

Rather than chasing perfect scaling, the goal became identifying the point where additional parallelism stopped producing meaningful throughput gains.

For the current implementation, that sweet spot consistently appeared around:

  • 3 prewarmed Ractors
  • 5 Redis connections per Ractor
  • 50 Fibers

That balance delivered high throughput without introducing excessive scheduling overhead.


Ordering Has a Cost

One benchmark compared ordered and unordered execution.

The difference wasn't dramatic.

Ordered execution consistently performed slightly slower.

That's expected.

Maintaining ordering means the runtime occasionally waits for earlier work to complete before later work can safely continue.

Event 1
Event 2
Event 3

cannot become:

Event 2
Event 3
Event 1

simply because Event 2 happened to finish first.

Preserving correctness sometimes requires sacrificing a little throughput.

That's a tradeoff I consider worthwhile.

Correctness scales better than debugging race conditions.
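
One way to enforce that rule is a small reorder buffer: results that finish early are held until every earlier sequence number has been released. This is a sketch of the general technique with a hypothetical `OrderedEmitter` class, not the runtime's actual implementation.

```ruby
# Buffers results that finish early and releases them strictly in sequence.
class OrderedEmitter
  attr_reader :emitted

  def initialize
    @next_sequence = 1
    @pending = {}
    @emitted = []
  end

  # Called whenever a worker finishes, in any order.
  def complete(sequence, result)
    @pending[sequence] = result
    # Release every result that is now contiguous with what was already emitted.
    while @pending.key?(@next_sequence)
      @emitted << @pending.delete(@next_sequence)
      @next_sequence += 1
    end
    @emitted
  end
end

emitter = OrderedEmitter.new
emitter.complete(2, "Event 2") # finished first, held back
emitter.complete(3, "Event 3") # also held back
emitter.complete(1, "Event 1") # unblocks all three
```

The waiting that ordered execution pays for shows up here as the time Event 2 and Event 3 spend sitting in the buffer.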


The Interesting Bottleneck

The benchmark wasn't really about Redis.

It was about coordination.

At low parallelism, workers spend most of their time processing events.

At high parallelism, workers spend increasingly more time coordinating with one another.

Eventually, adding another Ractor contributes more scheduling overhead than useful work.

Finding that point was considerably more valuable than finding the largest throughput number.

It answered a much more practical question:

"How should I actually configure this in production?"


Chaos Matters More Than Throughput

Raw throughput is only one characteristic of an event pipeline.

Recovery behavior is arguably more important.

The benchmark suite includes failure scenarios covering:

  • Redis restarts
  • PostgreSQL restarts
  • connection interruption
  • checkpoint recovery
  • consumer recovery

Streams resumed processing from checkpoints.

Pub/Sub sources reported explicit loss windows.

Recovery behavior remained consistent with each source's documented guarantees.
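
Checkpoint-based resumption for a stream can be sketched as follows. Redis Stream entry IDs have the form `<milliseconds>-<sequence>`, so they compare as integer pairs. The `Checkpoint` class and `resume` helper are hypothetical and keep state in memory, whereas a real checkpoint would need durable storage. Pub/Sub has no equivalent, which is why it can only report a loss window.

```ruby
# Parses a Redis Stream entry ID ("<milliseconds>-<sequence>") into a
# comparable pair of integers.
def parse_stream_id(id)
  id.split("-", 2).map { |part| Integer(part) }
end

def after?(id, other_id)
  (parse_stream_id(id) <=> parse_stream_id(other_id)) == 1
end

# Hypothetical in-memory checkpoint; a real one would be persisted durably.
class Checkpoint
  attr_reader :last_id

  def initialize(last_id = "0-0")
    @last_id = last_id
  end

  # Only ever moves forward.
  def advance(id)
    @last_id = id if after?(id, @last_id)
  end
end

# Returns the entries newer than the checkpoint and advances it past them.
def resume(entries, checkpoint)
  unseen = entries.select { |id, _| after?(id, checkpoint.last_id) }
  unseen.each { |id, _| checkpoint.advance(id) }
  unseen
end

entries = [
  ["1700000000000-0", { "order" => "1" }],
  ["1700000000000-1", { "order" => "2" }],
  ["1700000000005-0", { "order" => "3" }]
]
checkpoint = Checkpoint.new("1700000000000-0") # state saved before the restart
replayed = resume(entries, checkpoint)
```

After the simulated restart, only the two entries newer than the saved ID are replayed, and a second call to `resume` returns nothing.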

That consistency mattered more to me than achieving another few thousand events per second.


Long-Running Stability

Short benchmarks rarely expose operational problems.

Memory leaks.

Connection exhaustion.

Scheduler starvation.

Queue growth.

Those usually appear over time.

The runtime was therefore exercised continuously using soak tests.

One representative run processed approximately 1.34 million events over five minutes, an average of roughly 4,500 events per second.

No processing failures were observed.

Median throughput degraded by roughly 2% over the duration of the run.

That's encouraging, although much longer overnight and multi-day soak tests remain on my roadmap.

Operational confidence comes from sustained behavior—not just impressive graphs.


What I Learned

Perhaps the most surprising outcome of the benchmarking work was this:

The execution runtime wasn't the limiting factor.

The limiting factor was almost always the surrounding system.

Network latency.

Redis.

HTTP endpoints.

Disk.

Database writes.

The scheduler spent most of its time waiting for external systems.

That reinforced one of the central architectural decisions behind HybridRuntime.

Fibers overlap waiting.

Ractors overlap computation.

Neither attempts to eliminate latency.

They simply ensure latency in one part of the system doesn't unnecessarily stall everything else.
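
The effect of overlapping waits is easy to see in isolation. The sketch below uses plain threads as a stand-in, because scheduler-driven Fibers need a fiber scheduler implementation that the standard library does not ship. The principle is the same even though this is not how HybridRuntime does it, and `slow_call` simply sleeps to imitate an I/O-bound call.

```ruby
# Simulates an I/O-bound call such as an HTTP request or a database write.
def slow_call(seconds)
  sleep(seconds)
  seconds
end

# Returns how long the block took, using a monotonic clock.
def timed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

waits = Array.new(5, 0.05)

# One wait after another: total time is roughly the sum of the waits.
serial = timed { waits.each { |seconds| slow_call(seconds) } }

# Threads stand in for Fibers here: the waits overlap, so total time is
# roughly the longest single wait.
overlapped = timed do
  waits.map { |seconds| Thread.new { slow_call(seconds) } }.each(&:join)
end
```

No individual call got faster. The latency is still there, it just no longer stalls the other four calls.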

The result isn't infinite scalability.

It's predictable scalability.

And for infrastructure software, predictability is usually the more valuable property.


The complete benchmark reports—including raw CSV data, SVG charts, chaos-recovery artifacts, and soak-test results—are published alongside the documentation.

I'd much rather readers inspect the raw data than rely on a single headline number.

Benchmarks are most useful when they're reproducible.

What's Next

cdc-redis-pro is only one piece of a much larger ecosystem.

The long-term goal was never to build "yet another Redis client."

The goal was to build a source-agnostic Change Data Capture platform for Ruby.

Today, PostgreSQL logical replication and Redis happen to be the two primary sources.

Tomorrow, that could just as easily include:

  • Kafka
  • NATS
  • Amazon SQS
  • Webhooks
  • Object storage
  • Search indexes
  • Other databases

The important observation is that the runtime doesn't need to change.

As long as a source can be normalized into a CDC::Core::ChangeEvent, everything downstream already knows how to process it.

That was the motivation behind separating source acquisition from execution.

        Source
           │
           ▼
   CDC::Core::ChangeEvent
           │
           ▼
    cdc-orchestrator-pro
           │
           ▼
        Processors

Every new source becomes an adapter.

Not a new runtime.
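
In code, that boundary could be as small as one method. The sketch below assumes an adapter only has to yield normalized events from `each_event`. The `ArraySourceAdapter` class, the `run` method, and the simplified `ChangeEvent` struct are all hypothetical stand-ins, not the real adapter interface.

```ruby
# Hypothetical event shape; the real CDC::Core::ChangeEvent may differ.
ChangeEvent = Struct.new(:source, :key, :payload, keyword_init: true)

# An adapter only has to yield normalized events from #each_event.
class ArraySourceAdapter
  def initialize(source_name, rows)
    @source_name = source_name
    @rows = rows
  end

  def each_event
    return enum_for(:each_event) unless block_given?

    @rows.each do |key, payload|
      yield ChangeEvent.new(source: @source_name, key: key, payload: payload)
    end
  end
end

# Stand-in for the runtime: it stays the same when a new adapter is added.
def run(adapters, processor)
  adapters.flat_map do |adapter|
    adapter.each_event.map { |event| processor.call(event) }
  end
end

adapters = [
  ArraySourceAdapter.new(:redis, [["orders:1", { "status" => "placed" }]]),
  ArraySourceAdapter.new(:kafka, [["orders:2", { "status" => "shipped" }]])
]
handled = run(adapters, ->(event) { "#{event.source} -> #{event.key}" })
```

Adding the second source required a new adapter instance and no change to `run`.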


Why Split the Runtime?

One architectural decision deserves a brief explanation.

Originally the execution engine lived inside another project.

As the ecosystem evolved, I realized something important.

The runtime wasn't solving a Redis problem.

It wasn't solving a PostgreSQL problem.

It wasn't even solving a Sidekiq problem.

It was solving an event-processing problem.

That realization led to extracting the execution engine into its own commercial component:

cdc-orchestrator-pro

Today it powers Redis CDC.

Tomorrow it can power any source capable of producing normalized change events.

Separating those concerns keeps both halves of the system simpler.

Source adapters acquire events.

HybridRuntime processes them.

Each evolves independently.


Open Source First

Although cdc-redis-pro and cdc-orchestrator-pro are commercial products, the ecosystem they're built upon remains open source.

That includes:

  • cdc-core
  • cdc-parallel
  • cdc-concurrent
  • pgoutput-client
  • pgoutput-parser
  • pgoutput-decoder
  • Mammoth

Those projects define the common event model, execution primitives, and PostgreSQL integration that everything else builds upon.

The commercial components focus on operational capabilities rather than replacing the open-source foundation.

That separation is intentional.

I believe infrastructure ecosystems become valuable through adoption and trust—not artificial feature restrictions.


Looking Ahead

Redis replication remains one of the larger pieces still on the roadmap.

Today, cdc-redis-pro consumes Redis event sources such as Streams, Pub/Sub, and Keyspace Notifications.

A future version will move further upstream by treating Redis itself as a replication source.

That's a significantly more ambitious problem.

I'd rather stabilize the current architecture before expanding its scope.

There are also areas where I think the execution engine itself can continue to improve.

Adaptive scheduling.

Smarter partition routing.

Better observability.

Long-running soak tests.

More topology-aware execution.

Those improvements belong to the runtime rather than any particular source adapter—which is exactly why separating acquisition from execution turned out to be such a useful architectural boundary.


Final Thoughts

I started this project thinking I was building Redis CDC.

Somewhere along the way I realized I was really building an execution model.

Redis happened to expose the problem first.

PostgreSQL reinforced it.

Future source adapters will probably validate it again.

The most interesting lesson wasn't about Redis at all.

It was this:

Acquiring events and processing events are different problems.

They have different bottlenecks.

They scale differently.

They deserve different architectures.

Once those responsibilities are separated, the rest of the system becomes remarkably composable.

Redis becomes another source.

PostgreSQL becomes another source.

Tomorrow's adapters become just that—adapters.

The runtime stays the same.

For me, that's the most exciting part of the entire project.

Not because it produced the largest benchmark numbers.

Not because it uses Ractors or Fibers.

But because it led to an architecture that's easier to reason about, easier to extend, and honest about the tradeoffs of the systems it builds upon.

The benchmark reports are public.

The documentation is public.

The implementation is commercial.

If you're building event-driven systems in Ruby—or you're wrestling with Redis and PostgreSQL in the same architecture—I'd genuinely love to hear how you're approaching those problems.

I'm convinced there's still a lot left to explore.