惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

Help Net Security
Help Net Security
V
V2EX
博客园 - 叶小钗
博客园 - 司徒正美
云风的 BLOG
云风的 BLOG
F
Full Disclosure
博客园 - 聂微东
宝玉的分享
宝玉的分享
有赞技术团队
有赞技术团队
U
Unit 42
Jina AI
Jina AI
Engineering at Meta
Engineering at Meta
H
Help Net Security
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
P
Proofpoint News Feed
Last Week in AI
Last Week in AI
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
C
Check Point Blog
阮一峰的网络日志
阮一峰的网络日志
B
Blog RSS Feed
Recent Announcements
Recent Announcements
H
Hackread – Cybersecurity News, Data Breaches, AI and More
Martin Fowler
Martin Fowler
Apple Machine Learning Research
Apple Machine Learning Research
F
Fortinet All Blogs
月光博客
月光博客
Microsoft Security Blog
Microsoft Security Blog
The Cloudflare Blog
爱范儿
爱范儿
J
Java Code Geeks
Cyber Security Advisories - MS-ISAC
Cyber Security Advisories - MS-ISAC
大猫的无限游戏
大猫的无限游戏
博客园 - 三生石上(FineUI控件)
GbyAI
GbyAI
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
酷 壳 – CoolShell
酷 壳 – CoolShell
V
Visual Studio Blog
B
Blog
D
DataBreaches.Net
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
雷峰网
雷峰网
T
The Blog of Author Tim Ferriss
S
SegmentFault 最新的问题
A
About on SuperTechFans
Cloudbric
Cloudbric
人人都是产品经理
人人都是产品经理
S
Schneier on Security
Application and Cybersecurity Blog
Application and Cybersecurity Blog
P
Privacy International News Feed
Know Your Adversary
Know Your Adversary

The Practical Developer

The Libuv Thread Pool Trap: Why Node.js Async APIs Stall Under Load Postgres Covering Indexes with INCLUDE: Eliminate Heap Fetches on Read-Heavy Workloads Postgres DISTINCT ON: The Fastest Way to Get the Latest Row Per Group Postgres Transaction Isolation: The Anomalies Your App Actually Faces in Production Linux TCP Tuning for Node.js Microservices: The Kernel Settings That Stop Silent Connection Drops Under Load Postgres HOT Updates and Fillfactor: Why Not All Writes Are Created Equal Database Connection Pool Leaks: Finding the Promise That Never Returns Its Seat Linux OOM Killer in Production: Why Your Node.js Containers Die Without a Stack Trace Postgres Materialized Views: Refresh Strategies That Do Not Lock Your Dashboards API Dependency Health Checks: Why /health Is Not Enough Authorization with Zanzibar Tuples: How Google Manages Permissions and How To Build the Same Check in Node.js Postgres Advisory Locks: The 20-Character Primitive That Replaces Redis for Coordination Dead Letter Queues: The Message Queue Pattern That Saves You at 2 a.m. File Descriptor Exhaustion: The Kernel Limit That Silently Drops Node.js Connections Graceful Degradation: The Pattern That Turns Total Outages into Partial Success PostgreSQL Full-Text Search: Dropping Elasticsearch for 90% of Use Cases S3 Presigned Multipart Uploads: Stop Your API Server from Being a File Upload Bottleneck MessagePack vs JSON: The Binary Serialization Switch That Cut Our Internal RPC Overhead by 40% DNS Caching in Node.js: The Silent Cause of Production Latency Spikes Reliable Cron Jobs: The Pattern That Stops Double Runs, Missed Executions, And The 2 AM Page GraphQL Query Complexity: Stop the OOM Query Before It Reaches Your Resolver Node.js Event Loop Lag: The Hidden Metric Behind Random Latency Spikes API Request Validation with Zod: The Schema That Catches Bad Input Before It Corrupts Your Database Load Shedding in Node.js: How to Reject Traffic Before You Drown Request Hedging: Cut Tail Latency In Half Without Overprovisioning Git Bisect: The Automated Binary Search That Finds Breaking Commits in Minutes Node.js Garbage Collection Tuning: Stop Letting V8 Pause Your Event Loop Node.js Server Timeouts: The Settings That Stop Slow Clients from Holding Sockets Hostage Postgres BRIN Indexes: The Time-Series Secret That Shrinks Indexes by 99% Event Sourcing with PostgreSQL: The Pragmatic 80% Solution Node.js Cluster Mode: Scaling the Event Loop Across CPU Cores Postgres Partial Indexes: Stopping Soft Deletes from Ruining Your Query Performance Request Coalescing with the Singleflight Pattern: Stop Drowning Your Database on Every Cache Miss The Bulkhead Pattern: Why One Slow Endpoint Should Not Drown Your Whole Service Node.js AsyncLocalStorage: End-to-End Request Context Without the Propagation Hell Postgres Deadlocks: Logging the Victim, Reproducing the Race, and Fixing the Lock Order Your Node.js HTTP Client Is the Bottleneck: Connection Pool Tuning That Works Optimistic Locking in Postgres: Stop Losing Data to Race Conditions Postgres Read Replicas: Stop Serving Stale Data to Your Users Cursor Pagination: Why Offset Queries Explode at Scale and How to Fix Them Node.js Worker Threads: 60 Lines That Stop a CSV Upload from Timing Out Every Other Request Reliable Webhook Delivery: Architecture for Outbound HTTP You Can Trust Request Timeouts and Deadline Propagation: Stop the Chain of Slowness Advanced Security Practices in Node.js Graceful Shutdown in Node.js: The 40 Lines That Stop 502s During Deploys Finding Node.js Memory Leaks with Heap Snapshots Idempotency Keys in 30 Lines: Stop Your Webhook From Charging Customers Twice Backpressure In Node.js: The Fix For Slow-Motion Queue Meltdowns Retries Done Right: Jitter, Budgets, and the Stampede You Did Not See Coming The Cache Stampede: Why Your "Just Add Redis" Layer Crashes Postgres at 3 a.m. Postgres SKIP LOCKED: An 80-Line Job Queue You Can Run Without Redis Stop Doing Work Nobody Wants: AbortController in Node.js, Done Right The N+1 Query Problem: We Found 23 In One Codebase And Killed Every One I Tried 5 AI Coding Tools for a Month. Here Is What I Actually Use CI/CD From Zero to Production in 30 Minutes With GitHub Actions Node.js vs Bun vs Deno: Which Runtime Should You Pick in 2025? Kubernetes Resource Requests And Limits: The Numbers That Decide If Your Cluster Is Stable The Three Pillars of Observability Are A Myth: What Actually Matters In Production pnpm Vs npm Vs yarn Vs Bun For Monorepos: Which One Earns The Migration In 2024 JSONB Indexing In Postgres: GIN Vs Expression Indexes, And When Each Is The Right Choice A Code Review Checklist That Ends The Same Three Arguments Every Sprint gRPC Vs REST In 2024: When The Switch Pays For Itself React Suspense For Data Fetching: The Pattern That Replaces Half Your Loading State Code The Five-Stage Rollout: How To Ship A Risky Change Without Holding Your Breath GitHub Actions In A Monorepo: Caching, Path Filters, And Secret Boundaries That Actually Work The Blameless Postmortem That Actually Improves Things: A Template And Six Hard-Won Rules Recursive CTEs In Postgres: How To Query A Tree Without N Round Trips Node.js Streams: When They Actually Help, And When They Just Add Complexity Playwright Vs Cypress In 2024: The Honest Comparison Of Which One Earns The Test Time React Server Components: The Mental Model That Makes The "use client" Boundary Obvious Pod Disruption Budgets: The K8s Object That Keeps Your Service Up During Cluster Maintenance Postgres LISTEN/NOTIFY: The Pub/Sub You Already Have And Are Not Using Chaos Engineering Starter Kit: The Five Drills That Don't Need Netflix-Scale Spec-Driven API Development With OpenAPI: How To Stop Drifting From Your Docs Kubernetes Autoscaling Beyond CPU: The Custom-Metric HPA Pattern That Actually Works Postgres Partitioning For Time-Series: The Boring Setup That Saves Your Database Distributed Locks With Redis: An Honest Look At Redlock And When You Don't Need It HTTP/2 vs HTTP/3: What Actually Changes For Your App, And What Doesn't Image Optimization For The Web In 2023: srcset, AVIF, And The Lighthouse Score You Actually Want Kafka vs RabbitMQ: A Decision Tree That Doesn't Hate You UUID vs Bigint Primary Keys In Postgres: The Index Math That Decides For You Flame Graphs: How To Find The Slow Function In 30 Seconds Without Profiling Theatre Postgres Streaming Vs. Logical Replication: Which One Solves Your Actual Problem ESLint Rules That Earn Their Keep: The Twelve I Enable On Every Project Pre-Commit Hooks That Pay For Themselves: Husky, lint-staged, And The Five Rules That Stick Zero-Downtime Database Migrations: The Six-Step Pattern That Rules Them All Circuit Breakers In Node.js: 50 Lines That Stop A Failing Dependency From Taking Down Your Service Postgres VACUUM Is Not Magic: How Your Hot Table Bloats To 80GB And How To Fix It Kubernetes Liveness And Readiness Probes: The Difference That Causes Half Your Outages Rate Limiting In Production: A Token Bucket In 30 Lines Of Redis The Outbox Pattern: How To Stop Losing Events When Postgres And Kafka Disagree Load Testing With k6: The Three Scenarios That Find Real Bugs (Not Synthetic Numbers) Postgres Row-Level Security For Multi-Tenant Apps: The Pattern That Stops You From Leaking Data Rebase vs. Merge: The Team Policy That Ends The Argument Forever OpenTelemetry in Node.js: Distributed Tracing That Actually Helps During an Incident Feature Flags That Pay Rent: The 4 Flag Types And When To Delete Each ETag, Last-Modified, and the Caching Headers Most APIs Get Wrong Connection Pooling Without the Cargo Cult: pgbouncer in 100 Lines of Config JSONB Is Not a Schema: When To Reach For It in Postgres, And When To Stop Bash Strict Mode: The Three Lines That Stop Your Deploy Script From Lying To You
Redis Streams for Reliable Message Processing: Consumer Groups, Dead Letters, and Backpressure
The Practica · 2026-06-16 · via The Practical Developer

The notification emails stopped sending at 02:14. Not a crash. The Redis Pub/Sub channel still showed one subscriber. But the subscriber process had silently restarted during a deployment, and Pub/Sub delivered exactly zero messages to the connection that was briefly missing. By the time the team noticed at 08:45, 4,200 onboarding emails had evaporated.

This is not a Redis bug. This is Redis Pub/Sub working exactly as designed. PUBLISH sends a message to every currently connected subscriber. If nobody is listening, the message is gone forever. No queue. No persistence. No replay. This makes Pub/Sub fine for ephemeral notifications (live dashboards, chat typing indicators) and catastrophically wrong for anything you cannot afford to lose.

Redis Streams, introduced in Redis 5.0, fixes this. It is an append-only log with consumer groups, acknowledgment tracking, and configurable retention. It gives you the durability of a message queue without running Kafka, and it fits in a Node.js app with about 150 lines of careful code.

This post covers the three patterns that turn Redis Streams from a “fancy list” into a production-grade message processor: consumer groups that survive crashes, dead-letter handling for poison messages, and backpressure that keeps your Redis memory from filling up when your workers slow down.

Consumer groups: the thing Pub/Sub is missing

A consumer group is a logical subscription to a stream that tracks which messages each consumer has processed. When a consumer crashes before acknowledging a message, the group knows. When a new consumer joins, the group assigns it pending messages that other consumers abandoned.

Here is the minimal producer:

import { createClient } from 'redis';

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

async function enqueue(email) {
  await redis.xAdd('notifications:email', '*', {
    to: email.to,
    subject: email.subject,
    body: email.body,
    created_at: Date.now().toString(),
  });
}

The * tells Redis to auto-generate the stream ID (a millisecond timestamp plus a sequence number). You can supply your own ID if you want causality, but * is the right default.

And the consumer that will not lose messages:

const GROUP = 'email-sender-group';
const CONSUMER = `worker-${process.pid}`;

async function startConsumer() {
  const redis = createClient({ url: process.env.REDIS_URL });
  await redis.connect();

  // Create the group if it does not exist.
  // MKSTREAM creates the stream automatically.
  try {
    await redis.xGroupCreate('notifications:email', GROUP, '0', {
      MKSTREAM: true,
    });
  } catch (err) {
    // BUSYGROUP means the group already exists. That is fine.
    if (!err.message.includes('BUSYGROUP')) throw err;
  }

  console.log(`Consumer ${CONSUMER} listening...`);

  while (true) {
    try {
      // XREADGROUP reads messages assigned to this consumer.
      // Block for up to 5 seconds if no messages are available.
      const results = await redis.xReadGroup(
        GROUP,
        CONSUMER,
        [{ key: 'notifications:email', id: '>' }],
        { COUNT: 10, BLOCK: 5000 }
      );

      if (!results || results.length === 0) continue;

      for (const { messages } of results) {
        for (const msg of messages) {
          await processMessage(msg);
          // Acknowledge after successful processing.
          await redis.xAck('notifications:email', GROUP, msg.id);
        }
      }
    } catch (err) {
      console.error(`Consumer error:`, err);
      // Wait before retrying to avoid tight error loops.
      await sleep(1000);
    }
  }
}

The critical detail is id: '>'. The > character is a special marker that tells Redis “give me messages that have never been delivered to this consumer.” Without >, you are reading by ID range, which is how you reprocess dead letters (more on that later).

The xAck call is the commit. Until you acknowledge a message, Redis considers it pending and will reassign it to another consumer if your worker disconnects or if the pending time exceeds a threshold. This is the at-least-once guarantee: every message gets processed by at least one consumer, even if the original consumer dies halfway through.

The pending entries list: your crash recovery mechanism

Here is where the pattern pays for itself. Run this query on a stream that has been processing for a while:

redis-cli XPENDING notifications:email email-sender-group

You will see output like this:

1) (integer) 12       # total pending messages
2) "1718234512000-0" # oldest pending ID
3) "1718234522000-0" # newest pending ID
4) 1) "worker-1234"
   2) "worker-5678"

Each pending message has a consumer name, a delivery count, and an idle time. If worker-1234 has 11 pending messages with idle times over 30 seconds, that worker is probably dead. Redis does not auto-reclaim these messages. You must claim them explicitly or configure a consumer to do it.

Here is the recovery loop you run alongside your main consumer:

async function claimStaleMessages() {
  const redis = createClient({ url: process.env.REDIS_URL });
  await redis.connect();

  const MAX_PENDING = 100;
  const MIN_IDLE_MS = 30_000; // 30 seconds without progress = dead

  while (true) {
    await sleep(10_000); // Check every 10 seconds

    try {
      // Get pending messages for the group.
      // XPENDING returns summary-level data.
      // XAUTOCLAIM does the actual re-assignment.
      const claimed = await redis.xAutoClaim(
        'notifications:email',
        GROUP,
        CONSUMER,
        MIN_IDLE_MS,
        '0-0',
        { COUNT: MAX_PENDING }
      );

      if (claimed.messages.length > 0) {
        console.log(`Claimed ${claimed.messages.length} stale messages`);
      }
    } catch (err) {
      console.error(`Claim error:`, err);
    }
  }
}

XAUTOCLAIM (added in Redis 6.2) replaces the older multi-step pattern of XPENDING + XCLAIM. It scans the pending entries list, finds messages that have been idle longer than MIN_IDLE_MS, and reassigns them to the current consumer. The '0-0' is the start cursor. The command returns a cursor for the next batch, so you can paginate through millions of pending entries.

This is the safety net. Without it, a crashed consumer takes its unacknowledged messages to the grave. The stream retains them forever (assuming you have not set a trimming policy), but no consumer will touch them unless something explicitly claims them.

Dead letters: what happens when a message is poison

Not every message should be retried. If a message has a malformed payload, references a user ID that does not exist, or causes your email provider to return a 400, re-processing it three times will produce three identical failures. These are poison messages, and they need a dead-letter queue (DLQ).

Here is the pattern that moves bad messages out of the main stream:

const MAX_DELIVERIES = 3;

async function processMessage(msg) {
  const payload = JSON.parse(msg.message);

  // Check delivery count from stream metadata.
  // XREADGROUP response includes a delivery count in the message ID sequence.
  // For XPENDING-style tracking, use XINFO or the pending entry details.
  const deliveryCount = await getDeliveryCount(msg.id);

  if (deliveryCount > MAX_DELIVERIES) {
    await deadLetter(msg, 'Max delivery count exceeded');
    // Acknowledge the message to remove it from pending.
    await redis.xAck('notifications:email', GROUP, msg.id);
    return;
  }

  try {
    await sendEmail(payload);
  } catch (err) {
    if (isPermanentFailure(err)) {
      // No point retrying a schema violation or auth error.
      await deadLetter(msg, err.message);
      await redis.xAck('notifications:email', GROUP, msg.id);
    } else {
      // Transient failure. Do not ack. Let the claim loop retry.
      throw err;
    }
  }
}

async function deadLetter(msg, reason) {
  await redis.xAdd('notifications:email:dlq', '*', {
    original_id: msg.id,
    payload: JSON.stringify(msg.message),
    reason,
    failed_at: Date.now().toString(),
  });
}

function isPermanentFailure(err) {
  // Specific errors that no retry will fix.
  const permanent = ['INVALID_PAYLOAD', 'USER_NOT_FOUND', 'RATE_LIMITED'];
  return permanent.some((code) => err.message.includes(code));
}

The dead-letter stream is a separate stream with the same structure. You monitor it, alert on it, and reprocess messages manually after fixing whatever caused the failure. The alternative (leaving poison messages in the pending list forever) bloats the pending entries list and slows down XAUTOCLAIM scans.

A note on the delivery count: Redis does not natively increment a counter on each delivery. You have two options:

  1. Track it via XPENDING details. Each pending entry includes a delivery counter. XAUTOCLAIM returns the delivery count with each claimed message.
  2. Embed a counter in the message body. Add deliveries: 0 when you produce, and increment it in the consumer before re-enqueuing.

Option 1 is cleaner because it uses Redis’s built-in tracking. But it only counts deliveries within the pending list, meaning messages that were acknowledged and re-added to the stream start at 1 again. For most use cases this is acceptable. If you need an absolute count across all delivery attempts, embed it in the payload.

Backpressure: the silent Redis memory killer

The stream grows unboundedly unless you trim it. A busy stream processing 1,000 messages per second with 1KB payloads consumes about 84GB per day. Without trimming, Redis runs out of memory, evicts other keys, or crashes.

Redis Streams offers three trimming strategies, and you need to pick the right one before the pager goes off at 3am.

MAXLEN ~ (approximate trimming)

// Keep approximately 100,000 messages. The ~ makes it efficient.
await redis.xTrim('notifications:email', 'MAXLEN', '~', 100_000);

The tilde is critical. Without it (MAXLEN 100000), Redis truncates the stream to exactly 100,000 messages every time you trim. With ~, Redis trims to roughly 100,000 messages but only when it can evict a full macro-node (a Redis internal data structure). In practice this makes trimming a constant-time operation instead of O(n).

Set this in a periodic job or at the end of each consumer loop iteration. Do not do it on every produce call, because XTRIM is not free even with ~.

MINID (trim by message age)

// Remove messages older than 7 days.
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;
const cutoff = Date.now() - SEVEN_DAYS_MS;

await redis.xTrim('notifications:email', 'MINID', '~', cutoff);

MINID removes messages with IDs older than the specified timestamp. This is the right strategy when you care about retention duration (e.g., “keep 7 days of messages for reprocessing”) instead of count. Use ~ here too for the same performance reason.

Consumer-group-aware trimming

There is a subtle bug with naive trimming: you can trim messages that consumers have not processed yet. If a consumer is 10,000 messages behind and you trim to MAXLEN ~ 5000, those 10,000 messages disappear before the consumer reads them.

Check the consumer group lag before trimming:

async function safeTrim(stream, group, maxLen) {
  const info = await redis.xInfoGroups(stream);
  const groupInfo = info.find((g) => g.name === group);
  const lag = groupInfo['lag']; // Redis 7.0+ field

  // Only trim if the consumer is within 20% of the stream head.
  // lag can be null for older Redis versions.
  if (lag !== null && lag > maxLen * 0.2) {
    console.warn(`Consumer lag is ${lag}. Skipping trim.`);
    return;
  }

  await redis.xTrim(stream, 'MAXLEN', '~', maxLen);
}

In Redis 7.0+, XINFO GROUPS returns a lag field that tells you how many messages the slowest consumer is behind. Use it. Without consumer-group-aware trimming, you will randomly lose messages that consumers have not processed, and the symptoms (missing data with no errors) are the hardest kind to debug.

Putting it together: a production consumer skeleton

Here is the complete consumer pattern that combines all three mechanisms:

import { createClient } from 'redis';

const STREAM = 'notifications:email';
const GROUP = 'email-sender-group';
const CONSUMER = `worker-${process.pid}`;
const MAX_PENDING_READ = 100; // messages per batch

let shuttingDown = false;

async function main() {
  const redis = createClient({ url: process.env.REDIS_URL });
  await redis.connect();

  await ensureGroup(redis);

  // Start claim loop in background.
  const claimHandle = startClaimLoop(redis);

  // Trim loop in background.
  const trimHandle = startTrimLoop(redis);

  process.on('SIGTERM', async () => {
    shuttingDown = true;
    clearInterval(claimHandle);
    clearInterval(trimHandle);
    await redis.quit();
    process.exit(0);
  });

  while (!shuttingDown) {
    try {
      const results = await redis.xReadGroup(
        GROUP,
        CONSUMER,
        [{ key: STREAM, id: '>' }],
        { COUNT: MAX_PENDING_READ, BLOCK: 2000 }
      );

      if (!results) continue;

      for (const { messages } of results) {
        for (const msg of messages) {
          try {
            await handleMessage(msg);
            await redis.xAck(STREAM, GROUP, msg.id);
          } catch (err) {
            if (isPermanent(err)) {
              await deadLetter(redis, msg, err.message);
              await redis.xAck(STREAM, GROUP, msg.id);
            }
            // Transient: do not ack. Claim loop will retry.
          }
        }
      }
    } catch (err) {
      console.error('Read loop error:', err);
      await sleep(2000);
    }
  }
}

async function ensureGroup(redis) {
  try {
    await redis.xGroupCreate(STREAM, GROUP, '0', { MKSTREAM: true });
  } catch (err) {
    if (!err.message.includes('BUSYGROUP')) throw err;
  }
}

function startClaimLoop(redis) {
  return setInterval(async () => {
    try {
      const result = await redis.xAutoClaim(
        STREAM, GROUP, CONSUMER, 30_000, '0-0',
        { COUNT: 100 }
      );
      if (result.messages.length > 0) {
        console.log(`Claimed ${result.messages.length} stale messages`);
      }
    } catch (err) {
      console.error('Claim loop error:', err);
    }
  }, 10_000);
}

function startTrimLoop(redis) {
  return setInterval(async () => {
    try {
      const info = await redis.xInfoGroups(STREAM);
      const groupInfo = info.find((g) => g.name === GROUP);
      const lag = groupInfo?.lag ?? Infinity;

      if (lag < 1000) {
        await redis.xTrim(STREAM, 'MAXLEN', '~', 50_000);
      }
    } catch (err) {
      console.error('Trim loop error:', err);
    }
  }, 60_000);
}

// -- helpers --

async function handleMessage(msg) {
  const data = JSON.parse(msg.message);
  // Your actual processing logic here.
  // Throws on transient errors, returns on success.
}

function isPermanent(err) {
  return ['INVALID', 'NOT_FOUND'].some((c) => err.message.includes(c));
}

async function deadLetter(redis, msg, reason) {
  await redis.xAdd(`${STREAM}:dlq`, '*', {
    original_id: msg.id,
    payload: msg.message,
    reason,
    failed_at: Date.now().toString(),
  });
}

function sleep(ms) {
  return new Promise((r) => setTimeout(r, ms));
}

main().catch(console.error);

This is about 110 lines and covers everything you need: at-least-once delivery, dead-letter isolation, crash recovery via auto-claim, and memory safety via consumer-group-aware trimming.

When to use Redis Streams (and when not to)

Redis Streams is a great fit when:

  • You already run Redis and want to avoid adding Kafka or RabbitMQ to the stack.
  • Your throughput is under 10,000 messages/second per stream (Redis is single-threaded for stream operations).
  • You need at-least-once delivery but can tolerate rare duplicates (use idempotent consumers).
  • Message retention is measured in hours or days, not weeks.

It is a poor fit when:

  • You need exactly-once semantics. Redis Streams gives you at-least-once. Deduplication must happen in the consumer.
  • Your messages are large (over 10KB). Streams store data in memory; large payloads waste Redis RAM.
  • You need strict ordering across multiple streams. Each stream is ordered independently.
  • You need message TTL by key value (not by count or age in the stream). There is no per-message TTL.

The most common mistake teams make is using Redis Streams for everything because Redis is already in the stack. If your messaging needs include long-term storage, schema evolution, or replay from arbitrary points in time, Kafka’s log-compacted topics and schema registry handle those patterns natively. Redis Streams is a queue, not an event store. Treat it as one.

Takeaway

Redis Pub/Sub drops messages on disconnect. Redis Streams with consumer groups does not. The three-pattern combination of XREADGROUP + XAUTOCLAIM + MAXLEN ~ trimming gives you a message processing pipeline that survives process crashes, handles poison messages gracefully, and stays within its memory budget. The total implementation is about 150 lines of Node.js. The alternative (running a separate message broker) is a whole new deployment.

Every production stream needs all three patterns. Consumer groups without auto-claim leaks pending entries. Consumer groups without dead letters retries poison messages into infinity. Consumer groups without trimming eats all your Redis memory. Implement the trio or do not bother with streams at all.

A note from Yojji

Building message processing infrastructure that survives process crashes, handles poison messages, and stays within its memory budget is the kind of reliability engineering that separates systems that wake someone up at 3am from systems that quietly self-heal. Yojji’s senior engineering teams design and implement these patterns regularly across the JavaScript ecosystem, building production data pipelines on Redis, Postgres, and cloud infrastructure for clients in e-commerce, fintech, and logistics.

Yojji is an international custom software development company founded in 2016, with offices in Europe, the US, and the UK. Their teams specialize in the full cycle of product delivery from discovery through deployment, with deep expertise in Node.js, TypeScript, React, cloud platforms on AWS, Azure, and Google Cloud, and the production-hardened backend patterns that keep data flowing reliably even when things go wrong.