How to tame the thundering herd problem
Redis · 2026-02-18 · via Redis

The thundering herd problem occurs when multiple processes or clients repeatedly request the same resource simultaneously, leading to excessive load and performance degradation.

If you grew up on classic comedy, recall the scene when the Three Stooges would get stuck entering the same doorway. If you’ve ever been to a large concert, remember how difficult it was for everyone to exit at the same time.

The same pattern occurs in online systems. That same concert might have caused a thundering herd problem well before it started, if a rush of fans crashed the ticketing website the moment the concert was announced. Too much demand through too “narrow” a resource or process can cause severe issues.

In modern web applications, especially microservices and distributed systems, this pattern is common during traffic spikes or coordinated events. A simple disruption – say, a cache entry expiring or a brief outage – can trigger a cache stampede where every client hits the backend at once, degrading performance or even bringing services down.

Fortunately, developers can tame this “herd” by using smart architectural practices. Redis supports a range of solutions, including caching, rate limiting, and queuing mechanisms, that can prevent stampedes and keep systems running smoothly even during sudden load spikes.

What Is the Thundering Herd Problem and Why Does It Happen?

The thundering herd problem occurs when many clients or threads concurrently attempt to access the same resource, especially after it becomes unavailable or expires. Only one of those requests can be served at a time, so the rest pile up and repeatedly hammer the backend resource.

Retries often make this worse — especially if clients retry on the same schedule. Without jittered backoff, thousands of clients can synchronize again, creating repeated bursts of load.
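A minimal sketch of jittered backoff, assuming an exponential schedule with "full jitter" (each delay is drawn uniformly between zero and a growing ceiling). The function names, the base delay, and the cap are illustrative choices, not values from this article.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 10.0,
                  rng: Optional[random.Random] = None) -> float:
    """Return a randomized delay in seconds for a zero-based retry attempt."""
    source = rng if rng is not None else random
    # Exponential ceiling: base, 2*base, 4*base, ... but never above `cap`.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: a uniform draw in [0, ceiling] spreads clients apart.
    return source.uniform(0.0, ceiling)


def call_with_retries(operation: Callable[[], T], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `operation`, sleeping a jittered delay between failed attempts."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(backoff_delay(attempt))
    raise ValueError("max_attempts must be at least 1")
```

Because every client draws its own delay, clients that failed at the same instant are unlikely to come back at the same instant.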

As a result, the database, API, or service gets flooded with redundant work, leading to high latency or failures until the “herd” of requests dissipates or is otherwise handled. In caching systems, this often occurs when a popular cache entry expires. If thousands of users were relying on that cached data, they would all fall back to fetching from the database, effectively overloading the database with simultaneous queries.

Common Causes of the Thundering Herd Problem

Several common scenarios can trigger a thundering herd in high-traffic environments:

  • Cache expiration or failure: If a heavily used cache key (or, worse, many keys) expires at roughly the same time, an avalanche of cache misses occurs. All those misses lead to a wave of concurrent database reads as each process tries to regenerate the data. Similarly, if a cache server goes down (cache failure or flush), it can cause a thundering herd of misses.
  • High-volume traffic spikes: Sudden surges in user activity can create herd-like behavior. If many users request the same record or page simultaneously, they might all hit the database if the cache isn’t ready.
  • Database lock contention: If many transactions try to update the same record or row simultaneously in a database, they can form a thundering herd on a lock. The database has to serialize updates, and a backlog of waiting transactions can pile up, consuming resources. During that contention, read queries might also stall.
  • Auto-scaling cold starts: When infrastructure scales out under load, new instances often start with empty caches. If ten new servers all spin up and request the same popular data at once, they can overwhelm the backend — a self-inflicted herd.

Across these causes, the solution isn’t just adding more servers. It’s designing at the architectural level to stagger requests, coordinate cache refreshes, and distribute load.

Real-world examples

In a real-world context, you can start to see just how frequent the thundering herd problem can be if you’re not prepared. Consider:

  • Large-scale login surges: If a streaming service drops a new episode of a hit show at a scheduled time, millions of users might log in or refresh the app that very minute. If the user profile data or homepage feed was cached and those caches expire or become invalid at the same time, the surge of viewers can all hammer the database.
  • API rate limits being exceeded: Public APIs sometimes experience a herd effect when a popular app or integration causes a surge in synchronous usage. For example, if many clients use a third-party service on a schedule, they might all hit an external API at the same time. The API might then rate-limit or throttle all those requests, but those clients may all retry again shortly, again in unison. This pattern of synchronized retries can make it hard for the API to recover without staggering.
  • IoT devices making repeated requests: Imagine millions of IoT sensors that check in with a central service every hour. At the top of each hour, the service gets a flood of simultaneous requests from all the devices. If that service depends on a cache or database for configuration data, it might experience a stampede.
  • Viral social media posts: A trending post can create a two-pronged herd problem — on the read side, a cache stampede when cached post data expires, and on the write side, lock contention when thousands of reactions or comments update the same rows simultaneously.

Understanding Redis’s role before solving the problem

Before we jump into strategies that can address the thundering herd problem, it’s important to understand how Redis fits into this picture. Developers often run open source Redis as a cache to reduce database load. This can work, but if Redis isn’t configured thoughtfully, it can inadvertently contribute to thundering herd issues.

The most common configuration issues include:

  • Synchronized cache expiration, where many keys expire at once and trigger a surge of backend requests.
  • Passive expiration, which is when keys are removed only when accessed, allowing spikes to concentrate around those moments.
  • Lack of default TTL randomization, which increases the likelihood of cache stampedes during high concurrency.

In a similar manner, distributed systems, in general, can exacerbate the thundering herd problem. When a large number of new Redis clients or app instances spin up (e.g., via autoscaling during a traffic spike), they may all issue cache misses at once. In a situation without protective measures, this results in load amplification instead of load buffering.

Redis itself is not the root cause of thundering herd problems. The core issue is in how the caching strategy is configured and how the application handles concurrency. Redis actually provides the tools to prevent stampedes, but a poorly configured cache can magnify traffic spikes rather than mitigate them.

The impact of the thundering herd problem on system performance

When a thundering herd event hits, the effects on your system are usually painful and noticeable. Unfortunately, this isn’t a pain that can just be absorbed by your system either; in many cases, users will immediately notice and feel the pain, too.

Latency and load spikes

As dozens or hundreds of requests queue up, users start waiting longer and longer for responses. In a stampede, many requests might time out or spend seconds in a queue.

The database CPU and I/O will spike to cope with the sudden workload, and application threads might max out waiting on slow data fetches. This often creates cascading failures: as threads wait, request throughput drops, queues back up, and even requests for uncached data slow down because the infrastructure is busy dealing with the stampede.

Infrastructure costs

A thundering herd is also very costly in terms of infrastructure utilization. CPU and memory usage will spike unpredictably during these events. You might see, for example, database servers hitting 100% CPU usage or connection pools getting exhausted for short periods.

If you’re on a system with cloud-based auto-scaling, the system might try to scale out to handle the load, but often, by the time new instances are ready, the spike has subsided (or worse, the scale-out itself adds to the spike).

Sometimes, teams will over-provision their databases and caches “just in case” to handle stampedes, which means higher cloud costs for capacity that is idle most of the time. Conversely, if you under-provision and rely on auto-scaling, you risk the scale-up not reacting fast enough.

User experience issues

For end-users, the thundering herd problem manifests as sluggish responses, errors, or outright outages. If your system is overwhelmed, users will experience timeouts, very long wait times, or operations failing.

In ecommerce or financial systems, this directly translates to lost revenue (e.g., shopping cart checkouts failing during a sale). In less critical applications, it still erodes user trust. A viral moment turning into a site crash is a missed opportunity and a bad look for reliability.

In severe cases, a stampede can cascade into a full system crash. The overloaded database might run out of memory or connections and restart, taking your app completely offline until a manual fix. Even once the initial herd subsides, recovering from such an event can be slow if caches remain empty or if upstream services are dealing with backlogs. Repeated incidents will force users to find alternatives. If you have SLAs (Service Level Agreements) in place, a single thundering herd could blow your latency and uptime targets for the month, possibly incurring penalty clauses.

Common ways to configure the cache to help solve the thundering herd problem

Solving the thundering herd problem largely involves making your caching layer smarter so that it doesn’t fail in a way that stampedes your backend. There are some common techniques that, when combined with the right tooling, can make thundering herd problems much less likely.

Efficient caching with expiry jitter

One of the simplest and most effective measures is introducing a jitter to cache expiration times. Instead of having many keys expire at a fixed interval, add a little randomness to each key’s TTL.

For example, if you want a roughly 1-hour expiry, you might actually set a random TTL between 55 and 65 minutes for each item. This staggered expiration ensures that cached items don’t all vanish simultaneously. By distributing expirations over time, you avoid the scenario where a whole herd of requests hits the database at one minute past the hour.
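The 55-to-65-minute example can be sketched as follows. This assumes a client object that exposes a redis-py style `set(name, value, ex=seconds)` method; the helper names `jittered_ttl` and `cache_set` are hypothetical.

```python
import random


def jittered_ttl(base_seconds: int = 3600, spread_seconds: int = 300) -> int:
    """Return a TTL drawn uniformly from base +/- spread (55 to 65 minutes by default)."""
    return base_seconds + random.randint(-spread_seconds, spread_seconds)


def cache_set(client, key: str, value: str) -> int:
    """Store `value` under `key` with a jittered TTL and return the TTL used.

    `client` is assumed to behave like a redis-py client, whose
    set(name, value, ex=...) sets an expiry in seconds.
    """
    ttl = jittered_ttl()
    client.set(key, value, ex=ttl)
    return ttl
```

The spread is a tuning choice: a wider spread flattens the expiry curve further, at the cost of some keys living longer than the nominal hour.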

Request coalescing

Request coalescing is about ensuring that when a cache miss happens, you don’t unleash a dozen duplicate backend fetches for the same data. The basic process involves only allowing one request to fetch the data from the database, while the others wait for that result.

Once the data is fetched and the cache is filled, all the waiting requests can use the fresh cache entry. One way to implement this idea is by using a distributed lock. Redis, for example, offers distributed locks, which can reduce the likelihood of overusing shared resources.
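One way this could look with a lock built on `SET` with the `NX` and `PX` options is sketched below. The key naming (`lock:` prefix), timeouts, and the `get_or_load` helper are assumptions for illustration, and the client is assumed to follow redis-py's `get`, `set`, and `eval` signatures with `decode_responses=True`.

```python
import time
import uuid
from typing import Callable

# Lua script: delete the lock only if the stored token is still ours.
RELEASE_LOCK = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
end
return 0
"""


def get_or_load(client, key: str, load: Callable[[], str], ttl: int = 3600,
                lock_ms: int = 5000, wait_s: float = 0.05,
                max_waits: int = 100) -> str:
    """Return the cached value, letting a single caller rebuild it on a miss."""
    cached = client.get(key)
    if cached is not None:
        return cached

    lock_key = f"lock:{key}"
    token = uuid.uuid4().hex
    # SET NX PX: only one caller wins, and the lock expires if the winner dies.
    if client.set(lock_key, token, nx=True, px=lock_ms):
        try:
            # Re-check: another caller may have filled the cache just before us.
            cached = client.get(key)
            if cached is None:
                cached = load()
                client.set(key, cached, ex=ttl)
            return cached
        finally:
            client.eval(RELEASE_LOCK, 1, lock_key, token)

    # Everyone else polls the cache instead of querying the backend.
    for _ in range(max_waits):
        time.sleep(wait_s)
        cached = client.get(key)
        if cached is not None:
            return cached
    # Last resort if the winner never filled the cache: load directly.
    return load()
```

The random token matters: without it, a slow caller whose lock has already expired could delete a lock that now belongs to someone else.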

Rate limiting

If your system experiences bursts of requests that threaten to overload it, implementing rate limiting or backpressure can protect it from collapse. Rate limiting doesn’t directly solve a cache stampede, but it helps throttle the overall influx of requests during extreme spikes.

This can be especially useful if you have portions of traffic that can be identified and delayed (for example, web crawlers or lower-priority batch jobs).

Load shedding and queue-based processing

Load shedding occurs when you drop or defer work when the system is under duress. If you can identify requests that are safe to drop or delay, doing so during a herd scenario can save your system. For example, if your web service is overwhelmed, you might choose to drop non-critical background requests or analytics pings to free capacity for real user actions.

A more controlled method, however, is to use queueing. Instead of hitting the database immediately, requests are put into a queue or Redis Streams for processing. A separate service pulls from the queue at a rate the database can handle. This smooths out bursts. Users might wait slightly longer for results, but it’s better than the entire system melting down.
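A rough sketch of the queueing approach with Redis Streams, assuming a client with redis-py's `xadd`, `xreadgroup`, and `xack` methods on the default RESP2 reply format, and a consumer group that already exists (for example, created once with `xgroup_create`). The stream, group, and function names are placeholders.

```python
from typing import Callable


def enqueue(client, stream: str, request: dict) -> str:
    """Append a request to the stream; XADD returns the new entry's id."""
    return client.xadd(stream, request)


def drain_once(client, stream: str, group: str, consumer: str,
               handle: Callable[[dict], None], batch: int = 10,
               block_ms: int = 1000) -> int:
    """Read up to `batch` undelivered entries, handle them, and acknowledge each."""
    # ">" asks for entries that were never delivered to this consumer group.
    replies = client.xreadgroup(group, consumer, {stream: ">"},
                                count=batch, block=block_ms) or []
    processed = 0
    for _stream_name, entries in replies:
        for entry_id, fields in entries:
            handle(fields)
            # Acknowledge only after the handler succeeded.
            client.xack(stream, group, entry_id)
            processed += 1
    return processed
```

A worker would call `drain_once` in a loop, and the batch size together with any pause between calls sets the ceiling on how fast requests reach the database.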

Solving the thundering herd problem for enterprise

Enterprise systems often operate at a much larger scale than other systems. Parallel to that scale is criticality: For many enterprises and their clients, even a brief thundering herd incident is unacceptable.

Solving enterprise-scale thundering herd problems requires understanding enterprise-specific issues. Consider, for example:

  • High-frequency transactions and revenue impact: In financial services, ecommerce, or ticketing, every transaction might be worth money. A minor delay or outage can directly translate to lost revenue.
  • Massive user concurrency: Enterprises frequently have user bases in the millions. Think of a global retail app during Black Friday, or a major bank’s mobile app at peak hours. At this scale, even small inefficiencies can become major problems. And thundering herds can cause a domino effect across microservices.
  • Strict SLAs and compliance requirements: Enterprises often commit to four or five nines of availability (99.99% or 99.999% uptime). In real terms, 99.99% uptime allows about 52 minutes of downtime per year, and 99.999% allows about 5 minutes per year. A single thundering herd event that crashes your service could consume a year’s error budget.
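The downtime figures above follow from simple arithmetic, shown here as a quick check. It assumes a 365-day year; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(nines: int) -> float:
    """Allowed downtime per year for `nines` nines of availability (4 means 99.99%)."""
    # An availability of N nines leaves a 10**-N fraction of the year as budget.
    return MINUTES_PER_YEAR / 10 ** nines


for nines in (4, 5):
    print(f"{nines} nines: {downtime_budget_minutes(nines):.2f} minutes per year")
# 4 nines: 52.56 minutes per year
# 5 nines: 5.26 minutes per year
```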

In an enterprise context, it’s also worth remembering that a poorly configured cache can become a single point of failure itself. Earlier, we showed how synchronized expirations or failovers can cause issues. At enterprise scales, these problems might spike loads by 100x for a brief moment (not just 2x or 3x).

A caching layer must be architected with high availability and herd prevention in mind. Redis Enterprise’s Active-Active geo-distribution, for example, lets you have multiple primary caches in different regions.

Caching needs to be planned carefully because enterprises, even more so than other businesses, depend on real-time performance. That means ultra-low latency, high availability, and fault tolerance, delivered through scalable, cost-effective strategies that optimize resource utilization.

When improperly configured, caches – Redis-based and beyond – can become a source of the thundering herd problem (especially during mass cache expirations or failovers). But Redis comes with built-in tools and patterns to mitigate this risk.

How to mitigate the thundering herd problem while using Redis

To ensure Redis works for you (and not against you) and helps you prevent stampedes, consider implementing the following patterns.

In-memory caching: Store frequently accessed data to prevent repeated database hits. Make sure you are caching the right data and have an appropriate eviction policy. A high cache hit rate means far fewer queries reaching your database, which automatically mitigates herd effects. If the herd can’t reach the database because the cache handles it, you’re safe.

Bloom filters: A Bloom filter is a probabilistic data structure that can quickly test whether an item is not in a set. In caching, Bloom filters help with cache penetration scenarios (i.e., when clients request lots of items that don’t exist in the database). By keeping a Bloom filter of all known keys in Redis, you can check that first and potentially skip even hitting the cache or database.
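To make the idea concrete, here is a small in-process Bloom filter in plain Python. It is an illustration of the data structure only, not how Redis implements it; in Redis itself the equivalent operations are the `BF.ADD` and `BF.EXISTS` commands. The sizes and the `lookup` helper are assumptions.

```python
import hashlib
from typing import Callable, Iterator, Optional


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k bit positions from two halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def lookup(key: str, known_keys: BloomFilter,
           fetch: Callable[[str], Optional[str]]) -> Optional[str]:
    """Skip the cache and database entirely when the key cannot exist."""
    if not known_keys.might_contain(key):
        return None  # definitely absent
    return fetch(key)  # possibly present: fall through to cache or database
```

A "no" from the filter is always correct, so requests for nonexistent items can be rejected without touching the backend; a "yes" still needs the normal lookup.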

Rate limiting: Redis provides simple and effective ways to implement rate limiting. For example, ensure no single client or API user can send more than X requests per second to prevent one consumer from causing a herd-like effect. Additionally, you can put a cap on global request rates to your critical sections, and with Redis, you can maintain counters per user IP and per API key, with expirations that reset the counts each window.
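The per-key counters with expirations described above could be sketched as a fixed-window limiter. This assumes a client with redis-py's `incr` and `expire` methods; the key format, the default limit, and the `allow_request` name are illustrative.

```python
import time
from typing import Optional


def allow_request(client, client_id: str, limit: int = 100,
                  window_s: int = 1, now: Optional[float] = None) -> bool:
    """Return True if `client_id` is still under `limit` for the current window."""
    current = time.time() if now is None else now
    window = int(current // window_s)
    key = f"ratelimit:{client_id}:{window}"

    # INCR is atomic and creates the counter at 1 when the key is missing.
    count = client.incr(key)
    if count == 1:
        # First hit in this window: let the counter expire after the window passes.
        client.expire(key, window_s * 2)
    return count <= limit
```

Fixed windows are the simplest variant and allow short bursts at window boundaries. Note that `INCR` and `EXPIRE` run as two separate commands here, so a stricter version would combine them in a transaction or a Lua script.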

Choosing a solution for reliable caching: Redis vs. Amazon ElastiCache and Google Memorystore

Your choice of caching technology directly affects your ability to implement these protections.

Redis is available as open source software and as Redis Software for enterprise-grade deployments. For cloud options, Redis Cloud is available on AWS, GCP, Heroku, and Vercel, and Azure Managed Redis is available on Azure. Another option is Valkey, an open source fork of Redis 7.2 that Amazon ElastiCache and Google Cloud Memorystore are built on.

Redis Cloud vs. Amazon ElastiCache

ElastiCache previously used Redis open source, but it has now diverged onto Valkey, a Redis 7.2 fork. That means no ongoing support or innovation from the Redis team and no access to Redis 8 features.

Redis Cloud offers 99.999% uptime, advanced capabilities like the Redis Query Engine and native vector search, and cross-region Active-Active replication. In contrast, ElastiCache provides 99.99% uptime and lacks full text search and active-active replication for multi-region deployments.

Ecommerce leader Meesho experienced major performance instability during sales peaks before migrating to Redis. With Redis, they now handle traffic surges up to 20× normal load while maintaining sub-millisecond latency.

Redis vs. Google Memorystore

Memorystore is similarly frozen at Redis 7.2 and lacks advanced Redis Cloud features such as Active-Active geo-distribution, auto-tiering, and multi-cloud flexibility.

When Niantic needed high-performance infrastructure for global gameplay, it originally chose Memorystore but migrated to Redis Cloud after experiencing a multitude of issues. “Adding Redis clusters is less expensive than deploying additional Google Cloud servers,” said Da Xing, Staff Software Engineer at Niantic, citing Redis’s superior scalability and cost efficiency.

Using Redis to prevent the thundering herd problem

There are numerous ways to use Redis to prevent the thundering herd problem. To get you started, we’re providing a few example configurations, some code samples, and ideas for proactive cache refreshing.

Example configurations and code samples

  • Example Configurations:
    • TTL jitter: Stagger expiration to prevent synchronized cache misses.
    • Lua scripting for request deduplication: Mark “in-progress” fetches to stop redundant backend calls.
    • Rate limiting: Use atomic counters or token buckets to throttle client requests.
    • Redis Streams: Buffer bursts of traffic for smoother backend consumption.
  • Code Samples:
    • Basic caching setup
    • Rate limiting with Redis
    • Using Redis Streams to manage request queues: Redis streams buffer requests, allowing the backend to process at a rate that doesn’t overwhelm the system
  • Proactive cache refreshing
    • Prefetching hot keys: Identify frequently accessed keys and refresh them before expiration. Redis’s keyspace notifications and monitoring tools make this straightforward.
    • Event-driven invalidation: Use Pub/Sub or Streams to invalidate or refresh cached entries when source data changes. For example, updating a product price could trigger an event that refreshes product:123 in cache immediately, preventing a wave of misses later.
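As one possible shape for hot-key prefetching, the sketch below refreshes a key when its remaining TTL drops under a threshold. It assumes a client with redis-py's `ttl` and `set` methods and a caller-supplied list of hot keys; how hot keys are identified is out of scope here, and the helper names and thresholds are hypothetical.

```python
from typing import Callable, Dict


def refresh_if_expiring(client, key: str, load: Callable[[], str],
                        ttl: int = 3600, refresh_below: int = 60) -> bool:
    """Rewrite `key` with fresh data when it is missing or close to expiring."""
    # TTL returns -2 if the key is missing and -1 if it has no expiry.
    remaining = client.ttl(key)
    if remaining == -1 or remaining > refresh_below:
        return False  # still fresh, or never expires: nothing to do
    client.set(key, load(), ex=ttl)
    return True


def refresh_hot_keys(client, hot_keys: Dict[str, Callable[[], str]]) -> int:
    """Refresh every hot key that needs it; `hot_keys` maps each key to its loader."""
    return sum(refresh_if_expiring(client, key, loader)
               for key, loader in hot_keys.items())
```

Run from a background job on a short interval, this keeps popular entries from ever expiring under live traffic, so readers never see the miss that would start a stampede.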

These patterns ensure high cache hit ratios, steady backend load, and stable latency — even under massive concurrency.

Ensure Scalable, High-Performance Systems with Redis

Handling extreme concurrency is a defining challenge of modern architecture. The thundering herd problem can cripple unprepared systems, but with Redis, you can turn concurrency into an advantage.

By anticipating stampedes, implementing intelligent caching patterns, and using Redis as a shield, you can deliver consistent performance even under peak load.

Redis is the foundation for real-time resilience: serving requests from memory, coordinating concurrent workloads, and protecting downstream systems from overload.

Try Redis for free and see how it helps you design systems that stay fast, available, and reliable — no matter how big the herd.