The SRE Report 2025's Call to Action
2025-01-13 · via Catchpoint Blog

The SRE Report is now seven years old. I’ve had the honor and privilege of authoring it for the last five years. This 2025 edition involved working with some amazing individuals, including Kurt Andersen and Denton Chikura. My heartfelt thanks go to them for shouldering the weight of what is both a labor of love and an often daunting, procrastination-inducing marathon of analysis.

I’d also like to thank all the contributors to the “View from the field” sections of the report: Martin Barry, Laura de Vesine, Dave O’Connor, Heinrich Hartmann, Robert Barron, and Sergey Katsev. Their perspectives ground the data in real-life contexts, bridging the gap between survey data and the day-to-day realities of Site Reliability Engineering.

Check out this short intro video from Kurt, Sergey, and me, where we share what the report is all about and what you can expect.

Of course, the true heroes of the SRE Report are the SREs and reliability practitioners who take the time to respond to our survey. The report lives or dies by the truth of their responses. This year’s edition is no exception. It reflects input from professionals around the world, across a diverse range of company sizes, roles, and managerial responsibilities. Thank you for helping make the SRE Report the longest-running, most authentic, and most accurate pulse of the reliability engineering community.

An honorable mention must go to the DORA Report. The SRE Report doesn't exist in a vacuum. It draws on trusted industry research, including insights from the 2024 DORA Report, to present a holistic view of key challenges, such as AI’s impact on toil (more on that later).

My most surprising discoveries from The SRE Report 2025

When we release the report, we are always asked “What was most surprising or controversial to you?” In no particular order, here are my most striking takeaways from this year’s edition.

Insight I: It’s official: slow is the new down

Each year, we highlight an emerging trend in the field. In past reports, we’ve explored topics like Platform Operations, the concept of Total Experience, and the multi-party dilemma. This year, we kicked off the report with something a little softer: investigating whether popular hallway expressions hold true in practice.

They say that if you repeat something loudly and often enough, it won’t be long before it’s widely accepted as true.  

When it comes to the phrase ‘Slow is the new down,’ however, organizations accept it without having heard it repeated. A majority of organizations (53%) agree with the expression, even though only 21% say they have heard it before.

The implications for the wider industry are obvious: poor performance is now seen as equally harmful as outright downtime. Reliability is no longer about uptime alone; it’s about delivering consistent, fast experiences.
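One way to put “slow is the new down” into practice is a latency-aware availability measure. Under this measure, a request counts as good only if it both succeeds and finishes within a latency threshold. The sketch below is hypothetical and does not come from the report. The `Request` type, the `availability` helper, the 300 ms threshold and the sample requests are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code returned to the user
    latency_ms: float  # end-to-end latency observed for the request


def availability(requests, slow_threshold_ms=None):
    """Return the fraction of 'good' requests.

    With slow_threshold_ms=None, only server errors (5xx) count as bad,
    which is the classic uptime-style view. With a threshold, requests that
    succeed but take longer than the threshold also count as bad, so a slow
    response is treated the same as a failed one.
    """
    if not requests:
        return 1.0
    good = 0
    for r in requests:
        ok = r.status < 500
        if slow_threshold_ms is not None and r.latency_ms > slow_threshold_ms:
            ok = False
        if ok:
            good += 1
    return good / len(requests)


# Hypothetical sample: four successful requests (two of them slow) and one 503.
sample = [
    Request(200, 120), Request(200, 180), Request(200, 950),
    Request(200, 2400), Request(503, 30),
]
print(availability(sample))       # 0.8
print(availability(sample, 300))  # 0.4
```

The same five requests look 80% available by an error-only measure. Once slow responses count as failures, they are only 40% available, and that gap is what the phrase points at.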

Insight II: Toil Levels Rise for First Time Ever (So Much for AI)

For seven years, the survey underpinning the SRE Report has used a consistent methodology, allowing us to track trends over time. Among its questions, the "time spent" questions serve as benchmarks for organizations. So when this year’s data revealed that toil levels had risen for the first time in five years, it was a wake-up call.

What makes this finding even more striking is that time spent on engineering and on-call activities remains steady. The obvious question is: why is toil rising?

Could it be AI?

Many had hoped AI would reduce toil, but reality, as always, is more complicated. The 2024 DORA Report suggests AI accelerates value realization, potentially leading us to fill newfound capacity with additional operational tasks.

Regardless, it’s a red flag. Toil above 50%—Google’s recommended ceiling—restricts organizations’ ability to focus on proactive development. Yet the data shows operational workloads creeping higher. If toil is rising without corresponding gains in system resilience or optimization, are we investing in the right areas? Or are we stuck firefighting, unable to break free to drive long-term value?
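The 50% ceiling is simple arithmetic over how time is split. Google’s SRE book sets a goal of keeping operational work (toil) below 50% of each SRE’s time. The sketch below assumes a team logs hours by category. The category names, the helper functions and the sample week are hypothetical and do not come from the report’s survey questions.

```python
# Google's SRE book sets a goal of keeping toil below 50% of an SRE's time.
TOIL_CEILING = 0.5


def toil_fraction(hours_by_category: dict[str, float]) -> float:
    """Share of logged hours spent on toil (0.0 if nothing was logged)."""
    total = sum(hours_by_category.values())
    if total == 0:
        return 0.0
    return hours_by_category.get("toil", 0.0) / total


def over_toil_ceiling(hours_by_category: dict[str, float],
                      ceiling: float = TOIL_CEILING) -> bool:
    """True when the toil share of logged hours exceeds the ceiling."""
    return toil_fraction(hours_by_category) > ceiling


# Hypothetical week for one engineer; category names are illustrative only.
week = {"toil": 22.0, "engineering": 14.0, "on_call": 4.0}
print(f"toil share: {toil_fraction(week):.0%}")  # toil share: 55%
print(over_toil_ceiling(week))                   # True
```

Tracking this share over time, rather than checking it once, is what exposes the kind of creep the report describes.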

Insight VII: Acknowledge the Gap to Fix the Gap

If there’s one takeaway from this year’s SRE Report that challenges us all, it’s the stark gap in how reliability practices are perceived and implemented across organizational ranks. It is, to me, the most provocative takeaway from The SRE Report 2025. More than merely an insight, the need to acknowledge the gap to fix the gap is a call to action for the entire reliability engineering community.

One standout example of the misalignment is whether teams regularly test their incident preparedness through exercises such as chaos engineering, simulated disruptions, failovers, and tabletop scenarios. Only 37% of respondents overall agreed that their teams do (the lowest agreement score in this section), and the breakdown by rank paints an even more striking picture.

The divide between individual contributors and higher management is sharp. Individual contributors and team leads report significantly lower levels of preparedness testing compared to senior management. This divergence suggests that while leadership may believe these practices are happening—or are a priority—the reality on the ground is very different.

And that’s just one example.  

A challenge to the community

The consequences of such a disconnect are far-reaching. The report states, “Without a clear, shared understanding of the current situation, it becomes difficult to set common goals and agree on the steps needed to achieve them. This misalignment can result in wasted resources, duplicated efforts, and missed opportunities.”

But this isn’t just a problem to lament—it’s an opportunity. The SRE Report 2025 is more than a collection of data points—it’s a reflection of where we are as a community and a roadmap for where we need to go. The gaps highlighted in the report provide a starting point for organizations to open up a conversation between leadership and practitioners, fostering a shared understanding of what’s really happening and what needs to change.  

It’s a challenge to all of us (individual contributors, managers, and leaders alike) to acknowledge these gaps, have the necessary conversations, and take action. Whether it’s addressing rising toil, rethinking how we define reliability, or closing the disconnect between ranks, bridging the gap starts here.

Download The SRE Report 2025 or read it online (no registration required).
