Control plane failures increasingly at center of cloud outages
Warren Bayek · 2026-05-28 · via informationweek

Image: A digital illustration of connections forming a cloud. (Credit: Wavebreakmedia Ltd IFE-240525_3/Alamy)

Cloud outages drove headlines in 2025, with disruptions across major providers and hundreds of millions in estimated losses. But the havoc did not stem only from the causes many enterprise and industrial IT leaders expected. In several high-profile incidents, the underlying infrastructure remained fully functional.

Power systems were stable. Compute and storage capacity was available. Networks were up. Yet critical services still went down.

Across multiple industry analyses, a pattern has emerged: Failures increasingly originate not in the data plane — where workloads run — but in the control and management layers that coordinate, authenticate, configure and orchestrate systems at scale. 

According to Uptime Institute's 7th Annual Outage Analysis, IT and networking outages increased in 2024, accounting for 23% of impactful outages, reflecting increased IT and network complexity that led to issues with change management and misconfigurations. This represents a fundamental shift in the outage landscape, one that hardware redundancy cannot address: Infrastructure didn't fail, control did.

Industry analysts are drawing the same conclusion. The 2024 Gartner report "9 Principles for Improving Cloud Resilience" noted that control plane failures can prevent operators from executing remedial actions even when data-plane traffic is still flowing, blocking provisioning, configuration changes and automated recovery actions at the very moment they are needed most. In these scenarios, resilience depends less on redundant infrastructure and more on prebuilt contingency plans and tested operational procedures.

The fragility of centralized control

Modern cloud and distributed environments depend on control planes. These are centralized or semi-centralized systems that handle orchestration, policy enforcement, identity, routing and lifecycle management. These layers act as the operational "brain" of digital infrastructure.

Over time, these control systems have become more automated, more feature-rich and more centralized. That improves efficiency, but it also increases risk. When a control plane misconfigures resources or becomes unavailable, the impact can extend across regions, sites and services simultaneously.

For years, resilience strategy focused on redundancy: duplicate servers, replicated storage and distributed clusters. These measures protect execution capacity. However, they do not guarantee operational continuity when orchestration and management layers fail.

When control systems are impaired, organizations may encounter the following:

  • Applications may continue running, but they cannot be reached.

  • Systems remain healthy, but they cannot be reconfigured.

  • Identity and access services are online but unusable.

  • Automation pipelines propagate errors faster than teams can respond.

For industrial and enterprise operators, this creates a dangerous illusion of availability without operability. It's comparable to a production facility with fully functional machinery but no control system to coordinate operations.

Complexity, automation increase risks

The stakes will only rise as environments become more software-defined, more complex and more automated while still depending on humans to avoid mistakes. Outage analyses across the industry continue to show that process breakdowns and human error remain major contributors, especially during change events. That is no surprise: operational teams now manage hybrid estates spanning cloud, edge, on-premises and third-party platforms, often connected through layered automation and policy engines. Each added integration point increases coupling and reduces transparency. At the same time, enterprises are pushing faster release cycles, more infrastructure as code and broader automation. These are all positive trends, but they require stronger guardrails and validation.

The result is a risk multiplier: higher system complexity, combined with faster change velocity and centralized control authority.

Industrial, mission-critical systems face high stakes

For industrial and enterprise operators, outages are not just digital events; they are operational events. Downtime can halt production lines, interrupt field operations, delay logistics, disrupt communications or affect safety systems.

These environments cannot rely solely on remote or centralized recovery. They require architectures that can sustain safe, predictable operation even when upstream control systems are degraded.

That requires designing for operational independence, not just availability.
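One way to picture operational independence is a site that keeps running on its last known good policy when the upstream control plane cannot be reached. The Python sketch below is illustrative only: `SiteController`, `fetch_policy` and the policy fields are hypothetical names, not part of any product or of the analyses cited above, and a real system would also need to handle policy expiry and validation.

```python
from typing import Callable, Dict


class SiteController:
    """Hypothetical site-level controller that survives loss of the central control plane."""

    def __init__(self, fetch_policy: Callable[[], Dict[str, int]],
                 default_policy: Dict[str, int]) -> None:
        self._fetch = fetch_policy                # call to the upstream control plane (assumed)
        self._last_good = dict(default_policy)    # locally stored fallback
        self.degraded = False                     # True while running on cached policy

    def current_policy(self) -> Dict[str, int]:
        """Return fresh policy if upstream answers, else the last known good copy."""
        try:
            policy = self._fetch()
        except ConnectionError:
            # Upstream is unreachable: keep operating on the cached policy.
            self.degraded = True
            return dict(self._last_good)
        self._last_good = dict(policy)
        self.degraded = False
        return dict(policy)
```

The design choice here is that the site never blocks on the central system: a failed fetch changes a status flag that operators can observe, but it does not change the site's behavior.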

Key architectural priorities increasingly include:

  • Distributed control with site-level autonomy.

  • Local survivability during WAN or cloud control loss.

  • Fault domains that limit orchestration blast radius.

  • Deterministic behavior under degraded connectivity.

  • Change validation and staged rollout controls.

  • Operational guardrails that constrain automation risk.
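Two of the priorities above, staged rollout and limiting blast radius, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any vendor's deployment tooling: the `apply`, `healthy` and `rollback` callables and the site names are hypothetical stand-ins for whatever change and health-check mechanisms an organization actually uses.

```python
from typing import Callable, List, Sequence


def staged_rollout(stages: Sequence[Sequence[str]],
                   apply: Callable[[str], None],
                   healthy: Callable[[str], bool],
                   rollback: Callable[[str], None]) -> List[str]:
    """Apply a change one stage at a time and stop at the first unhealthy stage.

    Returns the sites that keep the change. Sites in later stages are never touched,
    so a bad change is contained to the stage where it was detected.
    """
    kept: List[str] = []
    for stage in stages:
        applied: List[str] = []
        for site in stage:
            apply(site)
            applied.append(site)
        # Validate the whole stage before the change is allowed to spread further.
        if not all(healthy(site) for site in applied):
            for site in reversed(applied):
                rollback(site)
            break
        kept.extend(applied)
    return kept
```

In practice the first stage would be a small, low-risk fault domain, so a misconfiguration is caught before automation can propagate it across regions.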

From uptime to operational continuity

Traditional resilience metrics emphasize uptime, focusing on whether infrastructure is reachable and powered. But for industrial and enterprise systems, the more meaningful measure is operational continuity: Ensuring systems remain controllable, observable and safe under stress.

A system that is technically "up" but cannot be managed, authenticated or reconfigured is not operationally available.
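The distinction can be made concrete with a health check that reports control-plane capabilities separately from data-plane reachability. The Python sketch below is an assumption-laden illustration: the `ProbeResult` fields and the three status labels are hypothetical and do not come from any monitoring standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    serving_traffic: bool    # data plane: workloads answer requests
    can_authenticate: bool   # control plane: identity service is usable
    can_reconfigure: bool    # control plane: configuration changes are accepted
    can_observe: bool        # telemetry is reaching operators


def is_up(probe: ProbeResult) -> bool:
    """Traditional uptime view: is the workload reachable?"""
    return probe.serving_traffic


def is_operable(probe: ProbeResult) -> bool:
    """Operational continuity view: can the system still be managed?"""
    return probe.can_authenticate and probe.can_reconfigure and probe.can_observe


def classify(probe: ProbeResult) -> str:
    if not is_up(probe):
        return "down"
    if is_operable(probe):
        return "operational"
    return "up-but-uncontrollable"
```

An uptime dashboard built only on `is_up` would report green in exactly the incidents this article describes, which is why the third status is worth tracking on its own.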

As enterprises expand edge deployments, adopt AI-driven workloads, and increase automation across infrastructure, the control plane becomes a primary risk surface. 

Resilience strategies must evolve, extending beyond redundant hardware and multi-region failover to include distributed control design, process discipline and failure-containment architecture. This is a new architectural mindset, one that extends resilience to all the pieces that collectively determine how a cloud operates under pressure. 

In an era defined by digital dependence, the real measure of cloud resilience is the ability to continue functioning when the unexpected happens. The lesson from outage trends is clear: Resilience is no longer defined only by what keeps running, but by what remains in control.

About the Author

Warren Bayek

Wind River

Warren Bayek leads the CTO Office for Wind River's cloud strategy, including 5G cloud and virtualization, working with telco providers to accelerate virtual and Open RAN adoption. He has 30 years of experience designing and delivering complex, high-availability software in the telecom sector across large enterprises and startups.