GUEST COLUMN by Zeus Kerravala · 2026-06-12 · via SiliconANGLE

QumulusAI and the shift from GPU scarcity to GPU efficiency

Neocloud provider QumulusAI announced today that it has secured more than $124 million in customer subscriptions for three-year terms with Hyperbolic and another leading artificial intelligence inference platform.

These agreements cover deployments totaling 1,280 Nvidia Corp. Blackwell GPUs, delivered via 160 Lenovo and Supermicro bare-metal servers connected with Cisco Systems Inc. Nexus networking to form high-throughput, low-latency clusters.

A notable share of the value is front-loaded: nearly $21.9 million, or roughly 18% of the total, comes as combined upfront customer commitments, providing QumulusAI with working capital. Structurally, these are GPU-as-a-service subscriptions rather than one-off hardware deals, which means predictable recurring revenue for QumulusAI and predictable operating expenses for its customers over the life of the contracts. In market terms, this is a significant win for a vertically integrated AI cloud infrastructure provider that is betting on an inference-centric architecture rather than general-purpose “AI cloud” branding.
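To put the headline numbers in proportion, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above and treats them as exact, although the article says "more than" and "nearly". The per-GPU-hour figure is an implied average under the stated assumptions (365-day years, every GPU billed for every hour of the three-year term), not a price QumulusAI has published.

```python
# Figures quoted in the article, treated as exact for illustration only.
TOTAL_CONTRACT_USD = 124_000_000  # "more than $124 million"
UPFRONT_USD = 21_900_000          # "nearly $21.9 million"
GPU_COUNT = 1_280
SERVER_COUNT = 160
TERM_YEARS = 3

# Assumption: 365-day years and billing for every hour of the term.
HOURS_PER_YEAR = 365 * 24


def deal_summary() -> dict:
    """Derive simple ratios from the announced contract figures."""
    revenue_per_gpu = TOTAL_CONTRACT_USD / GPU_COUNT
    return {
        "gpus_per_server": GPU_COUNT / SERVER_COUNT,
        "upfront_share": UPFRONT_USD / TOTAL_CONTRACT_USD,
        "revenue_per_gpu_usd": revenue_per_gpu,
        "implied_usd_per_gpu_hour": revenue_per_gpu / (TERM_YEARS * HOURS_PER_YEAR),
    }


summary = deal_summary()
for name, value in summary.items():
    print(f"{name}: {value:,.3f}")
```

On these assumptions the deployment works out to eight GPUs per server and just under $97,000 of contracted revenue per GPU over the term. The contracts may bundle networking, support or other services, so the implied hourly figure should be read as an upper-level average rather than a GPU rental rate.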

QumulusAI has been working to reset the floor on AI infrastructure costs by making GPU-class inference more economical and broadly accessible. The best way to understand that shift is to see how it is redesigning infrastructure around utilization and economics rather than peak-performance benchmarks.

How AI infrastructure providers are cutting inference costs by 20%

Traditional AI stacks are often built on generic reference architectures that assume maxed-out central processing units, large memory footprints and oversized local storage “just in case” workloads need them. For inference, that often means enterprises pay for underutilized resources simply because the blueprint was drawn that way.

QumulusAI is challenging that model with an “inference-first” approach. It tunes CPU core counts, system memory and local storage to match the real behavior of large-scale open-source inference workloads, deep-research agents, automated coding systems and other asynchronous applications that prioritize throughput, latency and cost per token. The company’s deployments around Nvidia Blackwell GPUs are designed so that every component around the GPU is rightsized. The company’s own analysis indicates this can cut AI inference costs by roughly 20% compared with standard configurations, largely by eliminating waste in CPU and storage provisioning.
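One way to sanity-check a claim like this is to ask how deep the cut to non-GPU spend has to be if the GPU itself is left untouched. The sketch below is a minimal illustration of that arithmetic. The article gives no cost breakdown, so the GPU shares passed in are hypothetical inputs, and `required_non_gpu_cut` is an illustrative helper, not anything QumulusAI has described.

```python
def required_non_gpu_cut(gpu_share: float, target_total_cut: float) -> float:
    """Fraction by which non-GPU spend must fall to reach a total cost cut.

    Assumes GPU spend is unchanged, so all savings come from the
    remaining (CPU, memory, storage, other) share of the cost.
    """
    if not 0.0 <= gpu_share < 1.0:
        raise ValueError("gpu_share must be in [0, 1)")
    if not 0.0 <= target_total_cut <= 1.0:
        raise ValueError("target_total_cut must be in [0, 1]")
    non_gpu_share = 1.0 - gpu_share
    cut = target_total_cut / non_gpu_share
    if cut > 1.0:
        raise ValueError("target cannot be met by trimming non-GPU spend alone")
    return cut


# Hypothetical shares: the GPU accounts for 60% or 70% of the cost base.
for share in (0.60, 0.70):
    cut = required_non_gpu_cut(share, 0.20)
    print(f"GPU share {share:.0%}: non-GPU spend must fall by {cut:.0%}")
```

The relationship is simply the target cut divided by the non-GPU share, so the larger the GPU's share of the cost base, the more aggressive the rightsizing of everything else has to be to reach a 20% overall reduction.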

From GPU scarcity to GPU efficiency

The first wave of generative AI was defined by GPU scarcity. Whoever secured the most accelerators won. That scarcity mindset led AI providers and large enterprises to hoard GPU capacity and overbuild general-purpose infrastructure, assuming training would be the dominant workload. As the market matures, the constraint is shifting from “can I get GPUs?” to “can I afford to run them continuously?” That’s where efficiency becomes the differentiator.

QumulusAI’s architecture pairs Blackwell GPUs with Lenovo and Supermicro bare-metal systems and Cisco Nexus networking. The real innovation is how tightly it aligns those systems with inference utilization patterns. The net effect is that the same GPU remains in play, but the surrounding infrastructure is no longer a generic, overprovisioned shell — it is an efficient, purpose-built environment designed to maximize useful work per watt and per dollar.

Inference is creating a new class of AI infrastructure

Inference is emerging as a distinct class of AI infrastructure, separate from training, with different design goals and success metrics. Training environments are optimized for short, intense bursts and massive data movement. Inference environments, especially for open-source models, are optimized for sustained, high-volume request traffic, predictable latency and stable economics over multiyear horizons.

QumulusAI’s design choices reflect that reality. It leads with GPU-as-a-service contracts, multiyear subscription terms and a distributed deployment model that brings compute closer to end users rather than concentrating everything in a handful of mega-regions. That combination creates an “inference fabric” where capacity can be added incrementally, and the balance of GPUs, CPUs, memory and storage is tuned to maximize utilization rather than headline TOPS. The result is a new category of infrastructure where success is measured by cost per query and utilization rates, not just peak training performance.

How infrastructure teams can reduce AI operating costs

For operations teams, it’s time to rethink how you approach infrastructure. Treat inference infrastructure as a distinct tier, not an extension of existing training clusters or general-purpose virtualized environments.

Start by profiling actual inference workloads. Collect data on request patterns, concurrency, latency targets and model footprints, and use it to right-size CPU, memory and storage around the GPUs you already plan to deploy. Look for providers and partners that offer inference-specific SKUs or architectures, rather than generic “AI-ready” instances that simply bundle more of everything.
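As a minimal sketch of what that profiling step might look like, the Python below derives latency percentiles and peak concurrency from a list of request start and end times. The `Request` record, the millisecond fields and the sample data are all hypothetical; a real profile would read from whatever request logs or tracing system the team already runs.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical log record: request start and end in milliseconds."""
    start_ms: int
    end_ms: int


def percentile(values: list, pct: float):
    """Nearest-rank percentile of a non-empty list."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def peak_concurrency(requests: list) -> int:
    """Maximum number of requests in flight at once (sweep over events)."""
    events = []
    for r in requests:
        events.append((r.start_ms, 1))
        events.append((r.end_ms, -1))
    # At equal timestamps, process ends (-1) before starts (+1) so that
    # back-to-back requests are not counted as overlapping.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def profile(requests: list) -> dict:
    latencies = [r.end_ms - r.start_ms for r in requests]
    return {
        "p50_latency_ms": percentile(latencies, 50),
        "p95_latency_ms": percentile(latencies, 95),
        "peak_concurrency": peak_concurrency(requests),
    }


# Made-up sample data for illustration.
sample = [
    Request(0, 2000),
    Request(1000, 1500),
    Request(1200, 4000),
    Request(2000, 2500),
    Request(5000, 5500),
]
print(profile(sample))
```

Numbers like these are the inputs to a sizing decision: peak concurrency and tail latency say how much headroom the host CPU and memory need, while the gap between peak and typical load hints at how much capacity sits idle.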

Consider distributed or regional deployments where bringing compute closer to users reduces network overhead and improves utilization, especially for asynchronous or agentic workloads that can be scheduled across multiple sites. Finally, shift the financial conversation from “How many GPUs did we buy?” to “What is our cost per 1,000 inferences, and how can we drive it down by 10% to 20% through better utilization?”
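To make that question concrete, here is a small sketch of the cost-per-1,000-inferences calculation. It assumes a very simple model in which a GPU has a fixed hourly cost and a fixed peak throughput, and utilization is the fraction of that peak doing useful work. The hourly cost and throughput in the example are hypothetical placeholders, not figures from QumulusAI or its customers.

```python
def cost_per_1000_inferences(
    gpu_hour_cost_usd: float,
    peak_inferences_per_hour: float,
    utilization: float,
) -> float:
    """Cost of 1,000 inferences under a fixed hourly cost and peak rate."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    if peak_inferences_per_hour <= 0:
        raise ValueError("peak_inferences_per_hour must be positive")
    served_per_hour = peak_inferences_per_hour * utilization
    return gpu_hour_cost_usd / served_per_hour * 1000


def utilization_for_cost_cut(current_utilization: float, target_cut: float) -> float:
    """Utilization needed to cut cost per inference by target_cut, all else fixed.

    A result above 1.0 means utilization alone cannot deliver the cut.
    """
    if not 0.0 <= target_cut < 1.0:
        raise ValueError("target_cut must be in [0, 1)")
    return current_utilization / (1.0 - target_cut)


# Hypothetical inputs: $4.00 per GPU-hour, 20,000 inferences per hour at peak.
before = cost_per_1000_inferences(4.00, 20_000, utilization=0.50)
after = cost_per_1000_inferences(4.00, 20_000, utilization=0.60)
print(f"before: ${before:.3f}  after: ${after:.3f}  saving: {1 - after / before:.1%}")
print(f"utilization needed for a 20% cut: {utilization_for_cost_cut(0.50, 0.20):.1%}")
```

Because cost per inference is inversely proportional to utilization in this model, moving from 50% to 60% utilization lowers it by about a sixth, and a full 20% cut from a 50% baseline needs utilization of 62.5%.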

Customers such as Hyperbolic are buying optimized capacity, not just GPUs

One proof point of this shift is how customers are structuring their commitments. Companies such as Hyperbolic, which operate large-scale inference services for open-source models, are signing multiyear agreements not simply to lock in GPU inventory but to secure optimized capacity. In these deals, GPU clusters, CPU and memory configurations, and network fabrics are co-designed for the customer’s specific workloads.

In QumulusAI’s case, that has translated into more than $124 million in three-year agreements and substantial upfront commitments. The value proposition is framed around economics — about a 20% reduction in inference costs relative to standard builds — rather than raw accelerator counts. These customers are voting with their budgets for infrastructure that treats inference as a primary workload.

Final thoughts

What’s interesting about this announcement is not just the size of the agreements but the logic behind them. AI infrastructure is entering a second phase where differentiation comes from utilization and economics, not just raw accelerator counts. The shift in focus from how many GPUs were purchased to how efficiently they run is overdue, and QumulusAI is positioning itself in that gap by wrapping rightsized CPUs, memory and storage around Blackwell GPUs.

For enterprises, the takeaway is that AI infrastructure is no longer a monolithic, once-in-a-decade investment. It’s becoming a modular, workload-specific fabric where the winners will be the teams and providers that treat inference economics as a design constraint rather than an afterthought.

Zeus Kerravala is a principal analyst at ZK Research, a division of Kerravala Consulting. He wrote this article for SiliconANGLE.

Image: QumulusAI
