

























I just got back from Google Cloud Next 2026, and the line I keep coming back to in conversations is this: AI Hypercomputer is what a decade of Google’s co-design finally looks like, assembled on one stage. I attended the original public TPU reveal at Google I/O on May 18, 2016. The first TPU had been running quietly inside Google’s datacenters for a year before that. Ten years later, the long arc of development has produced the eighth-generation TPU lineup, the new Virgo Network, the Managed Lustre file system at 10 terabytes per second, the GKE Agent Sandbox, and Axion-powered N4A instances.
There can be no doubt now that Google can build infrastructure. Google Cloud exited Q4 2025 as the fastest-growing of the Big Three CSPs, with quarterly revenue up 48% year-over-year to $17.66 billion and a backlog of $240 billion. Alphabet has guided 2026 capex of $175 billion to $185 billion, primarily for technical infrastructure. The demand is there across Google’s consumer and commercial franchises. I wrote a year ago, after Next 2025, that Google was finally getting serious about infrastructure as a service. This year confirms it.
My core takeaway: AI Hypercomputer is no longer a marketing wrapper around a TPU launch. The TPU split into TPU 8t and TPU 8i, the new Virgo Network, Managed Lustre, GKE Agent Sandbox, and the now-GA Axion N4A instances represent a genuinely vertically integrated AI stack across compute, storage, networking, and orchestration. The question for enterprise buyers isn’t whether the stack is impressive. It is. The question is how much of it they actually get to use, on what terms, and where the real risk concentrations lie. I wrote in November after the Google Public Sector Summit that Google Cloud CEO Thomas Kurian was already telegraphing exactly this bifurcation, with two AI stacks for two different jobs. The TPU 8t and 8i split is what Google actually delivered.
On compute, TPU 8t is the training chip, with appropriately impressive stats at scale: 9,600 chips per superpod, two petabytes of shared high-bandwidth memory, 121 exaflops, 2.7x performance-per-dollar over Ironwood, native FP4 in the matrix units, and Axion Arm hosts. x86 processors still do some of the heavy lifting for agentic orchestration; I am assuming these come from Intel, given the strategic announcement between the two companies. With Google’s Pathways AI architecture and JAX library, a single logical training cluster now scales past one million TPUs across multiple datacenter sites.
I think TPU 8i for inference is the more interesting bet. Google’s new Boardfly topology was co-designed with DeepMind to optimize for latency, not bandwidth. That’s exactly the right call for agents and inference, where minimum time-to-response, not raw throughput, is the key customer need. MediaTek joined Broadcom as a confirmed silicon design partner for the eighth-gen program, with Marvell reportedly in talks to become a third partner in the future, per The Information. That’s genuine multi-partner co-design, not slideware.
As for storage, Managed Lustre now delivers 10 terabytes per second of bandwidth, which Google claims is a 10x improvement over last year and up to 20x faster than other hyperscalers. Storage was the silent bottleneck in agentic workloads, and Google has finally moved on it.
For networking, Virgo connects 134,000 TPUs in a single fabric in one datacenter, and more than one million TPUs across multiple sites. For NVIDIA Vera Rubin NVL72 on Google Cloud’s A5X bare-metal instance, the same Virgo fabric supports up to 80,000 GPUs in a single datacenter and 960,000 across sites. In other words, customers who want NVIDIA don’t have to leave Google Cloud, which is the right answer for enterprises.
On orchestration, I think GKE Agent Sandbox with Axion-powered N4A is the under-covered piece of the AI Hypercomputer puzzle. Google claims 30% better price-performance for agent workloads compared to other hyperscalers. That’s a metric I want to see hold up under independent assessment, because orchestration is where compute, storage, and networking actually meet the agent. In my conversation with Mark Lohmeyer, Google Cloud’s head of AI and compute infrastructure, the through-line was the same: The stack only matters if it composes cleanly for agents. Also last week, Lohmeyer told the press that, uniquely within Google, the company is co-designing this offering across the full infrastructure stack. That co-design claim is the entire AI Hypercomputer thesis in one sentence.
We also shouldn’t gloss over the importance of “goodput”: not just the raw throughput attained, but the part of it that actually yields good output. Amin Vahdat, Google’s SVP and chief technologist for AI and infrastructure, has been making a point on stage about cluster reliability that almost nobody outside the operators thinks about. At 10,000-chip scale, fail-stop failures and silent data corruption quietly eat training throughput. Google claims that TPU 8t is engineered to target over 97% goodput. After 35 years in this industry, I’ll tell you that peak FLOPS is just a marketing number. Goodput is what determines whether your training cycles get wasted. As my colleague Matt Kimball wrote in his Ironwood analysis last year, Google’s TPU strategy was already tacking toward the age of inference. TPU 8t and 8i embody the architectural commitment to that direction, and Vahdat’s goodput framing explains the operating philosophy behind it.
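To make the goodput point concrete, here is a rough back-of-the-envelope sketch. It assumes goodput can be treated as the fraction of chip-time that produces useful training progress, which is my simplification rather than Google's formal definition. The 9,600-chip superpod size and the 97% target come from the figures above; the 30-day run length and the 90% comparison point are hypothetical numbers chosen only for illustration.

```python
def wasted_chip_hours(num_chips: int, wall_clock_hours: float, goodput: float) -> float:
    """Chip-hours that yield no useful training progress.

    Assumes goodput is the fraction of total chip-time that is productive.
    """
    if not 0.0 <= goodput <= 1.0:
        raise ValueError("goodput must be between 0 and 1")
    return num_chips * wall_clock_hours * (1.0 - goodput)


SUPERPOD_CHIPS = 9_600   # TPU 8t superpod size cited in the article
RUN_HOURS = 30 * 24      # hypothetical 30-day training run

# 0.97 is Google's stated target; 0.90 is a hypothetical comparison point.
for goodput in (0.97, 0.90):
    lost = wasted_chip_hours(SUPERPOD_CHIPS, RUN_HOURS, goodput)
    print(f"goodput {goodput:.0%}: about {lost:,.0f} chip-hours lost")
```

Under these assumed inputs, the gap between 97% and 90% goodput is roughly 207,000 versus 691,000 lost chip-hours on a single superpod, which is why the metric matters more than peak FLOPS at this scale.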
Now I want to talk about NVIDIA. The trope I keep seeing is that Google’s TPU is “taking on” NVIDIA. I don’t buy it, and I haven’t bought it. Saying Google is taking on NVIDIA’s chips is like saying Apple’s M-series chips are taking on Intel and AMD. Apple competes with Dell, HP, and Lenovo at the system level. Its chips don’t compete with Intel and AMD as merchant silicon. Similarly, there’s nothing standard about Google’s AI Hypercomputer. It’s dialed in after a decade-plus of work, it’s currently very proprietary, and it’s primarily built to serve Google’s own workloads, including Gemini, Search, YouTube, and Android.
Look at the strongest customer validation point Google has for this. The Anthropic deal was expanded earlier this month via a Google and Broadcom agreement to deliver multiple gigawatts of next-generation TPU capacity beginning in 2027. But we’re talking about a single customer with very specific characteristics, and other named TPU customers like Citadel Securities are likewise jumbo-scale consumers of compute. Meanwhile, most enterprises will engage AI Hypercomputer through the Gemini Enterprise managed front door, not through bare-metal TPU access.
So even if you wanted to compare TPU and NVIDIA head-to-head, you can’t yet. Right now, only Google knows for sure how it stacks up against NVIDIA’s chips. And before I weigh in on relative price-performance, I want to see credible third-party assessments across a wide variety of workloads, touching entire stacks, from more than one outlet. To be fair, Google has now exposed native PyTorch support for TPU via TorchTPU in preview, which is a clear signal it wants to reduce framework lock-in friction. Whether that’s enough to peel real workloads off of NVIDIA is an open question. The benchmarking gap is significant, to say the least, and the responsible analyst position is to wait for the answers, not declare a winner prematurely. I am digging into Prism, which Google tells me is a way for customers to compare TPU versus GPU, and the Signal65 team will be doing the same.
Based on what I’ve seen and heard, Google customer outcomes are starting to land. I heard a non-enterprise video enhancement company at the show describe running on Google Cloud and getting 40 to 50% cost savings, with inference times moving from 15 to 20 minutes down to under a minute. That’s the kind of workload-specific outcome that matters more than peak FLOPS claims, and the kind of customer story Google needs more of in public.
If you’re an Anthropic-class buyer with frontier-model workloads, AI Hypercomputer is now a genuine second source. Native PyTorch via TorchTPU removes a historical friction point. The economics will compete with NVIDIA on inference for specific shapes of workload, and the multi-gigawatt commitments from Anthropic suggest that Google’s supply story is real for buyers operating at that scale.
If you are a large enterprise, you will want to feel comfortable that you can jockey around between different AI accelerators, be it TPU, GPU, or some future chip that is still to be defined. Just because the industry has so far carved the AI workflow into only training, prefill, and decode doesn’t mean there won’t be a fourth or fifth variant.
If you’re a mid-market enterprise, you’re getting most of the value through Gemini Enterprise and the Agent Platform, not through AI Hypercomputer in your tenancy. Both of these assessments can be true, based on the scope and needs of the customer. Don’t confuse one for the other when you’re sizing budgets and capabilities.
A note on the competitive context: AWS went the opposite direction with Trainium3 at re:Invent 2025, converging training and inference into a single SKU. Meanwhile, NVIDIA is scaling up within the rack with Vera Rubin NVL72, handling prefill with CPX and decode with LPX. Google, as we’ve seen, is bifurcating across specialized silicon. Three different bets, based on three different theories of where the agentic workload actually lands. None of them is obviously right yet, and it’s possible that more than one of these bets could pay off. Each specialized processor does one job well, so as long as the overhead of orchestrating work across them is smaller than what the specialization saves, the split is a win.
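The break-even condition in the closing sentence of that paragraph can be written down as a simple inequality. The sketch below is my own illustration, not any vendor's sizing method: the function name and all three timing inputs are hypothetical, and it assumes orchestration cost can be expressed in the same units as the processing time.

```python
def specialization_pays_off(
    general_seconds: float,
    specialized_seconds: float,
    orchestration_seconds: float,
) -> bool:
    """True when specialized chips plus orchestration beat one general-purpose chip.

    All inputs are hypothetical per-request timings in the same unit.
    """
    return specialized_seconds + orchestration_seconds < general_seconds


# Hypothetical request: 10 s on a general-purpose part, 6 s split across
# specialized parts, with two different orchestration overheads.
print(specialization_pays_off(10.0, 6.0, 2.0))  # overhead smaller than the saving
print(specialization_pays_off(10.0, 6.0, 5.0))  # overhead larger than the saving
```

The same comparison works with cost or energy in place of time, provided all three inputs use one unit.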
Google Cloud has built what appears on the surface to be the most coherent vertically integrated AI stack outside of NVIDIA, and Thomas Kurian deserves credit for staying with the bet for six-plus years. The TPU split, Virgo, Managed Lustre, GKE Agent Sandbox, and the partnership with NVIDIA on Vera Rubin all reinforce a stack that’s genuinely co-designed rather than bolted together. Assuming it’s accurate, Google’s own claim that nearly 75% of its cloud customers are now using its AI products validates that demand is there.
What I’m watching for next: credible third-party benchmarks across full stacks, real production agent reliability metrics from more enterprise customers, and whether native PyTorch on TPU is enough to pull workloads away from GPUs. Until then, AI Hypercomputer is a real accomplishment for Google. Just don’t oversell what it means for the rest of the market.
Note: Google Cloud is a client of Moor Insights & Strategy.