
























Google has unveiled its eighth-generation Tensor Processing Units, introducing two custom AI chips designed separately for model training and inference as demand for large-scale AI computing surges.
Announced at Google Cloud Next, the new processors are called TPU 8t and TPU 8i. They are built to power Google’s AI Hypercomputer platform and support workloads ranging from training frontier models to serving AI agents in production.
TPUs are Google’s in-house accelerators that have powered internal systems such as Gemini for years. The company is now expanding that hardware to customers looking for alternatives to Nvidia-dominated AI infrastructure.
Google said both chips will become generally available later this year.
The TPU 8t is optimized for training large AI models. Google said a single superpod can scale to 9,600 chips and deliver 121 exaflops of compute performance.
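As a rough back-of-the-envelope check, the superpod figures imply a per-chip number. The sketch below simply divides the quoted aggregate by the chip count. It assumes the 121 exaflops is a peak figure spread evenly across all 9,600 chips, which the announcement does not spell out, and the function name is purely illustrative.

```python
# Back-of-the-envelope: per-chip compute implied by the quoted superpod figures.
# Assumption (not stated by Google): the aggregate is split evenly across chips.

EXA = 10**18
PETA = 10**15


def per_chip_petaflops(pod_exaflops: float, chips: int) -> float:
    """Return the average petaflops per chip for a pod of the given size."""
    if chips <= 0:
        raise ValueError("chips must be positive")
    return pod_exaflops * EXA / chips / PETA


# Figures quoted for a TPU 8t superpod.
tpu8t_per_chip = per_chip_petaflops(121, 9_600)
print(f"{tpu8t_per_chip:.1f} petaflops per chip")
```

Under that even-split assumption, each chip would account for roughly 12.6 petaflops of the pod total.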
The company added that TPU 8t offers nearly three times the compute performance per pod compared with the previous generation, Ironwood.
Training systems also received faster storage access and upgraded networking aimed at keeping chips busy instead of waiting for data.
Google said TPU 8t targets more than 97 percent “goodput,” meaning the share of time a cluster spends on productive compute rather than sitting idle because of failures or bottlenecks.
That matters because delays across massive clusters can add days to training schedules for advanced AI systems.
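To see how goodput translates into calendar time, here is a minimal sketch. The 60-day job length and the 90 percent comparison point are hypothetical values chosen for illustration, and the simple model (wall-clock time equals productive time divided by goodput) is an assumption rather than Google's own definition.

```python
# Illustrative model: wall-clock time = productive compute time / goodput.
# The job length and the 90% comparison figure are hypothetical.


def wall_clock_days(productive_days: float, goodput: float) -> float:
    """Wall-clock days needed to accumulate the given productive compute days."""
    if not 0 < goodput <= 1:
        raise ValueError("goodput must be in (0, 1]")
    return productive_days / goodput


JOB_DAYS = 60  # hypothetical training run needing 60 days of useful compute

for goodput in (0.97, 0.90):
    total = wall_clock_days(JOB_DAYS, goodput)
    lost = total - JOB_DAYS
    print(f"goodput {goodput:.0%}: {total:.1f} days total, {lost:.1f} days lost")
```

In this toy model, the run loses about 1.9 days at 97 percent goodput and about 6.7 days at 90 percent, which is the kind of gap the article describes.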
The TPU 8i focuses on inference, the stage where trained AI models answer prompts, run tools, and power software agents.
Google said TPU 8i includes 288 GB of high-bandwidth memory and 384 MB of on-chip SRAM, helping keep active model data closer to the processor for faster responses.
The chip is also paired with Google’s Axion Arm-based CPUs and gets upgraded interconnect bandwidth for Mixture of Experts, or MoE, models. These architectures activate only a subset of a model’s parameters for each input, lowering costs while scaling performance.
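The idea of activating only part of a model can be illustrated with a toy top-k router. This is a generic sketch of MoE routing in plain Python, not Google's implementation. The expert functions, router scores, and names such as `route_top_k` are all made up for the example.

```python
import math

# Toy Mixture of Experts routing: only the top-k experts run for each input.
# Generic illustration; all names and values are hypothetical.


def softmax(scores):
    """Convert raw router scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


def moe_forward(x, experts, router_scores, k=2):
    """Weighted sum of the selected experts' outputs; the rest are skipped."""
    weights = route_top_k(router_scores, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Four toy experts that each scale the input by a different factor.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
scores = [0.1, 2.0, 0.3, 2.0]  # hypothetical router outputs for one token

weights = route_top_k(scores, k=2)
output = moe_forward(1.0, experts, scores, k=2)
print(sorted(weights), output)
```

Here only two of the four experts do any work for the token. In a real deployment the experts are typically spread across many chips, so tokens have to be shuffled between them, which is one reason interconnect bandwidth matters for these models.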
According to Google, TPU 8i delivers 80 percent better performance per dollar than the prior generation, allowing customers to handle nearly twice the workload at the same cost.
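The step from an 80 percent gain to "nearly twice the workload" is simple arithmetic, sketched below. The function names are illustrative, and the calculation assumes the gain applies uniformly across a customer's workload, which real deployments may not match.

```python
# Arithmetic behind a performance-per-dollar gain.
# Assumption: the quoted gain applies uniformly to the whole workload.


def workload_at_same_cost(gain: float) -> float:
    """Relative workload served for the same spend."""
    return 1.0 + gain


def cost_for_same_workload(gain: float) -> float:
    """Relative spend needed to serve the same workload."""
    return 1.0 / (1.0 + gain)


GAIN = 0.80  # the 80 percent improvement quoted for TPU 8i

print(f"workload at same cost: {workload_at_same_cost(GAIN):.2f}x")
print(f"cost for same workload: {cost_for_same_workload(GAIN):.1%}")
```

Read the other way, the same gain means an unchanged workload would cost a little over half as much as before.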
The launch highlights how AI infrastructure is shifting beyond general-purpose GPUs toward specialized chips tuned for different workloads.
Google said the two-chip strategy was shaped by the rise of AI agents, which need systems that can reason through tasks, run workflows, and repeatedly interact with tools and other models.
In data centers, Google said both chips also offer up to two times better performance-per-watt than Ironwood.
They use fourth-generation liquid cooling to support higher compute density while controlling power use.
The announcement also underscores Google’s broader effort to challenge Nvidia’s grip on AI hardware by combining custom silicon, networking, software frameworks, and cloud services into one stack.
Both TPU 8t and TPU 8i will be offered through Google Cloud.
Google said the chips support frameworks including JAX, PyTorch, SGLang, and vLLM, allowing developers to run existing AI workloads without major software rewrites.
With over a decade-long career in journalism, Neetika Walter has worked with The Economic Times, ANI, and Hindustan Times, covering politics, business, technology, and the clean energy sector. Passionate about contemporary culture, books, poetry, and storytelling, she brings depth and insight to her writing. When she isn’t chasing stories, she’s likely lost in a book or enjoying the company of her dogs.