AWS Tunes Up Graviton5 For Agentic AI, Boosts Bang For The Buck Bigtime
Timothy Prickett Morgan · 2026-06-12 · via The Next Platform: In-depth coverage of high end computing

Back in December, the Annapurna Labs chip division of Amazon Web Services showed off a preview of its Graviton5 Arm server CPU, and we got some hints about what this chip might look like. This week, the Graviton5 is shipping in new M9g and M9gd instances, and AWS has given us some more details about the Graviton5, filling in some blanks.

Right off the bat, the block diagram that AWS showed during the re:Invent conference seven months ago was not accurate. It showed a monolithic die with 96 pairs of “Poseidon” Neoverse V3 cores. As it turns out, the Graviton5 chip is composed of four CPU blocks, each with its own 48 V3 cores and the associated memory and I/O controllers. It looks like AWS took a block of the Poseidon Compute Subsystem chip that Arm Holdings created and is using in its own AGI CPU, and cut it back from 64 cores to 48 cores. AWS then used the Arm die-to-die interconnect to make a Graviton5 socket with 192 V3 cores, a dozen DDR5 memory controllers, and eight PCI-Express 6.0 controllers that I think have 96 lanes and that support the CXL 3.0 memory extension protocol.
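To make the socket arithmetic concrete, here is a minimal Python sketch that tallies per-chiplet resources into a full socket. It assumes, as the description above implies but does not state outright, that the dozen DDR5 controllers and the eight PCI-Express 6.0 controllers are split evenly across the four chiplets. The `Chiplet` class and the `socket_totals` helper are hypothetical names for illustration only.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Chiplet:
    """Hypothetical model of one Graviton5 compute chiplet."""
    cores: int
    memory_controllers: int
    pcie6_controllers: int

def socket_totals(chiplet: Chiplet, count: int) -> dict:
    """Sum each per-chiplet resource across `count` identical chiplets."""
    return {name: value * count for name, value in asdict(chiplet).items()}

# Assumption: an even split of 3 DDR5 and 2 PCIe 6.0 controllers per chiplet.
graviton5_chiplet = Chiplet(cores=48, memory_controllers=3, pcie6_controllers=2)
totals = socket_totals(graviton5_chiplet, count=4)
print(totals)  # {'cores': 192, 'memory_controllers': 12, 'pcie6_controllers': 8}

# Fraction of the cores in a 64-core Poseidon CSS block that AWS appears to keep.
print(f"cores kept per block: {48 / 64:.0%}")  # cores kept per block: 75%
```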

That CXL support will be important for in-memory databases or workloads that need more memory capacity than AWS can affordably put on a Graviton5 socket. (Fatter memory sticks cost increasingly more as you add capacity to the DIMM; the price scales exponentially, not linearly. So you go with the skinniest memory that meets your capacity needs, and you always fill all the memory slots to get maximum memory bandwidth against that capacity.)
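The rule of thumb in that parenthetical, picking the skinniest DIMM that still hits the capacity target with every slot filled, can be sketched as a small search. This is a minimal Python illustration, not AWS's procurement logic: the list of DIMM capacities is an assumed menu of common DDR5 RDIMM sizes, and the 12-slot example assumes one DIMM per channel across the dozen memory controllers.

```python
# Assumed menu of DDR5 RDIMM capacities in GB; real availability and pricing vary.
DIMM_SIZES_GB = (16, 24, 32, 48, 64, 96, 128, 256)

def pick_dimm(target_capacity_gb: int, slots: int, sizes=DIMM_SIZES_GB) -> int:
    """Return the smallest DIMM size that reaches the target with every slot filled."""
    for size in sorted(sizes):
        if size * slots >= target_capacity_gb:
            return size
    raise ValueError("target exceeds the largest DIMM in every slot")

# Example: 12 slots (one DIMM per channel assumed) and a 768 GB target.
print(pick_dimm(768, slots=12))  # 64
```

Filling all twelve slots with 64 GB sticks, rather than half of them with 128 GB sticks, reaches the same 768 GB while keeping every channel active, which is the bandwidth point being made above.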

You can see the four individual Graviton5 chiplets here:

There are four D2D interconnects linking the four chiplets into a virtual processor, and those links run at 420 GB/sec. These interconnects burn a lot of energy, but using four chiplets lowers the cost of each chiplet, because the yield for a smaller chip is much higher than it would be for a monolithic design pushing up against the reticle limits at Taiwan Semiconductor Manufacturing Co. This lower per-chiplet cost is offset by a switch from the 4 nanometer process used with the Graviton4 chip to the much more expensive, but denser and more power efficient, 3 nanometer process.
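The yield argument can be illustrated with the textbook Poisson yield model, Y = exp(-A·D0), where A is die area and D0 is defect density. This Python sketch uses made-up inputs (an 800 mm² monolithic die, a quarter-size chiplet, and 0.1 defects per cm²), not anything AWS or TSMC has disclosed. It also assumes chiplets are tested before packaging, and it ignores wafer edge loss, the area spent on D2D interfaces, and packaging costs.

```python
import math

def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of good dies under the simple Poisson model Y = exp(-A * D0)."""
    area_cm2 = area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-area_cm2 * defects_per_cm2)

def wafer_area_per_good_unit(area_mm2: float, defects_per_cm2: float,
                             dies_per_unit: int = 1) -> float:
    """Silicon area (mm^2) consumed per good processor, counting scrapped dies."""
    return dies_per_unit * area_mm2 / poisson_yield(area_mm2, defects_per_cm2)

# Hypothetical inputs for illustration only.
D0 = 0.1              # defects per cm^2
MONOLITHIC_MM2 = 800  # near the reticle limit
CHIPLET_MM2 = MONOLITHIC_MM2 / 4

print(f"monolithic yield:  {poisson_yield(MONOLITHIC_MM2, D0):.1%}")  # 44.9%
print(f"per-chiplet yield: {poisson_yield(CHIPLET_MM2, D0):.1%}")     # 81.9%
mono = wafer_area_per_good_unit(MONOLITHIC_MM2, D0)
quad = wafer_area_per_good_unit(CHIPLET_MM2, D0, dies_per_unit=4)
print(f"silicon per good socket: {mono:.0f} mm^2 monolithic vs {quad:.0f} mm^2 as four chiplets")
```

Multiplying those areas by a cost per mm² shows where the 3 nanometer premium comes in: a pricier wafer raises the cost of every good square millimeter, eating into the savings that the better chiplet yield delivers.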

I estimate that the 96-core Graviton4 had 73 billion transistors. With Graviton4, AWS for the first time created a two-socket NUMA machine, which gave it single-node performance akin to that of the Graviton5, a chip that was still two years in the future when Graviton4 was revealed in November 2023.

The Graviton4 had essentially the same architecture as the Graviton3, with a central, monolithic core chiplet surrounded by separate memory and I/O controller chips linked to it. Graviton3 had 64 “Zeus” V1 cores and Graviton4 had 96 “Demeter” V2 cores.

As you can see from the table above, the L1 cache and L2 cache have risen linearly with core count, but AWS has increased the L3 cache faster than the core count. The Graviton5 has 2 MB of L2 cache per core, for a total of 384 MB of L2 cache across those 192 cores, twice that of the Graviton4, and its L3 cache has grown by more than that factor of two.

Between the increased L3 cache, the faster clock speeds on the V3 cores and the DDR5 memory, and the D2D interconnects linking the four chiplets in the Graviton5 socket, the wattage of the Graviton5 goes up radically. I think it weighs in at around 650 watts, which means its performance per watt is half that of the Graviton4. But you get 2.4X more performance per socket with Graviton5, and even against a Graviton4 node with two chips in a NUMA shared memory configuration, a single Graviton5 chip delivers 25 percent more raw throughput.
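Those ratios also imply a couple of numbers that are not stated outright. Here is a back-of-the-envelope Python sketch that takes the 650-watt estimate, the 2.4X per-socket gain, the half-as-good performance per watt, and the 25 percent edge over a two-socket Graviton4 node at face value. The helper names are hypothetical, and the outputs are only as good as those estimates.

```python
def implied_numa_scaling(g5_vs_one_g4: float, g5_vs_two_socket_g4: float) -> float:
    """Scaling efficiency of a two-socket Graviton4 node implied by two speedup ratios."""
    two_socket_perf = g5_vs_one_g4 / g5_vs_two_socket_g4  # in units of one Graviton4 socket
    return two_socket_perf / 2.0                           # 1.0 would be perfect scaling

def implied_g4_watts(g5_watts: float, perf_ratio: float, perf_per_watt_ratio: float) -> float:
    """Graviton4 socket power implied by Graviton5 power and the two ratios.

    perf_per_watt_ratio = (perf_ratio / g5_watts) / (1 / g4_watts),
    so g4_watts = perf_per_watt_ratio * g5_watts / perf_ratio.
    """
    return perf_per_watt_ratio * g5_watts / perf_ratio

scaling = implied_numa_scaling(2.4, 1.25)
g4_watts = implied_g4_watts(650, 2.4, 0.5)
print(f"implied two-socket Graviton4 scaling: {scaling:.0%}")   # 96%
print(f"implied Graviton4 socket power: {g4_watts:.0f} watts")  # 135 watts
```

Read this way, the 25 percent figure is consistent with a two-socket Graviton4 node scaling at about 96 percent of ideal, and the half-the-perf-per-watt estimate holds only if a Graviton4 socket draws roughly 135 watts; a hotter Graviton4 would make the Graviton5 ratio look better than half.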

These are all fair tradeoffs given the need for responsive systems to support databases and agentic AI, which need low latency more than they need low heat.

Here is how the M9g instances using Graviton5, which do not have local flash storage, stack up against their R8g and X8g predecessors based on Graviton4, which also do not have local flash in the node:

And here is how performance stacks up against the cost of running each instance at on-demand pricing for a year:

EC2 Instance Savings Plan pricing for the M9g and M9gd instances (the latter adds local flash) and for the X8g instances has not been announced publicly, for some reason, but there is Compute Savings Plan pricing, which is a little less aggressive on the price cuts but more flexible in terms of converting to other instances within datacenters and across regions. On-demand pricing is shown above.

What pops out immediately is that AWS is charging about half as much for M9g instances as for X8g instances, and that stands to reason, since the M9g instances have a quarter to a half of the memory capacity. (It varies by instance within each family.) This is the immediate impact of the DRAM and flash memory crunch. AWS has to be less generous with flash capacity as well in the variants that have local storage. Compute is cheap, memory is not.

Here is a price/performance table spanning the early Graviton2 and Graviton3 instances, which were not all that impressive compared with the Graviton4 and Graviton5 instances, which do pack a wallop. The M9g instances deliver between 31.9 percent and 33.6 percent better bang for the buck than the closest equivalent R8g instances with the same vCPU counts, and the M9gd instances with local flash deliver between 22.1 percent and 32 percent better price/performance than the R8gd instances. The high memory X8g instances are very pricey. You have to really need that large memory footprint to pick this instance.
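For readers who want to reproduce this kind of comparison, here is a minimal Python sketch of the price/performance arithmetic. It assumes that "better bang for the buck" means the new instance's performance per dollar divided by the old instance's, minus one; the article does not spell out its formula, and the hourly prices and performance scores in the example are invented.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours of on-demand use

def annual_on_demand_cost(hourly_usd: float) -> float:
    """Cost of running one instance around the clock for a year at on-demand rates."""
    return hourly_usd * HOURS_PER_YEAR

def perf_per_dollar(hourly_usd: float, perf: float) -> float:
    """Relative performance delivered per dollar of annual spend (higher is better)."""
    return perf / annual_on_demand_cost(hourly_usd)

def bang_for_buck_gain(old_hourly: float, old_perf: float,
                       new_hourly: float, new_perf: float) -> float:
    """Percent improvement in performance per dollar of the new instance over the old."""
    ratio = perf_per_dollar(new_hourly, new_perf) / perf_per_dollar(old_hourly, old_perf)
    return (ratio - 1.0) * 100.0

# Hypothetical instances, not real AWS prices or benchmark results.
gain = bang_for_buck_gain(old_hourly=1.00, old_perf=100, new_hourly=1.10, new_perf=145)
print(f"{gain:.1f} percent better bang for the buck")  # 31.8 percent better bang for the buck
```

Because the 8,760-hour multiplier appears on both sides of the ratio, it cancels out, so the percentage comes out the same whether you compare hourly or annual on-demand prices.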

You might be thinking that the new M9g and M9gd instances based on Graviton5 are a little skimpy on memory compared to their R8g and R8gd predecessors using the Graviton4 CPU. The capacity on the Graviton5 instances is a quarter to a half that of the Graviton4 instances, but for bandwidth-sensitive workloads this capacity may not matter as much. We strongly suspect that there will be heavier-memoried X9g instances in the not-too-distant future that add more memory, but expect to pay a hefty premium for that extra memory given the cost of DRAM these days. Moreover, AWS is using the fastest DRAM available in the DDR5 form factor, running at 8,800 MT/s. These Graviton5 instances make up in bandwidth what they might be lacking in capacity.
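To put a number on that bandwidth, here is a minimal Python sketch of theoretical peak DRAM bandwidth: channels times transfers per second times bytes per transfer. It assumes that each of the dozen DDR5 controllers drives one 64-bit (8-byte) channel at the DDR5-8800 speed cited above, which is 8,800 million transfers per second, with every slot populated. Sustained bandwidth on real workloads will be lower than this peak.

```python
def peak_dram_bandwidth_gbs(channels: int, megatransfers_per_s: int,
                            bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes)."""
    return channels * megatransfers_per_s * bytes_per_transfer / 1000.0

# Assumption: 12 channels of DDR5 at 8,800 MT/s, 64-bit data path per channel.
socket_bw = peak_dram_bandwidth_gbs(channels=12, megatransfers_per_s=8800)
print(f"peak socket bandwidth: {socket_bw:.1f} GB/s")          # 844.8 GB/s
print(f"peak bandwidth per core: {socket_bw / 192:.1f} GB/s")  # 4.4 GB/s
```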

Some of this extra memory cost might be mitigated by CXL 3.0 memory extenders, and I would not be surprised if AWS has created a shared memory appliance in its Graviton5 racks to deliver this function. I do not think CXL 3.0 memory extenders will be plugged into the PCI-Express 6.0 slots inside the node, but that is only because doing so is more boring than a rackscale memory appliance for extending and sharing DRAM.