AI Datacenters Were Built for GPUs — Almartis
AlassaneSaka · 2026-05-26 · via Hacker News - Newest: "AI"

Infrastructure · May 2026

AI Datacenters Were Built for GPUs.
What Happens When You Remove the GPUs?

Date May 2026

Reading time ~10 min


For the past few decades, building a datacenter has been a well-understood, predictable exercise in utility engineering. You provisioned compute servers, attached storage arrays, and built a network to stitch them together. The objective was straightforward: maximize utilization while minimizing cost.

The dominant traffic pattern was fundamentally north-south (clients sending requests to servers, and servers responding after querying a database), with a small amount of east-west traffic between servers and storage. The networks were built to handle bursty traffic, and if a packet was dropped, TCP would retransmit it. In web hosting or cloud services, a minor delay meant an image loaded slightly slower or a request completed a few milliseconds later. It was tolerable.

AI training changed that model completely. The network is no longer infrastructure. It directly determines accelerator utilization.

In modern AI clusters, the network is no longer just infrastructure sitting beneath compute. It does not simply transport data between machines; it determines accelerator utilization.

If you are training large models under the deep learning paradigm, you aren't dealing with independent servers. You are dealing with a massive, distributed supercomputer in which thousands of GPUs must continuously exchange parameters. The dominant traffic pattern shifts completely to east-west communication (server-to-server, GPU-to-GPU, and rack-to-rack) inside the cluster. Instead of localized, bursty spikes, AI workloads execute collective communication patterns such as all-to-all and all-reduce.

Instead of millions of small independent flows, the network must carry a small number of extremely large elephant flows. During gradient synchronization phases, thousands of GPUs may simultaneously exchange data across the fabric, creating severe network incast and rapidly saturating switch buffers.

This shift broke many of the assumptions standard networking was built on. When a modern accelerator can consume and generate data at 800 Gb/s, the critical metric flips from average latency to Job Completion Time (JCT) and tail latency.
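To get a feel for the size of these flows, here is a rough back-of-the-envelope sketch. It assumes a ring all-reduce (one common implementation, not necessarily what any given cluster uses), in which each GPU sends about 2(N-1)/N times the gradient size per synchronization. The 10 GB gradient size and the function names are hypothetical, and the estimate ignores latency, congestion, and compute/communication overlap.

```python
def ring_allreduce_bytes_per_gpu(num_gpus, gradient_bytes):
    """Bytes each GPU sends in one ring all-reduce (reduce-scatter + all-gather)."""
    if num_gpus < 2:
        return 0.0
    return 2 * (num_gpus - 1) / num_gpus * gradient_bytes


def transfer_seconds(bytes_sent, link_gbps):
    """Ideal wire time at full line rate, with no congestion or protocol overhead."""
    return bytes_sent * 8 / (link_gbps * 1e9)


# Hypothetical example: 1024 GPUs, 10 GB of gradients, 800 Gb/s per accelerator.
sent = ring_allreduce_bytes_per_gpu(1024, 10e9)
print(f"{sent / 1e9:.2f} GB sent per GPU")
print(f"{transfer_seconds(sent, 800):.3f} s at line rate")
```

Even in this idealized case, every synchronization keeps each link busy for a fraction of a second, so any congestion translates directly into idle accelerator time.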

In deep learning training, workloads execute in tightly synchronized steps, meaning the entire workload progresses at the speed of the slowest participant.

One delayed packet can stall thousands of GPUs.
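The effect can be sketched in a few lines. This is a toy model, not a measurement: the step time is simply the maximum over all participants, and the chance that at least one participant is delayed grows quickly with cluster size. The 0.1% per-GPU delay probability is a made-up number, and delays are assumed to be independent.

```python
def step_time_ms(per_gpu_ms):
    """A synchronized step finishes only when the slowest GPU finishes."""
    return max(per_gpu_ms)


def prob_step_delayed(num_gpus, p_delay):
    """Probability that at least one GPU is delayed, assuming independent delays."""
    return 1.0 - (1.0 - p_delay) ** num_gpus


times = [100.0] * 4096
times[1234] = 150.0           # a single straggler
print(step_time_ms(times))    # 150.0

# With a hypothetical 0.1% chance of delay per GPU per step:
print(round(prob_step_delayed(4096, 0.001), 3))
```

Under these assumptions, a rare per-GPU event becomes a near-certain per-step event at the scale of a few thousand GPUs, which is why tail latency matters more than the average.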

Figure 1: Synchronized elephant flows causing switch buffer saturation.


Solving packet loss created a new problem: head-of-line blocking.

The sensitivity to packet delay is amplified by the transport layer AI clusters rely on. Modern distributed training heavily uses RDMA through RoCEv2 (RDMA over Converged Ethernet), allowing GPUs to bypass the CPU and operating system entirely for low-latency direct memory access across GPUs. But while RoCEv2 dramatically reduces overhead, it is also highly sensitive to packet loss. A single dropped packet can trigger retransmissions, timeout cascades, and synchronization delays across the cluster.

To avoid packet loss in the first place, standard RoCEv2 networks rely on Priority Flow Control (PFC). Conceptually, PFC acts like a pause mechanism: when switch buffers begin filling, the switch instructs upstream devices to temporarily stop transmitting traffic in the affected priority class.

But this creates another problem: head-of-line blocking.

PFC solves packet loss by propagating congestion backward through the network. Under sustained load, this creates head-of-line blocking, where unrelated traffic becomes trapped behind congested flows. Congestion spreads across the fabric, queue depths increase, and entire sections of the network can become effectively synchronized around the slowest traffic path.
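Head-of-line blocking is easy to reproduce in a toy model. The sketch below is a deliberate simplification: it models one shared FIFO queue whose packets are labeled by destination, and treats a paused destination as an unsendable head packet. Real PFC pauses a priority class on a link rather than a destination; the function name is hypothetical.

```python
from collections import deque


def drain_shared_fifo(packets, paused):
    """Send from one shared FIFO until the head packet targets a paused destination."""
    queue = deque(packets)
    sent = []
    while queue and queue[0] not in paused:
        sent.append(queue.popleft())
    return sent, list(queue)


# Destination "A" is congested and paused; "B" and "C" are idle and healthy.
sent, stuck = drain_shared_fifo(["B", "A", "B", "C"], paused={"A"})
print(sent)   # ['B']
print(stuck)  # ['A', 'B', 'C']
```

The packets for B and C at the back of the queue have free paths, yet they cannot move because the packet for A is ahead of them.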

In distributed training environments, this is expensive. The compute cluster cannot advance until every synchronization operation completes. GPUs remain idle while waiting for retransmitted packets or congested flows to clear.


The incumbent answer: InfiniBand and Rail Optimization

To maximize GPU utilization, the industry's immediate answer was to throw hardware at the problem. NVIDIA capitalized on this by dominating the AI datacenter landscape with InfiniBand — a native lossless fabric designed specifically for high-throughput, low-latency clustering. Unlike conventional Ethernet deployments, InfiniBand was built around deterministic transport behavior, hardware congestion management, adaptive routing, and tightly controlled latency characteristics.

To scale these clusters, engineering teams have had to navigate three distinct network vectors:

  • Scale Up: Maximizing the high-speed interconnectivity within a single chassis or node (e.g., stitching 8 GPUs together using NVLink).
  • Scale Out: Expanding horizontally by connecting these multi-GPU nodes across an entire data hall using a dedicated backend network fabric.
  • Scale Across / DCI (Datacenter Interconnect): Linking entire clusters together when physical power and cooling limits prevent a single site from expanding further.

Figure 2: Scale-up is for memory; scale-out is for compute.

We're approaching the end of scale-up, as NVIDIA now delivers complete racks in which every GPU can access every other GPU's memory through NVLink (on the same chassis) and NVSwitch (in the same rack). The next few years will focus on using ConnectX NICs to connect racks to each other.

To manage the massive scale-out fabric, modern topologies are rigidly designed to be rail-optimized. In an 8-GPU node configuration, each of the 8 GPUs is mapped to a dedicated, independent network interface card (NIC). The network fabric is split into 8 parallel, isolated physical switch planes. GPU position 1 across every server communicates exclusively through rail 1, GPU position 2 through rail 2, and so on.

This isolation reduces congestion interactions and improves failure containment. If one network plane experiences degradation, the cluster loses only a fraction of aggregate bandwidth rather than stalling the entire distributed workload.
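The rail mapping and its failure behavior can be expressed as a small sketch. The function names are hypothetical, positions are 1-based to match the description above, and the bandwidth estimate assumes all rails carry equal capacity.

```python
GPUS_PER_NODE = 8  # the 8-GPU node configuration described above


def rail_for(gpu_position):
    """GPU position k on every server communicates through rail k (1-based)."""
    if not 1 <= gpu_position <= GPUS_PER_NODE:
        raise ValueError("gpu_position must be between 1 and 8")
    return gpu_position


def surviving_bandwidth(failed_rails, num_rails=GPUS_PER_NODE):
    """Fraction of aggregate bandwidth left when some rails are degraded."""
    healthy = num_rails - len(set(failed_rails))
    return healthy / num_rails


print(rail_for(1))               # 1
print(surviving_bandwidth([3]))  # 0.875
```

Losing one of eight planes leaves seven eighths of the aggregate bandwidth, instead of taking down the whole fabric.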

Figure 3: 2-tier rail-optimized topology.


Static routing was designed for mice, not elephants.

Rail-optimized architectures exposed another weakness in conventional networking.

Traditional routing cannot handle this traffic efficiently. Standard IP networks rely on ECMP (Equal-Cost Multi-Path) to distribute traffic across paths. ECMP hashes fields of the packet header (typically the static 5-tuple of source and destination IP address, source and destination port, and protocol) to pin each flow to a specific path. In web applications this works extremely well because traffic consists of large numbers of relatively small independent flows.

AI traffic behaves differently because distributed training creates a small number of massive elephant flows. ECMP hashing inevitably creates collisions where multiple large flows become pinned to the same physical links while alternative paths remain underutilized. The result is buffer pressure, more congestion, packet drops and tail latency spikes.
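A short sketch shows why a handful of elephant flows collide so easily. It is illustrative only: switch ASICs use their own hardware hash functions, so SHA-256 here is just a stand-in for "some uniform hash", and the collision estimate assumes that uniformity. All names are hypothetical.

```python
import hashlib
from collections import Counter


def ecmp_path(five_tuple, num_paths):
    """Pin a flow to a path by hashing its 5-tuple (stand-in for a switch hash)."""
    key = "|".join(str(field) for field in five_tuple).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_paths


def path_loads(flows, num_paths):
    """Count how many flows land on each path."""
    return Counter(ecmp_path(flow, num_paths) for flow in flows)


def prob_no_collision(num_flows, num_paths):
    """Chance that every flow gets its own path under a uniform hash."""
    if num_flows > num_paths:
        return 0.0
    p = 1.0
    for i in range(num_flows):
        p *= (num_paths - i) / num_paths
    return p


# Eight elephant flows (src IP, dst IP, src port, dst port, protocol), eight paths.
flows = [(f"10.0.0.{i}", "10.0.1.1", 49152 + i, 4791, "UDP") for i in range(8)]
print(dict(path_loads(flows, 8)))
print(f"{prob_no_collision(8, 8):.4f}")  # 0.0024
```

With eight flows and eight equal-cost paths, a perfectly even spread happens well under 1% of the time. With millions of small flows the imbalance averages out; with a few very large ones it does not.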

To counter this, modern AI switches use DLB (Dynamic Load Balancing) and packet-spraying mechanisms. Instead of routing by flow, the hardware breaks elephant flows apart and schedules traffic dynamically based on real-time port congestion.

This is the environment that led to the emergence of the Ultra Ethernet Consortium.


An open re-architecture of Ethernet for AI workloads.

InfiniBand works, but it is expensive, closed, and forces vendor lock-in. The broader ecosystem's response is the Ultra Ethernet Consortium (UEC): a comprehensive re-architecture of Ethernet designed specifically to challenge InfiniBand on AI workloads, without giving up Ethernet's vast ecosystem and economies of scale.

Instead of relying on crude, link-level pause mechanisms like PFC, Ultra Ethernet moves the intelligence to the transport layer. It natively introduces Packet Spraying: rather than forcing an entire elephant flow down a single hashed path via ECMP, UEC switches chop the flow down to individual packets and scatter them across every available link in the fabric simultaneously.
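The difference between flow pinning and spraying can be sketched as follows. This is a simplified model, not the UEC algorithm: it assumes equal-size packets and picks the least-loaded link for each packet as a stand-in for congestion-aware scheduling. The function names are hypothetical.

```python
def pinned_loads(flow_packets, flow_links, num_links):
    """Per-link packet counts when each flow is pinned to one link."""
    loads = [0] * num_links
    for packets, link in zip(flow_packets, flow_links):
        loads[link] += packets
    return loads


def sprayed_loads(flow_packets, num_links):
    """Per-link packet counts when every packet goes to the least-loaded link."""
    loads = [0] * num_links
    for packets in flow_packets:
        for _ in range(packets):
            link = min(range(num_links), key=loads.__getitem__)
            loads[link] += 1
    return loads


# Two elephant flows of 1000 packets each that hashed onto the same link.
print(pinned_loads([1000, 1000], [0, 0], 4))  # [2000, 0, 0, 0]
print(sprayed_loads([1000, 1000], 4))         # [500, 500, 500, 500]
```

In the pinned case one link carries everything while three sit idle; in the sprayed case the same traffic is spread evenly, at the cost of packets arriving out of order.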

This naturally introduces out-of-order packet delivery, so Ultra Ethernet incorporates hardware-level packet reordering at the NIC layer. It also pushes toward mechanisms like Virtual Output Queueing (VOQ), where packets are buffered based on final destination rather than competing broadly for shared output queues. The goal is to minimize head-of-line blocking, reduce congestion propagation, improve load balancing, and stabilize tail latency under synchronized east-west traffic.
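Reordering at the receiver can be illustrated with a minimal sequence-number buffer. This is a software sketch of the idea only; real NICs do this in hardware with bounded memory and loss handling, which are omitted here. The class name and interface are hypothetical.

```python
class ReorderBuffer:
    """Hold out-of-order packets and release them in sequence order."""

    def __init__(self):
        self.next_seq = 0   # next sequence number owed to the application
        self.pending = {}   # packets that arrived early, keyed by sequence number

    def receive(self, seq, payload):
        """Accept one packet; return every payload that is now deliverable in order."""
        self.pending[seq] = payload
        delivered = []
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered


buf = ReorderBuffer()
print(buf.receive(1, "b"))  # []
print(buf.receive(0, "a"))  # ['a', 'b']
print(buf.receive(3, "d"))  # []
print(buf.receive(2, "c"))  # ['c', 'd']
```

Packets that arrive early wait in the buffer until the gap before them is filled, so the application still sees an ordered stream.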

InfiniBand

  • Native lossless fabric
  • Proprietary, vendor lock-in
  • Hardware congestion management
  • PFC-based flow control
  • High cost, closed ecosystem

Ultra Ethernet

  • Open Ethernet ecosystem
  • Multi-vendor interoperability
  • Transport-layer intelligence
  • Packet spraying + VOQ
  • Scale economies of Ethernet


GPU-free AI Datacenters

In many ways, both InfiniBand and Ultra Ethernet are attempting to solve the same fundamental problem: the communication overhead imposed by large-scale distributed deep learning.

Modern AI systems distribute enormous parameter spaces across thousands of independent accelerators. Keeping those systems synchronized requires sophisticated networking architectures, specialized transport behavior, and large power budgets dedicated purely to coordination overhead.

The complexity of modern AI infrastructure is not accidental. It is downstream of the computational assumptions the models themselves impose.

This is where we believe a different architectural direction becomes interesting. At Almartis, our work explores associative memory systems built around explicit, addressable, and deterministic memory structures rather than large-scale distributed tensor optimization. Instead of relying primarily on statistical approximation across billions of continuously synchronized parameters, the architecture emphasizes structured retrieval and compositional memory operations.

That changes the infrastructure profile significantly. Rather than optimizing around giant all-reduce domains and synchronization-heavy GPU clusters, the system can optimize around memory locality, deterministic retrieval, low-overhead east-west communication, and integrated storage-compute fabrics operating directly over Ethernet.

This allows us to flatten the physical datacenter into a GPU-free, non-blocking, 1-tier full mesh architecture built around high-density CPU nodes and 51.2 Tb/s switching silicon. Storage and compute operate within the same physical domain rather than existing as separate backend and frontend systems. Ultra Ethernet principles such as packet spraying and dynamic load balancing are still valuable, but the objective changes fundamentally.

Figure 4: 1-tier rail-only AI GPU cluster vs. zero-GPU AI cluster.

In a perfect world, GPU clusters would be 1-tier. For better scale-out, some researchers have also found that GPU traffic is mostly deterministic (GPU 1 of server X mostly talks to GPU 1 of other servers). We could then remove the spine in rail-optimized topologies: when GPU 1 of server A wants to send data to GPU 2 of server B, it simply copies the data to GPU 2 on server A, which then transmits it over its own rail. Scale-up then extends from the chassis itself up to the entire rack.
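The forwarding rule just described can be written down as a small path function. This is a sketch of the idea under simple assumptions: endpoints are (server, GPU position) pairs, any two GPUs in a server are one NVLink hop apart, and rail k connects GPU position k on every server. The names are hypothetical.

```python
def rail_only_path(src, dst):
    """Return the hops from src to dst, where each endpoint is (server, gpu_position)."""
    if src == dst:
        return []
    src_server, src_gpu = src
    dst_server, dst_gpu = dst
    if src_server == dst_server:
        # Same server: stay on the scale-up fabric.
        return [("nvlink", src, dst)]
    hops = []
    if src_gpu != dst_gpu:
        # No spine to cross rails, so first copy to the local GPU on the target rail.
        relay = (src_server, dst_gpu)
        hops.append(("nvlink", src, relay))
        src = relay
    hops.append((f"rail{dst_gpu}", src, dst))
    return hops


for hop in rail_only_path(("A", 1), ("B", 2)):
    print(hop)
# ('nvlink', ('A', 1), ('A', 2))
# ('rail2', ('A', 2), ('B', 2))
```

Cross-rail traffic pays one extra NVLink copy, which is the trade accepted in exchange for removing the spine tier.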

Considering a 1-tier rail-only cluster and the latest NVIDIA GPU generations, the limit of a single tier is 216 Blackwell Ultra GPUs. While consuming more than twice the power of our GPU-free cluster, such a GPU cluster is far too small to train capable LLMs (labs use hundreds of thousands of GPUs to train models for months).

It's difficult to compare these two breeds of systems, as one is an LLM (training and inference) and the other is closer to a world model (continual learning). But our 150-kW cluster can train a system from scratch to common sense: an understanding of objects, the ability to make sense of the physical world, context awareness, and the ability to learn anything from there.


The goal is no longer maximizing throughput; it is minimizing retrieval latency.

The past several years of AI networking have largely been defined by one central challenge: how to scale synchronization between accelerators efficiently enough to keep increasingly large GPU clusters utilized.

The goal is no longer maximizing synchronized accelerator throughput across massive, distributed GPU fabrics.

It becomes minimizing retrieval and coordination latency across structured memory systems.

That distinction matters.

The next generation of AI infrastructure may ultimately depend on a different question: what happens when the architecture itself reduces the need for synchronization in the first place?

Almartis · May 2026

A GPU-free, 1-tier, non-blocking full mesh architecture built for associative memory.