惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

博客园 - Franky
L
LINUX DO - 最新话题
Y
Y Combinator Blog
WordPress大学
WordPress大学
D
DataBreaches.Net
GbyAI
GbyAI
MongoDB | Blog
MongoDB | Blog
宝玉的分享
宝玉的分享
cs.AI updates on arXiv.org
cs.AI updates on arXiv.org
V
Visual Studio Blog
AI
AI
The Last Watchdog
The Last Watchdog
J
Java Code Geeks
Engineering at Meta
Engineering at Meta
Martin Fowler
Martin Fowler
阮一峰的网络日志
阮一峰的网络日志
C
Check Point Blog
Help Net Security
Help Net Security
N
News and Events Feed by Topic
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
Google Online Security Blog
Google Online Security Blog
www.infosecurity-magazine.com
www.infosecurity-magazine.com
Schneier on Security
Schneier on Security
Recent Commits to openclaw:main
Recent Commits to openclaw:main
博客园 - 三生石上(FineUI控件)
Google DeepMind News
Google DeepMind News
N
Netflix TechBlog - Medium
W
WeLiveSecurity
G
Google Developers Blog
Cloudbric
Cloudbric
Attack and Defense Labs
Attack and Defense Labs
罗磊的独立博客
TaoSecurity Blog
TaoSecurity Blog
Spread Privacy
Spread Privacy
C
CXSECURITY Database RSS Feed - CXSecurity.com
小众软件
小众软件
Latest news
Latest news
S
Secure Thoughts
L
LangChain Blog
Know Your Adversary
Know Your Adversary
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
Forbes - Security
Forbes - Security
C
CERT Recently Published Vulnerability Notes
P
Privacy International News Feed
雷峰网
雷峰网
Cyberwarzone
Cyberwarzone
Stack Overflow Blog
Stack Overflow Blog
Blog — PlanetScale
Blog — PlanetScale
博客园 - 司徒正美
V
Vulnerabilities – Threatpost

NVIDIA Newsroom

NVIDIA Confidential Computing to Help Expand Apple’s Private Cloud Compute How the UK Is Turning Sovereign AI Ambition Into Action With NVIDIA Technologies NVIDIA and LG Group Build an AI Factory to Advance Physical AI, Mobility and AI Infrastructure NVIDIA and Doosan Group Collaborate to Advance Physical AI and AI Factory Infrastructure NVIDIA and SK hynix Announce Multiyear Technology Partnership to Advance Memory for AI Factories SK Telecom and NVIDIA Build AI Infrastructure to Power Korea’s AI Innovation NAVER Expands AI Infrastructure With NVIDIA to Serve Surging Global AI Demand NVIDIA, KRAFTON, NC and Reigning ‘League of Legends’ Champions T1 Celebrate RTX Spark at Korea’s PC Bangs Seoul Purpose: How NVIDIA and South Korea Are Building the Future of AI Forecast: Fun Ahead — 18 Games Join in June to Stream on GeForce NOW NVIDIA Research Unlocks Advanced Grasping, Smarter Autonomous Driving and Agent Training at Scale NVIDIA Enables the Next Era Of Physical AI Research With Agent Skills For Autonomous Vehicles, Robotics And Vision AI Industrial Software Leaders Build Secure, Autonomous AI Engineers With NVIDIA NemoClaw NVIDIA Partners With Microsoft on Unified Stack for Agentic AI Deployment, From Windows Devices to Cloud to Local Why Financial Institutions Are Converging on Transaction Foundation Models to Build Their Own Intelligence NVIDIA Jetson Brings Agentic AI to the Physical World NVIDIA AI Cloud Ecosystem Expands Worldwide to Meet Global AI Compute Demand NVIDIA Factory Operations Blueprint Gives Factories a New AI Brain Taiwan’s Industry Titans Turbocharge World’s AI Infrastructure Buildout With NVIDIA NVIDIA and TSMC Bring AI Into Fabs to Advance Semiconductor Design and Manufacturing NVIDIA, Foxconn and Taiwan Medical Centers Bring Agentic and Physical AI to ‘Healthy Taiwan’ NVIDIA Releases Major Collection of Open Source Agent Tools and Skills for Physical AI NVIDIA Announces NVIDIA Isaac GR00T Reference Humanoid Robot for Academic Research NVIDIA DRIVE Hyperion Becomes the Global Platform for a Robotaxi-Ready World NVIDIA Launches Alpamayo 2 Super Open Reasoning Model for Robotaxis How Cosmos 3 Helps Physical AI Think Before It Acts NVIDIA Launches Cosmos 3, the Open Frontier Foundation Model for Physical AI NVIDIA DGX Station for Windows Puts a Trillion-Parameter AI Supercomputer on Every Enterprise Desk NVIDIA Levels Up Local AI Agents Across RTX PCs and DGX Spark NVIDIA and Microsoft Reinvent Windows PCs for the Age of Personal AI Enterprise Software Leaders Build AI Agents With NVIDIA NVIDIA Unveils Vera, the CPU for Agents NVIDIA Vera BlueField-4 STX Brings Agentic AI Storage Processing With In-Silicon Security NVIDIA Vera Rubin Ramps Into Full Production to Power Agentic AI Factories Worldwide NVIDIA DSX Gives Infrastructure Builders the Playbook for AI Factories NVIDIA Research Advances Robotics From Simulation to the Real World The Name’s Gaming … Cloud Gaming: ‘007 First Light’ Launches on GeForce NOW NVIDIA GTC Taipei at COMPUTEX: Live Updates on What’s Next in AI NVIDIA CEO Jensen Huang at Dell Technologies World: ‘Demand Is Going Parabolic, Utterly Parabolic’ Linked and Loaded: Gaijin Single Sign-On Now Available on GeForce NOW NVIDIA and ServiceNow Partner on New Autonomous AI Agents for Enterprises It’s Gonna Be May: 16 Games Hit the Cloud This Month, With More NVIDIA GeForce RTX 5080 Power NVIDIA Launches Nemotron 3 Nano Omni Model, Unifying Vision, Audio and Language for up to 9x More Efficient AI Agents Into the Omniverse: Manufacturing’s Simulation-First Era Has Arrived Tag, You’re It: GeForce NOW Levels Up Game Discovery With Xbox Game Pass and Ubisoft+ Labels Making Sense of the Early Universe From Rainforests to Recycling Plants: 5 Ways NVIDIA AI Is Protecting the Planet NVIDIA and Google Cloud Collaborate to Advance Agentic and Physical AI Autonomous AI at Scale: Adobe Agents Unlock Breakthrough Creative Intelligence With NVIDIA and WPP No Need for Space Gear — Capcom’s ‘PRAGMATA’ Joins GeForce NOW on Launch Day Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters New Adobe Premiere Color Grading Mode Accelerated on NVIDIA GPUs Strength and Destiny Collide: ‘Samson: A Tyndalston Story’ Arrives in the Cloud National Robotics Week — Latest Physical AI Research, Breakthroughs and Resources From RTX to Spark: NVIDIA Accelerates Gemma 4 for Local Agentic AI Press Start on April: GeForce NOW Brings 10 Games to the Cloud Efficiency at Scale: NVIDIA, Energy Leaders Accelerating Power‑Flexible AI Factories to Fortify the Grid Into the Omniverse: NVIDIA GTC Showcases Virtual Worlds Powering the Physical AI Era Game On: Five New Titles Now Streaming on GeForce NOW The Future of AI Is Open and Proprietary Blowing Off Steam: How Power-Flexible AI Factories Can Stabilize the Global Energy Grid Advancing Open Source AI, NVIDIA Donates Dynamic Resource Allocation Driver for GPUs to Kubernetes Community How Autonomous AI Agents Become Secure by Design With NVIDIA OpenShell NVIDIA's CEO Projects $1 Trillion in AI Chip Sales as New Computing Era Begins Nvidia CEO: We have the most energy efficient architecture in the world An Interview with Nvidia CEO Jensen Huang About Accelerated Computing NVIDIA GTC 2026: Live Updates on What’s Next in AI Smooth Moves: 90 Frames-Per-Second Virtual Reality Arrives on GeForce NOW From Simulation to Production: How to Build Robots With AI More Than Meets the Eye: NVIDIA RTX-Accelerated Computers Now Connect Directly to Apple Vision Pro
NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI
Michael Fukuyama · 2026-06-11 · via NVIDIA Newsroom

Today, Google DeepMind released DiffusionGemma — an experimental open model built for exceptionally fast text generation. NVIDIA has optimized DiffusionGemma to run even faster across NVIDIA GeForce RTX GPUs, the NVIDIA RTX PRO platform and NVIDIA DGX Spark systems, from local PCs to the cloud. 

Rather than generating text one word at a time, DiffusionGemma generates multiple words in parallel to output whole blocks of text, opening a new, low-latency frontier for the kind of single-user workloads that developers, researchers and AI enthusiasts run every day. 

Features of the new model include: 

  • Parallel generation: DiffusionGemma denoises up to 256 tokens per step instead of predicting one at a time. 
  • Built on Gemma 4: DiffusionGemma is built on Gemma 4, a 26-billion-parameter mixture-of-experts model that activates just 3.8 billion parameters per step, pairing a diffusion head with Google’s Gemma 4 architecture. 
  • Up to 4x faster performance: The boost means fast text generation, where single-user generation usually stalls — on local hardware. 
  • Open and local: DiffusionGemma is open weights under a permissive Apache 2.0 license and runs entirely on RTX and DGX Spark — no cloud, no per-token cost — with day-zero support in Hugging Face Transformers, vLLM and Unsloth. 

A Different Way to Generate Text 

Almost every large language model (LLM) in wide use today is autoregressive — meaning it generates text one token at a time, with each new word depending on the one before it. That sequential process is what makes interactive AI feel like it’s typing. 

DiffusionGemma takes a different path. Built on the Gemma 4 26B mixture-of-experts architecture, it generates text the way diffusion models generate images: by starting from noise and refining a whole block of text at once. Each step denoises up to 256 tokens in parallel rather than emitting a single token and waiting to compute the next. 

The result is a model that thinks in blocks instead of sequentially. For latency-sensitive, single-user work — such as interactive chat, agentic loops or on-device assistants that plan and act — that parallelism translates into responses fast enough to keep pace with how developers think and iterate.

DiffusionGemma Flies on NVIDIA GPUs 

Generating one token at a time is fundamentally a memory-bound problem — a traditional LLM spends most of its time waiting on memory bandwidth, not doing math, which leaves a lot of compute on the table. 

Diffusion flips the equation. Pulling a full 256-token block through the transformer in parallel is a compute-bound workload — exactly what NVIDIA GPUs are built for. NVIDIA Tensor Cores accelerate the dense parallel math, and the CUDA software stack lets the model run efficiently from day one without bespoke tuning. In short, the model’s design plays directly to the GPUs strengths. 

DiffusionGemma delivers 1,000 tokens/sec on a single NVIDIA H100 Tensor Core GPU150 tokens/sec on NVIDIA DGX Spark and fastest local inference on NVIDIA DGX Station  roughly 4x faster than an equivalent autoregressive model running in the same single-user regime. 

That advantage holds across NVIDIA’s full lineup, running: 

  • Locally on the NVIDIA DGX Spark deskside personal AI supercomputer — powered by the NVIDIA GB10 Grace Blackwell Superchip with 128GB of unified memory — with the preinstalled NVIDIA AI software stack ready for prototyping, fine-tuning and fully local agent workflows. 
  • On NVIDIA RTX PRO 6000 workstations, providing developers, researchers and AI professionals with the headroom to run local low-latency generation and agentic loops as part of a professional workflow. 
  • On DGX Station, delivering best-in-class, high-speed inference at up to 800 tokens/sec for low-latency text generation and agentic loops with 748GB of coherent memory. 
  • On GeForce RTX GPUs, with llama.cpp support coming soon. 

The fastest way to start testing and prototyping the model is through Hugging Face Transformers, which runs DiffusionGemma on a GeForce RTX 5090 or DGX Spark out of the box. For higher-throughput inference, vLLM provides day-zero serving support.  

For adapting the model to a specific task or domain, fine-tuning is available through Unsloth and NVIDIA NeMo framework, with ready-made DGX Spark playbooks to get a local environment running quickly. Check out the vLLM playbooks for DGX Spark , RTX PRO and DGX Station. 

Try Diffusion Gemma on Hugging Face or test it for free using NVIDIA-hosted application programming interfaces at build.nvidia.com. 

Go deeper on the architecture and local deployment by reading the NVIDIA technical blog and the Google DeepMind announcement.

#ICYMI: The Latest From RTX AI Garage 

🎬 NVIDIA researchers released SANA-WM, an open source world model that turns a single image and a camera path into a minute-long, 720p video with precise 6-DoF control. At just 2.6 billion parameters, its distilled version generates a full 60-second clip in 34 seconds on a single NVIDIA GeForce RTX 5090 GPU using the NVFP4 format — delivering up to 36x higher throughput than comparable open models while running on one GPU. Read the paper. 

🛠️ Building Windows agents just got a full toolset — NVIDIA and Microsoft rolled out turnkey agent sandboxing on native Windows — Microsoft eXecution Containers plus the NVIDIA OpenShell runtime — alongside up to 2x faster agentic inference and native Windows support for Hermes Agent. 

🤖DGX Spark goes from unboxing to a running agent in minutes — A streamlined NVIDIA NemoClaw install gets developers to a working local agent fast, with Qwen3.6-35B running up to 2.6x faster on vLLM. And the new cluster assistant in NVIDIA Sync links up to four DGX Spark units into one 512GB pool — enough for ~400-billion-parameter models. 

Plug in to RTX Spark on FacebookInstagramTikTok and X — and stay informed by subscribing to the RTX Spark newsletter. 

See notice regarding software product information.