Google unveils DiffusionGemma, an AI model that breaks free of left-to-right processing
by Taryn Plumb · 2026-06-13 · via Computerworld

Rather than generating text word by word, Google's experimental open-source model drafts entire passages simultaneously using diffusion, resulting in up to 4x faster inference.

Extremely powerful large language models (LLMs) still operate as though they’re typing on a keyboard, processing workloads in a simple left-to-right fashion. But in locally-run, single-user scenarios, this sequential processing can leave graphics processing units (GPUs) and tensor processing units (TPUs) underutilized.

Google is betting that DiffusionGemma can get around this bottleneck. The new experimental open model generates text “exceptionally fast,” creating entire blocks of text simultaneously through diffusion techniques rather than through token-by-token processing. The company says this technique results in 4x faster inference compared to auto-regressive models that rely on sequential processing.

It can also save users money. Technology analyst Carmi Levy noted that existing pay-per-token monetization models “penalize the use of less than optimally efficient AI solutions.”

But DiffusionGemma “could herald a new generation of task-defined, efficient solutions that can enable expanded compute capacity without draining the operations budget,” he said.

A contrast to left-to-right processing

Built on Google’s Gemma 4 family and its Gemini Diffusion research, DiffusionGemma is a 26B mixture-of-experts (MoE) model designed to maximize text output generation.

It essentially shifts how models use hardware, giving processors a larger chunk of work each cycle so the model can draft text in full 256-token blocks rather than one token at a time. This allows the model to generate text up to 4x faster on GPUs, Google claims. It activates only 3.8B parameters during inference and, when quantized, can fit within 18GB of VRAM on high-end consumer GPUs like the Nvidia RTX 5090.
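As a rough check on the 18GB figure, the memory needed for the weights alone can be estimated from the parameter count and the bits stored per weight. The sketch below assumes a few common bit widths; the article does not say which quantization Google used, and the estimate ignores activations, caches, and runtime overhead. In a mixture-of-experts model, all 26B parameters typically have to be held in memory even though only 3.8B are active per token.

```python
def weight_memory_gb(total_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed to hold the weights, in decimal gigabytes."""
    bytes_total = total_params * bits_per_weight / 8
    return bytes_total / 1e9


TOTAL_PARAMS = 26e9   # from the article: 26B mixture-of-experts model
VRAM_BUDGET_GB = 18   # from the article: fits within 18GB when quantized

# Assumed bit widths for illustration; the article names none.
for bits in (16, 8, 4):
    gb = weight_memory_gb(TOTAL_PARAMS, bits)
    verdict = "fits" if gb <= VRAM_BUDGET_GB else "does not fit"
    print(f"{bits:>2}-bit weights: {gb:5.1f} GB, {verdict} in {VRAM_BUDGET_GB} GB")
```

Under these assumptions, only the 4-bit case leaves headroom inside an 18GB budget, which is consistent with the "when quantized" qualifier.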

“It upgrades your model inference from a single, sequential typewriter to a massive printing press that stamps the entire block of text simultaneously,” Google research scientists Brendan O’Donoghue and Sebastian Flennerhag wrote in a blog post.

AI image generators begin with pure, random ‘visual noise’ and iteratively refine that into a finalized picture (what’s known as ‘diffusion’); DiffusionGemma applies this same process to text. It does not generate tokens in order, but begins with a “canvas of random placeholder tokens” that it processes in multiple passes, identifying the context tokens it feels are most relevant and using those to refine the rest.

The model has the ability to self-correct, using confidence scoring to re-evaluate tokens in the next pass. “The model iteratively refines its own output, allowing it to evaluate the entire text block at once to fix mistakes in real-time,” O’Donoghue and Flennerhag explained.
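The refinement loop described above can be illustrated with a toy that involves no real model. Everything in this sketch is hypothetical: `toy_forward_pass` stands in for a model forward pass, its confidence formula is made up, and a single mask symbol replaces the "random placeholder tokens" for readability. It only shows the control flow: propose every position at once, keep the confident tokens, and re-evaluate the rest on the next pass.

```python
MASK = "_"


def toy_forward_pass(canvas, target):
    """Hypothetical stand-in for one forward pass over the whole block.

    Returns a (token, confidence) proposal for every position at once.
    Confidence rises when neighbouring positions are already settled.
    """
    proposals = []
    for i, tok in enumerate(canvas):
        if tok != MASK:
            proposals.append((tok, 1.0))  # already settled in an earlier pass
            continue
        neighbours = [canvas[j] for j in (i - 1, i + 1) if 0 <= j < len(canvas)]
        settled = sum(n != MASK for n in neighbours)
        base = 0.9 if i % 3 == 0 else 0.3  # made-up "easiness" per position
        confidence = min(1.0, base + 0.4 * settled)
        # An unsure toy model produces a bad guess.
        guess = target[i] if confidence >= 0.5 else "???"
        proposals.append((guess, confidence))
    return proposals


def diffusion_decode(target, threshold=0.6, max_passes=10):
    """Refine a fully masked canvas until every position is settled."""
    canvas = [MASK] * len(target)
    passes = 0
    while MASK in canvas and passes < max_passes:
        proposals = toy_forward_pass(canvas, target)
        # Keep confident tokens; re-mask the rest so the next pass retries them.
        canvas = [tok if conf >= threshold else MASK for tok, conf in proposals]
        passes += 1
        print(f"pass {passes}: {' '.join(canvas)}")
    return canvas, passes


target = "the quick brown fox jumps over the lazy dog".split()
result, passes = diffusion_decode(target)
```

The nine-token block settles in three passes here, rather than the nine steps a token-by-token loop would need. Unlike the behavior Google describes, this toy never revisits a token once it is settled.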

DiffusionGemma also has bidirectional attention, they wrote. “Generating 256 tokens in parallel with each forward pass allows every token to attend to all others.” This can be particularly helpful in domains that are non-linear in nature, such as mathematical graphs, code infilling, and in-line editing, they said.
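The difference between left-to-right and bidirectional attention can be shown with the attention mask alone. This is a generic illustration, not DiffusionGemma's code: a 1 means the row position may attend to the column position.

```python
def causal_mask(n):
    """Position i may attend only to positions 0..i (left-to-right decoding)."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Every position may attend to every other position in the block."""
    return [[1] * n for _ in range(n)]


def show(name, mask):
    print(name)
    for row in mask:
        print(" ".join(str(v) for v in row))


n = 4  # a 4-token block keeps the printout small; the article's block is 256
show("causal:", causal_mask(n))
show("bidirectional:", bidirectional_mask(n))
```

With the causal mask, a token in the middle of a block cannot see what comes after it, which is why tasks like code infilling are awkward for left-to-right models.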

DiffusionGemma is optimized across Nvidia’s hardware stack, making it compatible with consumer setups as well as with high-performance enterprise systems like Hopper and Blackwell.

Because it is released under the Apache 2.0 license, developers can freely use, modify, distribute, and commercialize the software using their preferred tools. It can be run on GPUs or in the cloud through Google Cloud Model Garden or Nvidia NIM, and is available on Hugging Face, GitHub, and vLLM, with support for the open-source library llama.cpp coming soon.

Key use cases

The model is particularly useful in local workflows that are “speed critical,” such as generation of non-linear text structures, and unlocks what Google calls “new patterns of model behavior” like multimodal understanding and generating and rendering code in near real-time.

Levy explained, “DiffusionGemma is particularly well suited for interactive coding and editing where its efficiency allows rapid processing and iterations,” noting that its ability to fit within 18GB of VRAM and its deployability on commonly available local GPUs can potentially benefit customer service-related workloads that lean heavily on real-time interaction and local processing.

“DiffusionGemma also incorporates a thinking mode that is especially adept at problem solving,” he said. For instance, the model was fine-tuned to play Sudoku, a typically challenging task for autoregressive models because each token depends on future tokens. This “rather handily” illustrates the model’s capability to solve more complex problems, Levy noted.

Limitations

Google freely admits that DiffusionGemma is geared to specific workflows, and there are “key trade-offs.”

The model is engineered for small-batch inference: low-latency, high-speed generation at low-to-medium batch sizes on a “single capable accelerator.”

In high-QPS cloud serving environments (where infrastructure is designed to handle tens or hundreds of thousands of requests per second with ultra-low latency), DiffusionGemma’s parallel decoding “offers diminishing returns” and can even result in higher serving costs, Google conceded. In addition, its overall output quality is lower than that of standard Gemma 4, which is built for apps demanding maximum quality.
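One way to see why the advantage fades at high request volumes is a toy roofline-style cost model, in which each forward pass takes as long as the slower of two things: streaming the weights from memory (a fixed cost) or doing the arithmetic (which scales with the number of tokens processed). All numbers below are invented for illustration, including the 64 refinement passes per block, which was chosen only so that the single-user case reproduces the 4x figure Google cites.

```python
def pass_time(tokens_in_pass, mem_time=1.0, compute_per_token=0.001):
    """Roofline-style estimate in arbitrary time units (made-up constants)."""
    return max(mem_time, compute_per_token * tokens_in_pass)


def autoregressive_throughput(batch):
    """Tokens per time unit: each pass yields one new token per sequence."""
    return batch / pass_time(batch)


def diffusion_throughput(batch, block=256, refine_passes=64):
    """Tokens per time unit: each pass processes a whole block per sequence,
    and a block is finished after `refine_passes` passes (an assumed number)."""
    total_time = refine_passes * pass_time(batch * block)
    return batch * block / total_time


for batch in (1, 4, 64, 1000):
    ar = autoregressive_throughput(batch)
    df = diffusion_throughput(batch)
    print(f"batch={batch:>4}  autoregressive={ar:8.1f}  "
          f"diffusion={df:6.1f}  ratio={df / ar:.2f}")
```

In this toy, the diffusion decoder wins while the hardware is waiting on memory, but once large batches keep the processor busy, repeated passes over every block become extra work and the ratio drops below 1. Real serving systems involve many factors this model leaves out.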

However, Levy noted that while DiffusionGemma “can be less precise than other models in certain workloads,” subsequent refinement cycles could overcome this limitation.

While Google isn’t sharing runtime costs, it’s clear that this is an efficiency play, he added. “When deployed across the kinds of workloads that would optimally benefit from its architecture, DiffusionGemma seems to have the potential to reduce processing overhead and related costs,” he said.

This article originally appeared on InfoWorld.
