Gemma 4 Is Here: The Dawn of Local Multimodal Reasoning
Parul Malhot · 2026-05-23 · via DEV Community

This is a submission for the Gemma 4 Challenge: Write About Gemma 4


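For years, developers have lived in a split AI world. On one side were massive, powerful proprietary models locked behind APIs. On the other were local, open-weight models that were good enough for basic tasks but struggled with complex reasoning and multimodal input.

With the release of Gemma 4, that gap has not just narrowed; it has nearly disappeared.

Gemma 4 brings capabilities that used to be limited to frontier API models (multimodal input, a large 128K context window, and a dedicated reasoning mode) straight to your local machine.

In this post, we'll break down the three model variants, look at what these new features mean in practice for everyday developers, and see how to get started.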


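🏗️ Three Variants: Which One Is Right for You?

Google released Gemma 4 in three sizes to cover a range of developer needs:

  1. Gemma 4 (Nano / edge class): the edge champion. A good fit for mobile devices, a Raspberry Pi, or running quietly in the background of a larger desktop app for basic autocomplete and routing tasks.
  2. Gemma 4 (Standard / mid-range): the developer's workhorse. If you're on a MacBook Pro or a decent Windows/Linux machine with a mid-range GPU, this is your daily driver.
  3. Gemma 4 (Large / pro class): the local powerhouse. It needs a strong GPU setup, but it offers reasoning ability that rivals top-tier models.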

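🧠 The Game Changer: Reasoning Mode

Perhaps the most exciting feature of Gemma 4 is its reasoning mode.

Reasoning mode introduces an internal "thinking" phase in which the model evaluates approaches, corrects itself, and structures its logic before it generates the final output.

Why this matters: you can now work through complex algorithms, debugging, and architecture planning locally, without your data ever leaving your machine.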


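👁️ Multimodal Input: Seeing the Whole Picture

Gemma 4 supports native multimodal input:

  • UI to code: turn a Figma screenshot into React/Tailwind
  • Debugging: combine a screenshot with logs
  • Accessibility: generate alt text locally

There's no need to chain multiple models: it's one unified system.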
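
As a minimal sketch of the "screenshot plus logs" debugging case, the helper below packs one image reference and the tail of a log into a single user turn. It reuses the message shape from the Python example in this post (`{"type": "image", "url": ...}` and `{"type": "text", ...}`). The function name `build_debug_messages`, the prompt wording, and the `max_log_lines` parameter are illustrative assumptions, not part of any Gemma API.

```python
def build_debug_messages(screenshot_url: str, log_text: str, max_log_lines: int = 40) -> list:
    """Combine one screenshot and the tail of a log into a single user turn.

    Hypothetical helper: only the message shape comes from the post's example.
    max_log_lines is expected to be a positive integer.
    """
    # Keep only the last lines of the log, where the failure usually appears.
    log_tail = "\n".join(log_text.splitlines()[-max_log_lines:])
    prompt = (
        "The screenshot shows the failing UI state. "
        "Using the log excerpt below, identify the most likely cause.\n\n"
        + log_tail
    )
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": screenshot_url},
                {"type": "text", "text": prompt},
            ],
        }
    ]
```

The returned list can be passed wherever a chat-style `messages` argument is expected; trimming the log keeps the text part small so the image and instructions are not crowded out.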


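📚 128K Context Window: The Whole-Codebase Era

A 128K-token context window lets you feed in large amounts of material:

  • Entire repositories
  • Documentation
  • Issues

The model can then reason about system-level architecture, not just isolated snippets.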
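
Whether a repository actually fits depends on its size, so it helps to estimate before pasting everything in. The sketch below is a rough budgeting helper, assuming about 4 characters per token (a common rule of thumb, not a Gemma-specific figure) and a reserve for the model's reply. The names `estimate_tokens`, `pack_files`, and `read_repo` are hypothetical; a real tokenizer would give exact counts.

```python
import math
from pathlib import Path

CONTEXT_TOKENS = 128_000  # context window size quoted in this post
CHARS_PER_TOKEN = 4       # rough heuristic (assumption), not an exact tokenizer

def estimate_tokens(text: str) -> int:
    """Approximate the token count of a string from its length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def pack_files(files: dict, budget: int = CONTEXT_TOKENS, reserve: int = 8_000):
    """Greedily pick the smallest files first until the token budget is used.

    `reserve` tokens are held back for the prompt and the model's answer.
    Returns (chosen_paths, estimated_tokens_used).
    """
    available = budget - reserve
    remaining = available
    chosen = []
    for path, text in sorted(files.items(), key=lambda kv: len(kv[1])):
        cost = estimate_tokens(text)
        if cost > remaining:
            break  # files are sorted by size, so later ones will not fit either
        chosen.append(path)
        remaining -= cost
    return chosen, available - remaining

def read_repo(root: str, suffixes=(".py", ".md")) -> dict:
    """Load matching text files under `root` into a {path: content} mapping."""
    return {
        str(p): p.read_text(encoding="utf-8", errors="ignore")
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in suffixes
    }
```

Calling `pack_files(read_repo("."))` gives a quick idea of how much of a project fits in one prompt. Smallest-first packing maximizes the number of files included; a real workflow might instead prioritize the files most relevant to the question.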


🛠️ Getting Started Locally

Run it with Ollama:

# Download the standard variant if it is missing, then start an interactive session
ollama run gemma4
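
Once the model is running, Ollama also serves a local HTTP API, which is handy for scripting. The sketch below posts a prompt to the `/api/generate` endpoint on Ollama's default port using only the standard library. The model tag `gemma4` is taken from the command above and is assumed to exist in your local Ollama install; the helper names `build_payload` and `generate` are illustrative.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(prompt: str, model: str = "gemma4", image_bytes: bytes = None) -> dict:
    """Build the JSON body for a single, non-streaming generation request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if image_bytes is not None:
        # Ollama expects images as base64-encoded strings.
        payload["images"] = [base64.b64encode(image_bytes).decode("ascii")]
    return payload

def generate(prompt: str, model: str = "gemma4", image_bytes: bytes = None,
             timeout: float = 120.0) -> str:
    """Send the request to the local Ollama server and return the generated text."""
    body = json.dumps(build_payload(prompt, model, image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]

# Example (requires a running Ollama server with the model pulled):
# print(generate("Summarize what a context window is in one sentence."))
```

Because everything goes to `localhost`, the prompt and any attached image stay on the machine, which is the privacy property this post emphasizes.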


Python example (multimodal + reasoning mode)

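from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

# Load the processor (chat template + image preprocessing) and the model.
# The model ID is the one given in this post; check the Hub for the exact name.
model_id = "google/gemma-4-standard-it"
processor = AutoProcessor.from_pretrained(model_id)

# AutoModelForImageTextToText is the auto class for image + text inputs;
# AutoModelForCausalLM usually loads a text-only model.
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16
)

# Multimodal input with Reasoning Mode
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/system-architecture.png"},
            {
                "type": "text",
                "text": "Analyze this architecture diagram and output a step-by-step plan to migrate it to serverless. Enable reasoning mode."
            }
        ]
    }
]

# Tokenize via the chat template; return_dict=True is needed so that
# **inputs unpacks into named tensors (input_ids, attention_mask, ...).
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)  # follow device_map instead of hard-coding "cuda"

outputs = model.generate(
    **inputs,
    max_new_tokens=4096,
    # Flag as given in this post; confirm it in the model card, since
    # generate() raises on keyword arguments the model does not use.
    enable_reasoning=True
)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(new_tokens, skip_special_tokens=True))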



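🔮 What This Means for the Future

Gemma 4 is a statement: real developer autonomy is possible.

With local reasoning, vision, and a large context window, we eliminate:

  • Per-call API costs
  • Privacy concerns about sending data to third parties
  • Network round-trip latency

We can build autonomous agents that run entirely on our own hardware, handling sensitive data and private codebases securely.

The frontier is no longer confined to distant data centers.

With Gemma 4, the frontier is on your desktop.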