E2B vs E4B vs 31B Dense: The Practical Guide to Choosing the Right Gemma 4 Model
pulkitgovran · 2026-05-23 · via DEV Community

This is a submission for the Gemma 4 Challenge: Write About Gemma 4


I want to tell you about the moment I stopped treating local AI as a compromise.

I was building a UX analysis tool. The idea was simple: give it a screenshot, get back a structured critique grounded in Nielsen's heuristics, cognitive load theory, WCAG — the real frameworks, not vibes. I'd used cloud models before. They work. But they also mean your users' screenshots leave their machine, hit an API, and cost money per call.

So I pulled Gemma 4 E4B locally via Ollama, pointed it at Stripe.com, and waited.

Fifty-six seconds later it handed me a structured JSON object with an overall UX score, three specific friction points each mapped to a Nielsen heuristic, accessibility flags, layout analysis, and prioritized recommendations with effort and impact ratings.

On CPU. No GPU. No API key. No data leaving the machine.

That's when the compromise framing broke for me. This isn't a fallback. It's a capability shift.


What Gemma 4 Actually Is

Gemma 4 is Google's latest open-weights model family, and the headline feature is multimodality — it can see images, not just read text. But the more interesting story is the range of what Google shipped:

| Variant | Parameters | Context Window | Best For |
| --- | --- | --- | --- |
| E2B | ~2B | 8K | Edge, mobile, fast inference |
| E4B | ~4B | 8K | Local development, CPU-viable |
| 31B Dense | ~31B | 128K | Complex reasoning, long documents |

Three very different tools with the same name. Picking wrong costs you either quality or speed. Let me tell you what I've actually learned about each.


E2B: The One That Fits Anywhere

The 2-billion-parameter model is built for constraints. Think browser extensions, mobile apps, Raspberry Pi, anything where you cannot afford to wait or cannot carry a heavy runtime.

It handles basic instruction-following and short-form generation well. What it struggles with is structured output — complex JSON schemas, multi-step reasoning chains, tasks that require holding a lot of context simultaneously. If your task is "summarize this paragraph" or "classify this text into one of five categories," E2B is excellent and blazingly fast. If your task requires the model to reason across multiple frameworks while producing a nested JSON object, you'll feel the ceiling.

Use E2B when: speed > depth, you're running on constrained hardware, or the task is classifying/summarizing rather than reasoning.


E4B: The Sweet Spot for Developers

This is the one I use. The 4-billion-parameter model is the first point in the family where you stop noticing the limitations on most practical tasks.

What changed for me: E4B can follow a complex, multi-section JSON schema reliably. I give it a system prompt that defines eight nested fields, among them cognitiveLoad, trustScore, frictionPoints, recommendations, accessibilityFlags, and layoutAnalysis, and it fills them correctly, with real observations rather than hallucinated filler.

It also handles format: "json" mode in Ollama properly. This flag constrains the output to valid JSON, and E4B respects it while still producing coherent, specific content. E2B can struggle here — the output is valid JSON but the reasoning quality inside the strings drops noticeably.

On a modern CPU (M-series Mac or reasonably recent x86), E4B runs in under two minutes for most tasks. That's usable. It's not instant, but it's not "go make a coffee" either. On a GPU it's much faster.

Use E4B when: you need structured output, multimodal input, or reasoning quality that holds up in a real product — and you want it to run locally without a GPU.


31B Dense: When You Need to Think Hard

The 31B model is a different kind of tool. The 128K context window alone opens up use cases the smaller models simply cannot touch: full codebase analysis, long-document reasoning, multi-turn conversations that span thousands of tokens without losing thread.

The reasoning depth is also qualitatively different. Tasks that require synthesizing across many pieces of information — legal document review, detailed technical explanations, nuanced comparative analysis — land differently at 31B than at 4B. The model doesn't just answer; it considers.

The tradeoff is infrastructure. You need a GPU (or significant RAM for CPU inference), and the latency is real. This isn't a laptop model. But if you're building something that needs serious reasoning and you can provision the compute, 31B Dense is competitive with much larger proprietary models on a lot of benchmarks.

Use 31B Dense when: you have the compute, your task requires deep reasoning or long context, and quality is the primary constraint.
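The three "Use ... when" rules can be condensed into a small decision helper. This is only a sketch of the guidance above: the function name, the option names, and the order of the checks are assumptions for illustration, not anything shipped with Gemma 4 or Ollama.

```javascript
// Hypothetical helper: encodes the "Use ... when" rules from this guide.
// Option names and the priority of the checks are assumptions.
function chooseGemmaVariant({
  constrainedHardware = false, // phone, browser extension, Raspberry Pi
  needsLongContext = false,    // whole codebases, long documents
  needsDeepReasoning = false,  // synthesis across many sources
  hasGpuOrLargeRam = false,    // enough compute to host the 31B model
} = {}) {
  // Hardware limits come first: a model that cannot run is never the right pick.
  if (constrainedHardware) return "E2B";
  if ((needsLongContext || needsDeepReasoning) && hasGpuOrLargeRam) {
    return "31B Dense";
  }
  // Default: structured output and multimodal input on a laptop CPU.
  return "E4B";
}

console.log(chooseGemmaVariant({ constrainedHardware: true }));
console.log(chooseGemmaVariant({ needsLongContext: true, hasGpuOrLargeRam: true }));
console.log(chooseGemmaVariant({ needsLongContext: true }));
```

The last call falls back to E4B on purpose: wanting long context without the compute to serve it is a reason to rethink the task, not to pick a model that will not run.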


The Feature That Changed How I Build: Multimodal + JSON Mode

Here's the combination that makes Gemma 4 genuinely special for application developers.

Most LLM applications are text in, text out. You send a prompt, you parse the response, you deal with whatever format the model decided to use that day. Gemma 4 changes two things at once:

1. You can send images. Not OCR'd text from images. Not bounding boxes. The actual image. The model reasons about what it sees — colors, layout, spatial relationships, visual hierarchy.

2. You can enforce JSON output. With format: "json" in the Ollama request, the model outputs valid JSON. Combined with a well-defined system prompt that describes the exact schema you want, you get structured, machine-readable output you can render directly in a UI.

Here's what that looks like in practice:

const response = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "gemma4:e4b",
    prompt: systemPrompt + "\n\n" + userPrompt,
    images: [base64Image],  // raw base64, no data URI prefix
    format: "json",         // constrain output to valid JSON
    stream: true,
    options: {
      temperature: 0.3,     // lower = more deterministic structure
      num_ctx: 8192,
    },
  }),
});
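Because this request sets stream: true, Ollama answers with newline-delimited JSON events, each carrying a fragment of the output in its response field, and the fragments have to be stitched together before the result can be parsed. Here is a minimal sketch of that step. The function name collectStreamedText is hypothetical, and the simulated chunks stand in for text decoded from the HTTP response body.

```javascript
// Accumulates Ollama's newline-delimited JSON stream into one string.
// `chunks` is any iterable of text pieces; a piece may end mid-line.
function collectStreamedText(chunks) {
  let buffer = "";
  let text = "";
  const consume = (line) => {
    if (line.trim() === "") return;
    const event = JSON.parse(line);
    if (typeof event.response === "string") text += event.response;
  };
  for (const chunk of chunks) {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop(); // keep the trailing partial line for the next chunk
    lines.forEach(consume);
  }
  consume(buffer); // flush a final line that had no trailing newline
  return text;
}

// Simulated stream: the second event is split across two chunks.
const simulated = [
  '{"response":"{\\"trustScore\\":","done":false}\n{"respo',
  'nse":" 82}","done":false}\n',
  '{"response":"","done":true}\n',
];
console.log(collectStreamedText(simulated)); // {"trustScore": 82}
```

Buffering the partial line matters because network chunks do not respect line boundaries; parsing each chunk directly would fail whenever an event is split in two.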


The model receives the image, reasons about it, and returns a structured JSON object. No parsing heuristics. No regex. No "hope the model uses the format I asked for." Validate it with Zod or any schema library and you're done.
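To make the validation step concrete without pulling in a dependency, here is a hand-rolled check that stands in for Zod or a similar library. The field names are the ones mentioned earlier in this post; the expected types for each field are assumptions, since the full schema is not shown here.

```javascript
// Minimal hand-rolled check standing in for Zod or another schema library.
// Field names come from the article; the expected types are assumptions.
const EXPECTED_FIELDS = {
  cognitiveLoad: "object",
  trustScore: "number",
  frictionPoints: "array",
  recommendations: "array",
  accessibilityFlags: "array",
  layoutAnalysis: "object",
};

function kindOf(value) {
  if (Array.isArray(value)) return "array";
  if (value === null) return "null";
  return typeof value;
}

// Returns { ok, value, errors } instead of throwing, so callers can retry.
function parseCritique(rawText) {
  let value;
  try {
    value = JSON.parse(rawText);
  } catch (err) {
    return { ok: false, value: null, errors: ["not valid JSON: " + err.message] };
  }
  if (kindOf(value) !== "object") {
    return { ok: false, value: null, errors: ["top level is not an object"] };
  }
  const errors = [];
  for (const [field, expected] of Object.entries(EXPECTED_FIELDS)) {
    const actual = field in value ? kindOf(value[field]) : "missing";
    if (actual !== expected) {
      errors.push(`${field}: expected ${expected}, got ${actual}`);
    }
  }
  const ok = errors.length === 0;
  return { ok, value: ok ? value : null, errors };
}

const good = JSON.stringify({
  cognitiveLoad: { level: "medium" },
  trustScore: 82,
  frictionPoints: [],
  recommendations: [],
  accessibilityFlags: [],
  layoutAnalysis: { columns: 2 },
});
console.log(parseCritique(good).ok);
console.log(parseCritique('{"trustScore": "high"}').errors);
```

Returning an error list rather than throwing keeps the check easy to log and easy to wire into a retry.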

One practical note: sometimes the first pass at temperature: 0.3 produces a malformed response, especially on complex schemas. A simple retry at temperature: 0.1 with streaming disabled almost always fixes it:

// Retry pattern for malformed JSON
const retryResponse = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  body: JSON.stringify({
    model: "gemma4:e4b",
    prompt: systemPrompt + "\n\n" + userPrompt,
    images: [base64Image],
    format: "json",
    stream: false,          // non-streaming for retry
    options: { temperature: 0.1, num_ctx: 8192 },
  }),
});


This two-pass pattern makes structured multimodal output production-reliable.
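Put together, the two passes might be orchestrated roughly like this. It is a sketch under assumptions: callModel is a hypothetical function you supply that sends one request with the given settings and resolves to the raw response text, and "malformed" is taken to mean "JSON.parse throws".

```javascript
// Two-pass sketch. `callModel` is a hypothetical async function supplied by
// the caller: it sends one request and resolves to the raw response text.
async function generateWithRetry(callModel) {
  const passes = [
    { temperature: 0.3, stream: true },  // first attempt
    { temperature: 0.1, stream: false }, // stricter retry
  ];
  let lastError = null;
  for (const pass of passes) {
    const rawText = await callModel(pass);
    try {
      return { data: JSON.parse(rawText), pass };
    } catch (err) {
      lastError = err; // malformed output: fall through to the next pass
    }
  }
  throw new Error("Both passes returned malformed JSON: " + lastError.message);
}

// Fake model for demonstration: truncated on the first pass, valid on the second.
const fakeModel = async ({ temperature }) =>
  temperature > 0.2 ? '{"trustScore": 8' : '{"trustScore": 82}';

generateWithRetry(fakeModel).then(({ data, pass }) => {
  console.log(data.trustScore, pass.temperature); // 82 0.1
});
```

Injecting callModel keeps the retry logic testable without a running Ollama server, and capping the loop at two passes means a persistently bad prompt fails loudly instead of retrying forever.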


Getting Started in Five Minutes

If you want to run Gemma 4 locally right now:

# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull the E4B model (~3GB download)
ollama pull gemma4:e4b

# Test it immediately
ollama run gemma4:e4b "Explain the difference between cognitive load and visual noise in UX design"


For multimodal input via API:

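# Send an image (base64 encoded; newlines are stripped because GNU base64
# wraps its output, which would break the JSON string on Linux)
curl http://localhost:11434/api/generate -d '{
  "model": "gemma4:e4b",
  "prompt": "Describe the visual hierarchy in this UI screenshot. Respond in JSON.",
  "images": ["'"$(base64 < your-screenshot.png | tr -d '\n')"'"],
  "format": "json",
  "stream": false
}'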
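The same request body can be assembled in JavaScript. A common stumbling block is that browser APIs such as FileReader.readAsDataURL return a data URI, while the images array expects raw base64. The sketch below strips the prefix first; the helper names toRawBase64 and buildGenerateBody are hypothetical.

```javascript
// Strips a leading "data:<mime>;base64," prefix if present, since the
// `images` array expects raw base64 strings.
function toRawBase64(imageString) {
  const match = /^data:[^;,]+;base64,/.exec(imageString);
  return match ? imageString.slice(match[0].length) : imageString;
}

// Builds the JSON body for POST /api/generate. Pure function, no network I/O.
function buildGenerateBody(imageString, prompt) {
  return JSON.stringify({
    model: "gemma4:e4b",
    prompt,
    images: [toRawBase64(imageString)],
    format: "json",
    stream: false,
  });
}

// "iVBORw0KGgo=" is the base64 form of the 8-byte PNG file signature.
const dataUrl = "data:image/png;base64,iVBORw0KGgo=";
console.log(toRawBase64(dataUrl)); // iVBORw0KGgo=
console.log(buildGenerateBody(dataUrl, "Describe this UI. Respond in JSON."));
```

In Node, a file read with fs.readFileSync(path).toString("base64") is already raw base64, so it passes through toRawBase64 unchanged.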


That's it. No API key, no rate limits, no billing dashboard.


What This Actually Means

I think we're underreacting to what's happening here.

A 4-billion-parameter multimodal model that runs on a consumer laptop, follows complex structured output schemas, reasons about images, and produces output good enough to use in a real product — that's not a prototype capability. That's a production capability.

For developers, this changes the build calculus for a whole category of applications. Anything that involves user data you'd rather not send to a cloud API. Any tool that needs to work offline. Any product targeting users who are rightfully skeptical about where their data goes. Any hobby project where you don't want to watch a usage bill grow.

For the broader ecosystem, a capable open-weights multimodal model means researchers, educators, and developers in places with expensive or unreliable internet access can run serious AI locally. The capability is no longer gated behind an API key and a credit card.

The question I keep coming back to: if a 4B model running on CPU can do this now, what does the roadmap look like in two years? We're probably not near the ceiling.

I'm not sure what that means yet. But I'm fairly sure the "local AI as compromise" framing is already wrong, and most people haven't noticed.


Running Gemma 4 E4B locally via Ollama. Models are open-weights and available at ollama.com/library/gemma4.