I switched my Gemma 4 model three times in 72 hours. Here's the decision tree I wish I'd had.
chintanonweb · 2026-05-22 · via DEV Community

This is a submission for the Gemma 4 Challenge: Write About Gemma 4

I picked the wrong Gemma 4 model. Twice.

A 72-hour speedrun through E2B, E4B, and 31B-via-cloud — and the decision tree I wish I'd had on hour one.

Three days before the deadline, I sat down to build a multimodal Gemma 4 app for the challenge. I'd already decided which variant I'd use: E4B, because bigger is better, right?

I shipped on E4B. Then I shipped on E2B. Then I added OpenRouter's 31B as a third option and let users pick.

Here is what each move cost me, what I learned, and the decision tree I'd hand to anyone starting today.

Quick context before the story: Gemma 4 is Google's open AI model family — Google publishes the model weights for free, you download them and run them yourself, no API key required. It ships in four sizes; the two smallest (E2B and E4B) are tiny enough to run inside a browser tab via WebGPU (the browser's graphics-card API), while the 31B Dense and 26B MoE variants are server-class. All four are multimodal — they read images and audio, not just text. That last part is why a real app inside a browser tab is suddenly possible: the model that categorizes your text transactions can also read a photo of a receipt, with no extra download.

The setup

The app — a private personal-finance dashboard that runs Gemma 4 entirely in the browser — needed three things from the model:

  1. Categorize transaction text ("STARBUCKS #1234" → restaurants).
  2. Read paper receipts (image → merchant, amount, date).
  3. Answer free-form questions about a year of statements in one prompt.

So: multimodal, long context, must run client-side (in the user's browser, not on a server I rent). That's how I narrowed to the E-series Gemma 4 variants in the first place. The 31B Dense and 26B MoE were never candidates — they're just too big for a browser tab. That left E2B (~1.5 GB on disk once quantized) and E4B (~2.5 GB).
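Since the client-side requirement depends on WebGPU actually being available, it can be worth checking for it before offering a local model at all. The following is a minimal sketch, assuming the standard `navigator.gpu.requestAdapter()` entry point; the helper names `hasWebGPUApi` and `canRunLocally` are hypothetical and not taken from the app's code.

```javascript
// Hypothetical helpers: the names are illustrative, not from the app's code.

// Synchronous check: does this environment expose the WebGPU entry point?
function hasWebGPUApi(nav) {
  return Boolean(nav && nav.gpu && typeof nav.gpu.requestAdapter === "function");
}

// Asynchronous check: the API can exist and still yield no usable adapter,
// in which case requestAdapter() resolves to null.
async function canRunLocally(nav = globalThis.navigator) {
  if (!hasWebGPUApi(nav)) return false;
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    // Treat any failure while probing as "cannot run locally".
    return false;
  }
}
```

A caller could run `canRunLocally()` on page load and hide the local options when it resolves to `false`.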

I picked E4B without thinking. That was mistake #1.

Pick #1: E4B, because "bigger is better"

E4B is the larger of the two browser-tier Gemma 4 models. It scored higher on every benchmark in Google's release. I figured the extra GB of weights would buy me cleaner categorization and smarter answers, and I'd ship a more impressive demo.

It worked. Categorization was crisp. The chat panel handled "which restaurant did I visit the most?" without breaking a sweat. I wrote the entire project around the assumption that E4B was the right call and shipped a first cut.

Then a user opened the deployed link.

Cold-load was a 2.5 GB download. On a normal connection that's somewhere between three and ten minutes of staring at a progress bar before the app does anything. My first beta tester typed "is there other solution its time consuming" before the model had finished downloading.

I'd optimized for what the model could do and ignored what the user would experience before it did anything. That's mistake #1.
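The wait is easy to estimate before shipping. Here is a rough sketch of the arithmetic; the model sizes are the approximate figures quoted earlier, while the two connection speeds are assumptions picked only for illustration.

```javascript
// Estimate cold-load download time.
// Sizes are the article's approximate figures; the speeds are assumed.
const BYTES_PER_GB = 1e9;

// Seconds needed to download `sizeGB` gigabytes at `mbps` megabits per second.
function downloadSeconds(sizeGB, mbps) {
  const bits = sizeGB * BYTES_PER_GB * 8;
  return bits / (mbps * 1e6);
}

function formatMinutes(seconds) {
  return `${(seconds / 60).toFixed(1)} min`;
}

const modelSizesGB = { E2B: 1.5, E4B: 2.5 }; // approximate size on disk
const assumedSpeedsMbps = [100, 35];         // illustrative connection speeds

for (const [name, sizeGB] of Object.entries(modelSizesGB)) {
  for (const mbps of assumedSpeedsMbps) {
    const wait = formatMinutes(downloadSeconds(sizeGB, mbps));
    console.log(`${name} @ ${mbps} Mbps: ${wait}`);
  }
}
```

Under those assumed speeds, E4B comes out at about 3.3 and 9.5 minutes, and E2B at about 2.0 and 5.7 minutes. At any fixed speed, E2B takes 60% of E4B's time.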

Pick #2: E2B, because respecting people's bandwidth is part of the product

E2B is the smaller browser variant. Same multimodal capability. Same 128K context window (meaning it can read about a 300-page book in one prompt, which matters if you want to ask questions across a whole year of bank statements). Same quantization. About 40% less to download. Slightly thinner reasoning on multi-step questions.

The swap was a one-line code change:

// before
export const MODEL_ID = "onnx-community/gemma-4-E4B-it-ONNX";

// after
export const MODEL_ID = "onnx-community/gemma-4-E2B-it-ONNX";


The interesting thing wasn't the code — it was the trade-off math.

The "thinner reasoning" I was worried about cost me maybe 5–10% of categorization accuracy on long-tail merchants. That's a tiny gap. The "40% less to download" turned a five-minute wait into roughly a three-minute wait, which is the difference between a user trying your app and a user closing the tab.

The general lesson, written down where I won't forget it:

The smaller capable model usually wins. Cold-load time is the most expensive thing your app does. Trim it ruthlessly.

This held even when the larger model would have produced marginally better outputs. The output gap was invisible to the user. The download gap was the only thing they could see.

That should have been the end of it. It wasn't.

Pick #3: 31B in the cloud, because some users won't wait at all

The same user came back: "no user wait for loading 1.5 gb 2.5 gb will add selection and add openrouter selection also."

They were right. Even E2B's ~1.5 GB is a wall for someone on a phone, on a flaky connection, or just trying a demo for thirty seconds to decide if it's worth more attention. The honest answer was that the right model depends on who's using the app right now.

So I added a third option: Gemma 4 31B Dense via OpenRouter's free tier. OpenRouter is a service that lets you call many different AI models through one API, and it exposes Gemma 4 31B on a free tier with no credit card and no download. It is also the highest-quality model of the three. The trade-off is brutal and has to be explicit: your prompts and receipt photos are sent to a third-party server for inference. Privacy goes from "on-device, never uploaded" to "trust OpenRouter's logging policy."

Two practical things bit me adding the cloud path:

The free tier is 16 requests per minute. My categorization loop fired one API request per transaction. For a 71-row sample statement, that hit the rate limit in three seconds. Fix: batch up to 25 transactions per prompt. Instead of asking the model "what category is this?" 71 times, ask it "here are 25 transactions, classify each" three times. Gemma 4's 128K context has plenty of room for a batch that size (it could take a whole statement in one shot), and three batched requests stay comfortably under the 16-per-minute limit.

// One prompt, 25 transactions, one response. Free-tier safe.
const prompt = `Classify each transaction with one category from this list.
Output ONE LINE per transaction as "<n>. <category>".

${chunk.map((t, i) => `${i + 1}. ${t.rawDescription} (${t.amount})`).join("\n")}`;

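For completeness, here is one way the batching around that prompt could look end to end. This is a sketch under stated assumptions: `chunkArray`, `buildPrompt`, and `parseCategories` are hypothetical helper names, the category list is whatever the caller passes in, and the 71 generated rows only stand in for a real statement.

```javascript
// Hypothetical helpers; only the prompt wording mirrors the snippet above.

// Split an array into consecutive chunks of at most `size` items.
function chunkArray(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Build one classification prompt for a batch of transactions.
function buildPrompt(chunk, categories) {
  const lines = chunk.map((t, i) => `${i + 1}. ${t.rawDescription} (${t.amount})`);
  return [
    `Classify each transaction with one category from this list: ${categories.join(", ")}.`,
    'Output ONE LINE per transaction as "<n>. <category>".',
    "",
    ...lines,
  ].join("\n");
}

// Parse "<n>. <category>" lines into an array aligned with the chunk.
// Lines that do not match, or numbers out of range, leave a null entry.
function parseCategories(responseText, chunkLength) {
  const result = new Array(chunkLength).fill(null);
  for (const line of responseText.split("\n")) {
    const match = line.trim().match(/^(\d+)\.\s*(.+)$/);
    if (!match) continue;
    const index = Number(match[1]) - 1;
    if (index >= 0 && index < chunkLength) result[index] = match[2].trim();
  }
  return result;
}

// 71 generated rows stand in for the sample statement.
const rows = Array.from({ length: 71 }, (_, i) => ({
  rawDescription: `MERCHANT #${i + 1}`,
  amount: -10,
}));
const batches = chunkArray(rows, 25);
console.log(batches.map((b) => b.length).join(", ")); // chunk sizes: 25, 25, 21
```

Parsing into a fixed-length array with `null` gaps means a skipped or malformed line from the model shows up as a missing category for that row and does not shift every later row by one.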

The model ID format is strict. OpenRouter wants google/gemma-4-31b-it:free (the :free suffix matters). Hit the /v1/models endpoint with your key once to confirm the exact ID before you spend an hour debugging 400 errors.
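That check can be scripted. The sketch below assumes the models endpoint lives at `https://openrouter.ai/api/v1/models` and returns JSON shaped like `{ data: [{ id: "..." }, ...] }`; both are assumptions to verify against OpenRouter's documentation, and all function names are hypothetical.

```javascript
// Assumption: the models response looks like { data: [{ id: "..." }, ...] }.
// Function names are illustrative only.

// Collect the string IDs from a parsed models response.
function listModelIds(modelsResponse) {
  const models = Array.isArray(modelsResponse?.data) ? modelsResponse.data : [];
  return models.map((m) => m?.id).filter((id) => typeof id === "string");
}

// Is the exact ID (including any ":free" suffix) present?
function hasModel(modelsResponse, wantedId) {
  return listModelIds(modelsResponse).includes(wantedId);
}

// List IDs containing a fragment, useful for spotting the exact spelling.
function findSimilarIds(modelsResponse, fragment) {
  return listModelIds(modelsResponse).filter((id) => id.includes(fragment));
}

// Network wrapper (defined but not called here). Needs a runtime with fetch.
// The base URL is an assumption; the article only names the /v1/models path.
async function confirmModelId(apiKey, wantedId) {
  const res = await fetch("https://openrouter.ai/api/v1/models", {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`Model list request failed: ${res.status}`);
  const body = await res.json();
  return {
    found: hasModel(body, wantedId),
    similar: findSimilarIds(body, "gemma-4"),
  };
}
```

If `found` comes back `false`, the `similar` list should show what the ID is actually called, which is usually quicker than reading 400 responses.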

The decision tree I wish I'd had

Here it is, no theory, just the thing I'd tape to my wall:

| Question | If yes → | If no → |
| --- | --- | --- |
| Will users wait more than 30 seconds before they leave? | Local model OK | Cloud-only (OpenRouter 31B or similar) |
| Is the data on the user's machine sensitive (finance, health, journals, work)? | Local model required | Cloud is fine |
| Is the task multi-step reasoning (agentic, planning) rather than simple classification? | Lean E4B / 31B | E2B is enough |
| Will users return many times, so the one-time download amortizes? | Local OK at any size | Smallest model that does the job |
| Are you charging users, or can you absorb the API cost? | Cloud OK | Local or free-tier cloud only |

You can stop here. Most projects only need the first two rows.
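The first four rows of the table above can also be written as a small function. This is only a sketch: the function name, the field names, and the return labels are hypothetical, and the priority between rows (privacy beats impatience, and E4B only when users return often) is my assumption, since the table does not say how to resolve conflicts.

```javascript
// Hypothetical encoding of the first four rows of the decision table.
// The way conflicting rows are resolved here is an assumption.
function recommendModel({
  usersWillWait,       // row 1: more than 30 seconds of patience?
  dataIsSensitive,     // row 2: finance, health, journals, work?
  multiStepReasoning,  // row 3: agentic or planning tasks?
  usersReturnOften,    // row 4: does the download amortize?
}) {
  // Sensitive data forces a local model; otherwise patience decides.
  const useLocal = dataIsSensitive || usersWillWait;
  if (!useLocal) return "cloud-31b";

  // Locally, pay for the bigger download only when the task needs it
  // and repeat visits make the one-time cost worthwhile.
  return multiStepReasoning && usersReturnOften ? "local-e4b" : "local-e2b";
}

console.log(
  recommendModel({
    usersWillWait: false,
    dataIsSensitive: true,
    multiStepReasoning: false,
    usersReturnOften: false,
  })
); // local-e2b
```

The fifth row (who pays for the API) is left out on purpose; it is a business constraint that would filter the result, not something the function can infer.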

The real answer: don't pick. Let the user pick.

What I actually shipped in the end was a model picker. Three cards. Each one shows: name, download size, where inference happens (on-device vs cloud), and one sentence on the trade-off.
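As a rough illustration, the data behind three such cards might look like the sketch below. The model IDs and sizes are the ones quoted in this post; the field names, the `describeOption` helper, and the trade-off wording are hypothetical and not copied from the app.

```javascript
// IDs and sizes come from this post; field names and wording are illustrative.
const MODEL_OPTIONS = [
  {
    id: "onnx-community/gemma-4-E2B-it-ONNX",
    name: "Gemma 4 E2B",
    downloadGB: 1.5,
    runsOn: "device",
    tradeOff: "Smallest download; slightly thinner multi-step reasoning.",
  },
  {
    id: "onnx-community/gemma-4-E4B-it-ONNX",
    name: "Gemma 4 E4B",
    downloadGB: 2.5,
    runsOn: "device",
    tradeOff: "Better answers; the longest wait before first use.",
  },
  {
    id: "google/gemma-4-31b-it:free",
    name: "Gemma 4 31B (OpenRouter)",
    downloadGB: 0,
    runsOn: "cloud",
    tradeOff: "No download; prompts and receipt photos leave your device.",
  },
];

// Turn one option into the text a card would show.
function describeOption(option) {
  const size = option.downloadGB > 0 ? `~${option.downloadGB} GB download` : "no download";
  const where = option.runsOn === "device" ? "on-device" : "cloud";
  return `${option.name}: ${size}, ${where}. ${option.tradeOff}`;
}

for (const option of MODEL_OPTIONS) {
  console.log(describeOption(option));
}
```

Keeping size and inference location as plain data fields makes it hard to ship a card that forgets to say where the user's data goes.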

The picker doesn't avoid the decision; it moves it to the person who has the right information to make it. The product manager in me cringed at exposing a "model selection" UI to consumer users. The engineer in me realized that the alternative — picking one model for everyone — meant always being wrong for somebody.

"Intentional model selection" is one of the Gemma 4 Challenge's judging criteria. I'd bet that on most submissions, that intention lives in the writeup, not in the product. In mine, it lives in the user's first click.

If you're starting a Gemma 4 build right now, save yourself the 72 hours and start there.


The app is PocketCFO — open source, MIT. Drop a CSV bank statement and pick a model. Built for the Gemma 4 Challenge. Live demo · code.

Tags: #gemmachallenge #ai #webgpu #javascript