Gemma4 Challenge
Jowi A · 2026-05-25 · via DEV Community

This is a submission for the Gemma 4 Challenge: Write About Gemma 4

The G Factor

Welcome to The G Factor, with Gemma as your host for tonight

In psychometrics there is a beautiful, slightly controversial idea called the g factor. The short version: across wildly different mental tasks (vocabulary, spatial puzzles, arithmetic, pattern matching) people who do well on one tend to do well on the others, and statisticians can squeeze that shared variance into a single number. One latent "general intelligence" that quietly predicts performance everywhere. 🧠

I built a browser app that makes a 2-billion-parameter model write music, and I named it The G Factor on purpose. Not as a cute pun (although it is one, three times over), but because the name is the whole argument I want to make: you do not need a giant model to look generally capable across a diverse range of tasks. You need a small model in the right harness.

The name is the thesis

The name pulls triple duty, and each meaning maps onto something the app actually does:

  • The G in Gemma, the model doing all the work, fully on your machine.
  • The g factor of psychometrics, one capacity stretched across many different tasks.
  • The "Factor" in X-Factor, because the app is literally a talent show where you judge contestants. 🎤

That middle meaning is the one I keep coming back to. Lay the psychometric idea next to the app and it lines up almost suspiciously well:

| Psychometric g factor | The G Factor (the app) |
| --- | --- |
| One latent capacity that predicts performance across diverse tasks | One small Gemma model performing across diverse musical tasks |
| A battery of varied subtests | A bracket of 4 to 8 contestants |
| The individual subtests | 8 musical axes (polyrhythmic, polyphonic, modulated, timbral, harmonic, tempo-shifted, sparse, dense) |
| The examiner scoring each response | You, judging two contestants head to head |
| Adapting the test to the test-taker | A session taste memory that learns what you like |
| "Is it generally capable?" | "A 2B model is enough when the runtime carries the structure" |

The interesting question in intelligence research was never "how big is the brain." It was "where does general capability actually come from." That is exactly the question I find myself asking about small language models, so I built a music app to chase it.

So what is it? 🤔

The G Factor is a browser-native live-coding companion for Strudel. There is no server doing the thinking. Gemma 4 runs in your tab on WebGPU, generates Strudel patterns, and you play them out loud!

If you have not met Strudel before: it is a free, open-source environment for making music by writing code, and it runs entirely in the browser. You type small JavaScript-like snippets such as s("bd hh sd hh") and it loops them back as a beat, rewriting the sound the instant you change the code. It is a web port of the TidalCycles live-coding tradition, and it is a great target for this experiment precisely because the "language" is compact, composable, and instantly audible: you hear a wrong note the moment it plays.

There are two ways in. In the Rehearsal Room you chat with Bleep, a cartoon producer who rewrites the track turn by turn ("add a four-on-the-floor kick", "make the hats busier", "give it some reverb"). In the Talent Show you drop a seed and Gemma fields a bracket of contestants, each told to explore a different musical axis, and you judge them two at a time until one is left standing. Both surfaces feed the same taste memory.
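
To make the Talent Show flow concrete, here is a minimal sketch of a single-elimination bracket in TypeScript. It is not the app's actual code: the Contestant shape, the Judge callback, and the rule that an odd contestant out gets a bye are all assumptions for illustration. Only the eight axis names come from the post.

```typescript
// The eight musical axes named in the post.
const AXES = [
  "polyrhythmic", "polyphonic", "modulated", "timbral",
  "harmonic", "tempo-shifted", "sparse", "dense",
] as const;

// Hypothetical contestant shape: the axis it was told to explore plus its code.
interface Contestant {
  axis: string;
  code: string;
}

// Hypothetical judge callback: returns whichever contestant the user prefers.
type Judge = (a: Contestant, b: Contestant) => Contestant;

// Pair contestants off round by round until one is left standing.
function runBracket(contestants: Contestant[], judge: Judge): Contestant {
  if (contestants.length === 0) {
    throw new Error("bracket needs at least one contestant");
  }
  let round = contestants;
  while (round.length > 1) {
    const next: Contestant[] = [];
    for (let i = 0; i < round.length; i += 2) {
      // Assumption: with an odd count, the last contestant advances unopposed.
      next.push(i + 1 < round.length ? judge(round[i], round[i + 1]) : round[i]);
    }
    round = next;
  }
  return round[0];
}
```

A bracket of n contestants needs n - 1 head-to-head judgments, so a 4-contestant show takes three decisions and an 8-contestant show takes seven.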

A Talent Show semifinal: two Gemma-generated contestants going head to head

Chatting with Bleep in the Rehearsal Room

That is the demo. The part worth writing about is why a 2B model can do this at all.

Teaching a model a language it never saw

Here is the catch that makes this a real problem and not a toy: Gemma 4 almost certainly never saw Strudel during training. It is a niche live-coding DSL. So how do you get reliable, playable code out of a small model for a language it does not know?

You stop asking the model to know things, and you let the runtime carry the structure. Three layers do that, and this is the pattern I think transfers to any small-model-on-an-unfamiliar-domain problem.

The three-layer teaching stack: static priors and session taste feed Gemma, the parser firewall guards the output

Layer 1: static priors

A roughly 600-token system prompt that is the documentation the model never read: Strudel's mini-notation operators, the common method chains, and about 10 canonical idioms. This is not fine-tuning and it is not a vector database. It is a cheat sheet pinned to the front of every request. Cheap, deterministic, and it does most of the work.

Layer 2: session taste

Every time you like a pattern, the app writes {seed_code, variation_code, transformation_label} into IndexedDB. On the next generation it scores your past likes against the current seed with a character-bigram Jaccard similarity, takes the top 3, and injects them as a labelled "this user has previously liked..." block.

That is the "learns your taste" claim, and it is honest: no weights move, no GPU time, no API call. The model adapts to you the way an adaptive test adapts to the test-taker: the app feeds it the right few-shot context at the right moment. Cold start works on priors alone, and the experience just gets warmer the more you use it. And when a contestant wins its bracket, that head-to-head-verified preference is exactly what gets written back into the taste memory:

A crowned Talent Show champion, saved as a head-to-head-verified taste signal
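
The Layer 2 retrieval step is small enough to sketch in full. The version below is an illustration under assumptions, not the app's source: the Like interface reuses the field names from the post, while the function names, the empty-set convention, and scoring against seed_code specifically are my guesses at reasonable choices. IndexedDB storage is left out so the sketch stays self-contained.

```typescript
// One stored like; field names follow the post.
interface Like {
  seed_code: string;
  variation_code: string;
  transformation_label: string;
}

// Set of overlapping two-character substrings of a string.
function bigrams(text: string): Set<string> {
  const grams = new Set<string>();
  for (let i = 0; i < text.length - 1; i++) {
    grams.add(text.slice(i, i + 2));
  }
  return grams;
}

// Jaccard similarity: |A intersect B| / |A union B|.
// Assumption: two empty sets score 0 rather than dividing by zero.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach((gram) => {
    if (b.has(gram)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// Rank past likes by how closely their seed resembles the current seed
// and keep the k best (the post uses the top 3).
function topLikes(seed: string, likes: Like[], k = 3): Like[] {
  const target = bigrams(seed);
  return likes
    .map((like) => ({ like, score: jaccard(target, bigrams(like.seed_code)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.like);
}
```

Character bigrams are a crude signal, but they are cheap, need no model, and work reasonably on short code strings where shared operators and sample names show up as shared character pairs.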

Layer 3: the parser firewall

A small model will hallucinate broken syntax. So nothing it generates is trusted. Every output is parsed with acorn, validated against a zod schema, and walked for a deny-list of dangerous references before a single note plays. If it fails, the app retries up to 3 times with a hint that says exactly what was wrong ("previous attempt was invalid because: ..."). Invalid code never reaches the UI, and unsafe code (think fetch, eval, localStorage) never reaches the audio engine. 🔒

The parser firewall: raw output runs through JSON parse, syntax check, and a security walk before it is allowed to play
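
The validate-then-retry loop can be sketched as follows. Treat it as a simplified stand-in rather than the real firewall: the app parses with acorn and validates with zod, whereas this sketch only checks the JSON shape and scans identifiers with a regular expression so that it runs without dependencies. The "code" field name, the Generate callback, and reading "retries up to 3 times" as one initial attempt plus three retries are all assumptions.

```typescript
// Assumed deny-list; the post names fetch, eval and localStorage as examples.
const DENIED = ["fetch", "eval", "localStorage"];

// Result of one validation pass: code is null when the output was rejected.
interface Verdict {
  code: string | null;
  reason: string;
}

// Stand-in validator (regex scan instead of an acorn AST walk).
// It is deliberately conservative: a denied name inside a string literal
// is rejected too, which a real AST walk would not do.
function validate(raw: string): Verdict {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { code: null, reason: "output was not valid JSON" };
  }
  const code = (parsed as { code?: unknown } | null)?.code;
  if (typeof code !== "string" || code.length === 0) {
    return { code: null, reason: 'missing string field "code"' };
  }
  const identifiers = code.match(/[A-Za-z_$][\w$]*/g) ?? [];
  const hit = identifiers.find((name) => DENIED.indexOf(name) !== -1);
  if (hit) return { code: null, reason: `forbidden reference: ${hit}` };
  return { code, reason: "" };
}

// Hypothetical model call: receives the failure hint from the previous attempt.
type Generate = (hint: string | null) => Promise<string>;

// One initial attempt plus up to maxRetries retries; null if every attempt fails.
async function generateChecked(generate: Generate, maxRetries = 3): Promise<string | null> {
  let hint: string | null = null;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const verdict = validate(await generate(hint));
    if (verdict.code !== null) return verdict.code;
    hint = `previous attempt was invalid because: ${verdict.reason}`;
  }
  return null;
}
```

Returning null instead of the last bad output is the important design choice here: the caller has to handle failure explicitly, so unvalidated code cannot leak through to playback by accident.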

Priors tell the model the rules. Taste tells it your style. The firewall guarantees the output is real. None of those three layers is the model getting smarter. They are the runtime getting smarter, and that is the point.

Why the smallest model was the right call

The judging rubric asks for intentional model selection, so let me be blunt about it: I picked the smallest model in the family on purpose, and I would defend that choice in a heartbeat.

The app uses Gemma 4 E2B (effective 2B parameters, q4f16 ONNX). It is around 1.5 GB on disk, loads in under two minutes on a mid-range laptop, runs comfortably on WebGPU, and falls back to WASM when WebGPU is missing. After the first download it needs zero network. The whole loop (generate, like, re-generate) runs offline.
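
The WebGPU-to-WASM fallback comes down to one feature check. Here is a minimal sketch, assuming the only decision is between those two backends; the pickBackend name and the GpuLike type are hypothetical, while navigator.gpu.requestAdapter() is the standard WebGPU entry point and resolves to null when no suitable adapter exists.

```typescript
type Backend = "webgpu" | "wasm";

// Minimal structural type so the sketch does not depend on WebGPU type definitions.
interface GpuLike {
  requestAdapter(): Promise<unknown | null>;
}

// Prefer WebGPU when an adapter is actually available, otherwise use WASM.
async function pickBackend(gpu: GpuLike | undefined): Promise<Backend> {
  // Browsers without WebGPU do not expose navigator.gpu at all.
  if (!gpu) return "wasm";
  try {
    // The API can exist while no usable adapter does.
    const adapter = await gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    // Treat any failure during adapter lookup as "no WebGPU".
    return "wasm";
  }
}

// In a browser this would be called roughly as:
//   pickBackend((navigator as { gpu?: GpuLike }).gpu)
```

Checking for an adapter, and not just for the presence of navigator.gpu, matters because some browsers expose the API on hardware or drivers that cannot serve it.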

Could I have reached for something bigger? Sure. But once the three layers carry the structure, the model's actual job shrinks to something tiny: take a seed plus 3 stylistic exemplars and emit one short JSON object. That is well within a 2B model's reach. Spending 30 billion parameters on a task this constrained would be paying for generality I already built into the harness.

The backend chooser: run Gemma locally on WebGPU, or via OpenRouter

I did wire in an optional cloud path too, Gemma 4 31B via OpenRouter's free tier, for visitors who lack WebGPU or want a faster bracket. Same prompts, same firewall, same axis directives. Judges can run it both ways and watch the small local model hold its own against its much larger sibling. That comparison, on identical scaffolding, is the most honest demo of the thesis I could ask for.

The bigger picture

I think we are still over-indexed on model size. The instinct, when a small model stumbles, is to reach for a bigger one. But a lot of "the model is not smart enough" is really "the runtime is not doing its share."

Google does a version of this trick at scale: pre-loading, pre-fetching, doing cheap predictive work before you ask so the expensive step feels instant. The same idea applies to small models. A retrieval step, a constrained output schema, a validation firewall, a handful of well-chosen few-shot examples: these are cheap pre-calls that make a 2B model behave like something far larger, and they run on a laptop with no data leaving the machine.

If you take one practical thing from this post, take this: before you upgrade the model, ask what structure you can move out of the weights and into the runtime. Pin the rules. Retrieve the context. Validate the output. A small local model wrapped like that is private, offline-capable, free to run, and genuinely good enough for a surprising amount of real work. Try it on your own niche domain or DSL and I think you will be surprised how far E2B gets you.

Gemma 4: the good, the bad, the ugly

I keep this section every time, because the honest notes are what I actually want to read in other people's posts.

The good

E2B running in a browser tab still feels a little like magic. WebGPU inference is genuinely usable on mid-range hardware, and with the static priors in place Gemma's JSON-following was reliable enough that the retry path rarely fires past attempt one. For a model this small, on a language it never trained on, that is a great result.

The bad

It never saw Strudel, full stop. Without the priors and the retry scaffolding it confidently invents operators that do not exist. The structure is doing real work here, and you feel it the moment you remove a layer. Local generation is also serial on a single WebGPU adapter, so a 4-contestant bracket has a real wait. I leaned into that instead of fighting it: a host toon named Buzz tells rotating jokes during the casting window and slips into a "patience mode" pool if it drags. A forced wait became part of the show.

Buzz the host filling the generation wait with jokes while contestants are cast

The ugly

The browser-ML reality: the first model download is large, WebGPU support is uneven across browsers, and there is a memory ceiling you can absolutely faceplant into if you are not careful. The fixes were unglamorous but they worked: cache aggressively after first load, fall back to WASM when WebGPU is unavailable, and keep the model's actual job small so memory pressure stays manageable. None of it is exotic, but it is the difference between a demo that works on your machine and one that works on a stranger's.

Demo

The whole thing is live, and the default local path runs entirely client-side:

Live demo: https://the-g-factor.vercel.app/

Open it in any WebGPU-capable Chromium browser. The first load pulls the weights into the HTTP cache; after that you can go fully offline and the whole loop still works.

What's next

A few threads I want to pull: audio-reactive avatars so the toon-heads actually mouth along to their own track, swapping the bigram similarity for small on-device embeddings so the taste memory gets sharper, and leaning harder into the "cheap pre-call" idea, doing predictive generation in the background so the next contestant is ready before you ask for it. The pre-loading vision is where I think small local models get genuinely exciting.

I did not teach Gemma to write Strudel. I let the runtime teach it, and I let you teach it your taste. If you build something with a small Gemma, drop it in the comments. I would love to see how far you push E2B. 😁
