Running Gemma 4 on a Modest Machine: Unsloth vs LM Studio vs llama.cpp vs Ollama
Samuel Komfi · 2026-05-25 · via DEV Community

This is a submission for the Gemma 4 Challenge: Write About Gemma 4

When local AI conversations happen online, they tend to sound like this: "I ran the 70B model on my dual-GPU workstation." or "You only need 64GB RAM and a 24GB graphics card."

Meanwhile, I'm sitting with an Intel i5, 16GB RAM, integrated graphics, roughly 350GB of storage, and no monster GPU hiding under my desk.
That made me curious. If I wanted to build something with Gemma 4 locally, which stack actually makes sense on hardware that most developers realistically own?

So I looked at four names that keep coming up: Unsloth, LM Studio, llama.cpp, and Ollama.
At first they looked like competing products. After spending time with them, I realised they solve different parts of the same problem.

The first lesson: these tools aren't really competitors

My initial assumption was simple. Pick one, ignore the others.
But they fit together more like a pipeline:

  • Model fine-tuning → Unsloth
  • Inference engine → llama.cpp
  • Serving layer → Ollama
  • Desktop UI → LM Studio

Rather than replacing each other, they stack. In fact, LM Studio and Ollama both use llama.cpp under the hood. You don't necessarily need to install llama.cpp separately unless you want direct, low-level control over quantization or server flags.

Unsloth: fine-tuning without the anxiety

Fine-tuning usually sounds expensive. Huge GPUs, large memory requirements, long training runs. Unsloth tries to cut that cost significantly.
Would I train a large Gemma variant on my setup? Probably not. But smaller experiments and LoRA fine-tuning on the E2B or E4B models feel a lot less out of reach. The interesting thing about Unsloth isn't just the speed gains. It's that it makes the whole process feel less like something only research labs do.
That said, on a CPU-only machine, even small fine-tuning jobs are slow. For anything beyond a quick experiment, I'd probably train in a free Google Colab session with a T4 GPU, then export the resulting GGUF to run locally.
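To see why LoRA makes fine-tuning feel less out of reach, it helps to count what actually gets trained. The sketch below is only an illustration: the layer shape and rank are hypothetical numbers, not Gemma 4's real dimensions, and it uses the general LoRA idea (a frozen weight plus two small trainable matrices) rather than anything specific to Unsloth.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters when a d_out x d_in weight is adapted with LoRA.

    LoRA freezes the original weight W and learns two small matrices,
    B (d_out x rank) and A (rank x d_in), so the learned update is B @ A.
    """
    return rank * (d_out + d_in)


def full_params(d_out: int, d_in: int) -> int:
    """Parameters updated if the whole weight matrix were fine-tuned."""
    return d_out * d_in


if __name__ == "__main__":
    # Hypothetical layer shape and rank, chosen only for illustration.
    d_out, d_in, rank = 2048, 2048, 16
    lora = lora_trainable_params(d_out, d_in, rank)
    full = full_params(d_out, d_in)
    print(f"full fine-tune: {full:,} params")
    print(f"LoRA (r={rank}): {lora:,} params ({lora / full:.2%} of full)")
```

For that made-up layer, LoRA trains 65,536 parameters instead of 4,194,304, which is the kind of reduction that makes a free T4 session plausible.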

LM Studio: the least intimidating place to start

LM Studio removes almost all the friction. Download it, pick a model, run it, start testing. For a machine like mine, that matters.
The tradeoffs are real, though. Larger models hit hardware limits quickly, and you have less control than you'd get with lower-level tools. But if someone who had never run a local model asked me where to start, LM Studio would be my first recommendation.

llama.cpp: the engine quietly powering everything

llama.cpp isn't flashy. No polished interface, no big buttons. But it shows up everywhere, and for good reason.
The smallest Gemma 4 model needs roughly 4GB of RAM at Q4 quantization, and the largest can push to around 20GB. On a 16GB machine, that headroom matters. Quantized models running through llama.cpp are often what makes local AI possible on hardware that would otherwise be too constrained. Without that kind of optimization, things get difficult fast.
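A back-of-the-envelope calculation shows where numbers like these come from. This is a rough sketch under stated assumptions: the parameter count is taken from the "31B" model name rather than from a published figure, 4.5 bits per weight is an assumed average for a 4-bit format (quantized blocks also store scaling factors), and the estimate covers weights only, not the KV cache or runtime overhead.

```python
def estimate_weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Assumption: "31B" means roughly 31e9 parameters.
    n_params = 31e9
    # Assumption: ~4.5 bits/weight stands in for a typical 4-bit quantization.
    for label, bits in [("16-bit", 16.0), ("8-bit", 8.0), ("~4-bit", 4.5)]:
        size = estimate_weights_gb(n_params, bits)
        print(f"{label:>7}: about {size:.1f} GB of weights")
```

The same function works for any model: pass in its parameter count and the bits per weight of the quantization you plan to download, then compare the result with the RAM you can actually spare.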

Ollama: local AI that feels like infrastructure

Ollama was the tool that clicked immediately.

ollama run gemma4:e4b

That simplicity changes your relationship with the whole thing. Instead of spending time managing files and configs, you spend time building. When you're working with FastAPI, Django, LangChain, or agent systems, Ollama starts feeling less like software and more like infrastructure you just trust to be there.
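Part of what makes Ollama feel like infrastructure is that it also listens on a local HTTP API, so an application can call the model like any other service. Here is a minimal sketch using only the standard library. It assumes Ollama is running on its default address (`localhost:11434`) and that the `gemma4:e4b` tag from the command above has already been pulled; the helper names `build_request` and `generate` are my own, not part of Ollama.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def build_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


if __name__ == "__main__":
    try:
        print(generate("gemma4:e4b", "Explain quantization in one sentence."))
    except OSError as exc:  # covers connection refused, timeouts, and HTTP errors
        print(f"Could not reach Ollama (is it running?): {exc}")
```

Keeping the request building separate from the network call makes it easy to drop `generate` into a FastAPI or Django view later and to test the payload without a running server.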

What I'd actually run on my machine

Gemma 4 comes in four sizes: E2B, E4B, the 26B MoE model, and the 31B dense model. Given my hardware, the 26B and 31B variants are effectively off the table unless I want to tolerate heavy disk offloading and painful slowdowns. The E2B and E4B models are specifically designed for edge and on-device deployment, which makes them the realistic options here. Quantized versions where possible.
My stack would look like this:

  • Experimentation: LM Studio
  • Application serving: Ollama
  • Optimized inference: llama.cpp (when I need direct control)
  • Fine-tuning experiments: Unsloth

The RAM reality check

Can you install all four on a 16GB machine? Yes. Can you run them all simultaneously while hosting a model? No.
Each tool loads its own copy of the model into RAM, and those copies aren't shared. You can't have LM Studio and Ollama both holding a 6GB model in memory at the same time and still leave headroom for your OS and browser. The practical workflow is switching between them: experiment in LM Studio, shut it down, then serve via Ollama when you're building.
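The arithmetic behind that is simple enough to write down. In this sketch the 6GB model size and 16GB total come from the text above, while the 5GB reserved for the OS and browser is a hypothetical figure you should replace with what your own machine actually uses.

```python
def ram_headroom_gb(total_gb: float, resident_models_gb: list[float], os_and_apps_gb: float) -> float:
    """RAM left once every loaded model copy is counted (negative means swapping)."""
    return total_gb - os_and_apps_gb - sum(resident_models_gb)


if __name__ == "__main__":
    total = 16.0
    os_and_apps = 5.0  # hypothetical: OS, browser, editor

    one_tool = ram_headroom_gb(total, [6.0], os_and_apps)
    two_tools = ram_headroom_gb(total, [6.0, 6.0], os_and_apps)

    print(f"one tool holding the model:  {one_tool:+.1f} GB headroom")
    print(f"two tools holding the model: {two_tools:+.1f} GB headroom")
```

With one copy loaded there is room to spare; with two, the budget goes negative and the machine starts swapping to disk.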

What I actually took away from this

The most useful discovery wasn't which tool is best. It was realising that local AI is becoming less about raw hardware and more about the tooling around it. I am building EdgeTutor for kids in rural classrooms in South Africa. It is an application that helps teachers give each child support tailored to their needs. Models like Gemma 4 make this possible because they run on modest computing resources.

A few years ago, a machine like mine wouldn't really be part of the conversation. The smaller Gemma 4 models are specifically designed for efficient local execution on laptops and mobile devices, which means developers who aren't sitting on workstation hardware can genuinely participate now.
Maybe not with the biggest models. But enough to build. And sometimes that is all you need.