惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

N
News and Events Feed by Topic
Malwarebytes
Malwarebytes
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
C
Cybersecurity and Infrastructure Security Agency CISA
F
Future of Privacy Forum
C
Cisco Blogs
T
The Exploit Database - CXSecurity.com
A
Arctic Wolf
S
Securelist
K
Kaspersky official blog
S
Schneier on Security
T
ThreatConnect
T
Tenable Blog
Spread Privacy
Spread Privacy
T
True Tiger Recordings
AWS News Blog
AWS News Blog
F
Fox-IT International blog
量子位
T
Threatpost
V
Vulnerabilities – Threatpost
C
CERT Recently Published Vulnerability Notes
Cisco Talos Blog
Cisco Talos Blog
GbyAI
GbyAI
宝玉的分享
宝玉的分享
腾讯CDC
G
Google Developers Blog
aimingoo的专栏
aimingoo的专栏
Cyberwarzone
Cyberwarzone
有赞技术团队
有赞技术团队
S
SegmentFault 最新的问题
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
V
Visual Studio Blog
U
Unit 42
雷峰网
雷峰网
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
Simon Willison's Weblog
Simon Willison's Weblog
O
OpenAI News
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
The GitHub Blog
The GitHub Blog
The Register - Security
The Register - Security
MyScale Blog
MyScale Blog
小众软件
小众软件
A
About on SuperTechFans
Last Week in AI
Last Week in AI
Y
Y Combinator Blog
博客园 - 三生石上(FineUI控件)
美团技术团队
Google Online Security Blog
Google Online Security Blog
P
Proofpoint News Feed
MongoDB | Blog
MongoDB | Blog

DEV Community

I built an AI PR-triage agent in 30 lines of Markdown Core Web Vitals from 74 to 91: A Real Tax Practitioner Site Rebuild Beyond the Loop: Why Monolithic AI Agents Fail and How to Build a Microkernel Architecture The Hidden Tax of AI-Assisted Development (And How I Fixed It) I Ditched Cloud LLMs for Gemma 4 4B: A DevOps Engineer's 48-Hour Reality Check Building a Schema.org @graph That Validates on the First Try The "Lift and Shift" Trap: Why Your Integration Layer Needs More Than Just a Cloud Address All 7 OSI Layers Explained with Real-World Analogies Antigravity 2.0 in one day: the four shells and what each is good for Self-Hosting Google Fonts with size-adjust: Zero CLS Web Font Swap The Multi-Provider LLM Problem: Why “One API” Is Not Enough How I indexed 69,000 Claude Code skills (and what I learned doing it) RememberMe CareGrid: Local Gemma 4 for dementia memory and safety Google Is Killing Gemini CLI on June 18. Here Is What to Do Before Then Do Domínio ao Deploy: Hospedando Arquivos de Deep Links no Cloudflare Pages (Parte 7.1) Running Gemma 4 26B on an Old GTX 1080 with llama.cpp Devlog 1: I tried building an SNES game with the super FX chip Why Gemma 4 Feels Like an Important Moment for AI Developers✨ From Zero and Confused, This Is How I Started Learning to Code I Built a Local AI Gateway That Talks to Claude, ChatGPT, DeepSeek and Gemini — Without a Single API Key Bootstrapping with AI: Why Gemma 4 is the Micro-SaaS Founder’s Best Friend MyErp Architecture Series - #02 Cellular Architecture: Mapping Biology to Software Systems NodeJS vs Bun vs Go 🌍 RTL Arabic Style UI How Does an AI Agent Actually Buy Something? Google Just Published the Spec. Google I/O 2026 Is One Uncanny F.R.I.E.N.D.S Group Upgrade I Replaced 70MB Node.js Log Viewer with a 172KB Zig Binary The "MTTR Is All You Need" Trap The Quiet Revolution: How Firebase Became the First Agent-Native Backend at Google I/O 2026 I Built ResuMate! A 100% Private, Local AI Resume Optimizer with Google Gemma 4 Learning DirectX 12 - Part 2 Initialization Theory NeuralHats: I Put Edward de Bono’s Six Thinking Hats on Local LLMs Using Gemma 4 📝 Instant Auto Save Notes Engineering the "App-Like" Experience: A Deep Dive into PWA Architecture I built a local first AI CCTV assistant using Gemma 4 + Frigate CrowdShield AI — Smart Stadium Operating System & Crowd Intelligence Platform I built a free AI observability tool, prove your AI is useful, not just running Beyond Autocomplete: Why Google Antigravity 2.0 Changes the Rules for Indie Builders 터미널 AI 에이전트 구축 (v12) Building Instagram-Powered Apps with HikerAPI (Without Fighting Scrapers) Checkpoints, Not Transcripts: Rethinking AI Coding Agent Memory From Side Project to Student Savior: My AI PPT & Resume Tool Crossed 1.5K+ Users Why Story Points Don’t Work in the AI Era, And What Should Take Their Place Instead. Self-Hosted Document AI: How to Run Document Intelligence On Your Own Infrastructure (2026) How to Extract Tables from PDFs with AI: 4 Methods That Actually Work (2026) IDP vs OCR: What's the Difference — and Which Does Your Business Actually Need? Automated PII Detection and Redaction in Business Documents: A Practical Guide Human-in-the-Loop Document Review: When to Use It and How to Set It Up (2026) Document Processing Without RPA: A Modern Approach for Small Teams Reducto Alternative: When You Need More Than a Document Parser (2026) Hermes Agent vs LangChain vs CrewAI: When to Reach for Each SparshAI: I Built an Offline AI Tutor for Students Using Gemma 4 — Here's What Happened Building NeuroSense AI: A Human-Centered Stress Insight Assistant Powered by Gemma Why I Built a Privacy-First Dev Toolkit GAS Input Tags: Ability Activation Without Hardcoded Bindings AI Legal Document Advisor Supported By Gemm 4 Model Building Convertify in Public Week 10: PDF Cluster + Blog Launch CureNet AI: Decentralized Health Intelligence for India, Powered by Gemma 4 and ABHA Standardization When Open-Weights AI Meets a Broken Healthcare System: Deploying Gemma 4 in Rural India V.A.L.I.D. Google I/O 2026: The Year Google Stopped Building AI Assistants and Started Shipping AI Engineers Bondmap: AI-Powered Relationship Network That Maps How You're Connected to Everyone Using Gemma 4 Gemma 4 challenge inspired me to build my first app! 96. LoRA: Fine-Tune a Billion-Parameter Model on a Laptop From a Student Who Used CircuitVerse to a GSoC Contributor — My Community Bonding Story How Bf-Tree Keeps Mini-Pages Small, Hot, and Cheap to Evict I asked Claude to explain the chip war and ended up understanding modern geopolitics differently Stop Manually Checking for Server Updates: Automate With Email Notifications Nostalgia Meets Cybersecurity: Spotting Modern Scams in a Retro OS Simulator - Forward or Fraud CRACKING CODING INTERVIEW From Python to Production Pipeline :A Practical guide to Apache Airflow Antigravity 2.0: Google Just Changed What It Means to Be an Engineer I Built a Free Sticker Maker Because Every Other One Hid the Export How I bypassed Blazor WebAssembly's Virtual DOM using raw WASM pointers Distributed Tracing for LLM Agents: When MCP Makes Tool Calls Observable The Zero-Budget Memory Setup Behind My AI Agent Workflow No database. No framework. Just files, startup order, correction logs, and discipline. I Built an AI Second Brain with Gemma 4 The Most Exciting Google I/O 2026 Announcement for Me: HTML-in-Canvas CrisisLens: Compressing Disaster Scenes into 200-Byte Emergency Payloads with Gemma 4 I'm 15 and I built a todo app with Telegram Stars payments — only legal way for me to monetize before turning 18 Crypto Branding After the Token Launch Building an on-chain alerts bot in Python without any blockchain library FinePrint — An AI Pocket Lawyer That Decodes Predatory Contracts Using Gemma 4 How to Connect OpenAI with Supabase in 10 Minutes for a Lightning-Fast AI MVP One AI Gateway for AWS Bedrock, Google Vertex AI, Gemini, and Anthropic Reading Log #9 — Aoashi The Tacit Dimension Thinking, Fast and Slow Web3 Onboarding Is Not a Wallet Problem. It Is a Trust Problem. FHE Prompt Privacy: The Metadata Leak Your Demo Still Has Software Might Be Becoming Agent-Aware: What if software starts coordinating itself? The Silent Killers of Go Concurrency: Mutexes, Semaphores, and Goroutine Leaks Lynx framework first look Building Aries AI: A Solo-Built AI Abacus Tutor on OpenAI + Supabase + Render + Razorpay I built a paid Telegram bot. Here's what Telegram Stars actually pay. Transfer Fees, Metadata, and Soulbound Tokens: A Tour of Solana Token Extensions Improving AI resume matching with prompt iteration — 7.37 to 8.37/10 7 things you can do with Rogue Studio that no other AI IDE will let you do Why I Think WordPress Still Matters Reading Log #7 — Aoashi Guns, Germs, and Steel Distinction Open Models and the Sub-Saharan Region What 12 Months of AI-Generated Pull Requests Taught My Engineering Team
I Gave Gemma 4 150 Tools on Windows. Here's What Actually Happened.
Jonathan Mel · 2026-05-25 · via DEV Community

This is a submission for the Gemma 4 Challenge: Write About Gemma 4

A Write track submission for the Gemma 4 Challenge — an honest look at local AI tool-use on consumer hardware, and the architecture that made it work.


Local AI models are having a moment. You can pull Gemma 4 in a single command, run it on hardware you already own, and have a private, capable LLM running in minutes. That part is solved.

The part nobody talks about? Giving it tools.

Not "tools" in the abstract sense. I mean: let Gemma 4 search the web, read your files, query a database, send a Slack message, spin up a Cloudflare Worker — the things that make an AI agent actually useful in production. That part is not solved. Not out of the box. Not on Windows. Not without hitting walls that will cost you days.

This is the story of how I got Gemma 4 running with 150+ MCP tools on a Windows machine, what broke, what I learned, and the architecture that finally held.


Why Gemma 4

I want to be upfront about intentionality here — the judging rubric asks for it, so let's talk about it directly.

I chose Gemma 4 for one reason: it runs on hardware my clients actually own.

I work with small engineering teams, often in compliance-sensitive environments where "just use the API" isn't an option. Data can't leave the building. Models need to be auditable. And the budget for a dedicated GPU cluster doesn't exist.

Gemma 4 hits a specific sweet spot, and I went with the E2B variant specifically:

  • Runs on a workstation-class CPU with no dedicated GPU required (I'm on a Dell T3610 — nothing exotic)
  • Capable enough to handle tool-use routing with reasonable accuracy at that size
  • Natively multimodal out of the box — vision came for free
  • Local-first, which matters in healthcare and legal contexts where data can't leave the building
  • Apache 2.0 licensed — no usage restrictions for commercial client work
  • Maintained by Google DeepMind with a clear roadmap

I'm not running it because it's the most powerful model. I'm running it because it's the most deployable model for the environments I actually work in. That distinction matters.


The Problem With Local AI + Tools

Most tutorials stop at "run the model." The interesting engineering starts when you ask: how does this model call an external service?

The answer involves MCP — the Model Context Protocol, an open standard from Anthropic that defines how AI models communicate with tools. It's the right answer. But on Windows, it is a brutal debugging experience.

Here's what I ran into:

Problem 1: DNS rebinding protection kills local servers

MCP Python servers using FastMCP have DNS rebinding protection enabled by default. When you're running a local gateway, this silently blocks connections. The error message tells you nothing useful. The fix is a single line — TransportSecuritySettings(enable_dns_rebinding_protection=False) — but you won't find it in the official docs.

Problem 2: UTF-16 LE encoding corrupts Python files

If you edit any MCP server config in Notepad on Windows and save it, you may have silently corrupted the file. Notepad defaults to UTF-16 LE. Python expects UTF-8. The file looks fine in any editor. It just doesn't run. At all.

Problem 3: Claude Desktop has an 8-server limit

Exceed 8 registered MCP servers and Claude Desktop silently drops servers on a 60-second timeout. No warning. No error. Servers just disappear. This one cost me an afternoon.

Problem 4: docker-compose vs docker compose

On older Docker installs (which enterprise machines often have), docker compose (v2 syntax) doesn't exist. docker-compose (v1) does. Scripts that work on your dev machine fail silently on the client's server. Every time.

Problem 5: stdout/stderr deadlocks with subprocess

If you're building an MCP server that shells out to other processes, capture_output=True in Python's subprocess will deadlock when the child process writes enough output to fill the pipe buffer. This is a known Python issue. The fix is writing to temp files. The symptoms look like your server just... hangs.

I documented all of these in an earlier dev.to post. The point here is: tool-use on Windows is not plug-and-play, and Gemma 4 is no exception.


The Architecture

Once I understood the failure modes, I built a gateway layer that sits between Gemma 4 and every tool it needs to call.

Here's what it looks like:

┌─────────────────────────────────────────────────────┐
│                   Gemma 4 (Ollama)                  │
│              Running locally on T3610               │
└─────────────────────┬───────────────────────────────┘
                      │  OpenAI-compatible API
                      ▼
┌─────────────────────────────────────────────────────┐
│              MCP Gateway (Docker)                   │
│         Unified tool routing layer                  │
│    Port 8089 → 8009 (internal)                      │
└──────┬────────┬────────┬────────┬───────────────────┘
       │        │        │        │
       ▼        ▼        ▼        ▼
  Web Search  Files   Notion  Cloudflare
  Databases   Git     Slack   Calendar
  ...150+ tools total

Enter fullscreen mode Exit fullscreen mode

The key design decision: Gemma 4 never talks to tools directly. It talks to the gateway. The gateway handles all the MCP plumbing, server registration, error recovery, and routing. Gemma just sees a clean tool list and calls them by name.

This matters because:

  1. Model portability — I can swap Gemma 4 for any other OpenAI-compatible model without touching the tool layer
  2. Failure isolation — when a tool server crashes (and they do), it doesn't take down the whole session
  3. Auditability — every tool call goes through a single chokepoint I can log, inspect, and gate

What Gemma 4 Can Actually Do In This Setup

Once the architecture was solid, I tested Gemma 4 on tool-use tasks that matter in real workflows:

File operations: Reading, writing, and summarizing local files. Solid. This is where smaller models shine — the task is clear, the context is bounded.

Multi-step research: "Search for recent papers on X, save a summary to Notion, and create a calendar reminder to review it." This worked about 70% of the time. The failure mode is mid-chain context loss — Gemma 4 sometimes forgets it's mid-task after a tool result comes back with a lot of tokens.

Code operations: Reading a repo, identifying a bug, writing a fix. Reasonable for small files. Falls apart on large codebases where context window becomes a real constraint.

Calendar and communication tools: Gemma 4 handles these well. The tasks are short, the schemas are simple, and the model doesn't need to maintain long chains.

The honest limitation: Gemma 4 is not GPT-4 at multi-step agentic tasks. If your workflow requires 8+ chained tool calls with complex state management, you'll see degradation. For single and double-hop tool use, it handles itself well.


Why This Architecture Generalizes

I want to make a point that goes beyond this specific setup.

The model will keep getting better. Gemma 5, 6 — whatever comes next will handle multi-step tool-use with more reliability. But the infrastructure problem — routing, reliability, security, Windows compatibility — that doesn't get solved by a better model.

The gateway layer I built is model-agnostic. The same Docker Compose stack that runs Gemma 4 today can route Llama 4, Mistral, or any future open-source model tomorrow. The tools don't change. The clients don't change. Only the model does.

That's the bet I'm making: the infrastructure for local AI tool-use is the durable asset. The models are the commodity.


Getting Started

If you want to replicate this setup, here's the minimum viable path:

Prerequisites:

  • Docker (any recent version)
  • Ollama installed locally
  • A Windows machine with at least 16GB RAM (32GB recommended for Gemma 4)

Step 1: Pull Gemma 4 via Ollama

ollama pull gemma4:e2b

Enter fullscreen mode Exit fullscreen mode

Step 2: Verify it runs

ollama run gemma4:e2b "List 3 things you can help me with"

Enter fullscreen mode Exit fullscreen mode

Step 3: Point your gateway at Ollama's OpenAI-compatible endpoint

http://localhost:11434/v1

Enter fullscreen mode Exit fullscreen mode

Ollama exposes an OpenAI-compatible API by default. Your gateway, any MCP-compatible client, or any OpenAI SDK can talk to it without modification.

Step 4: Register your tools

This is where the gateway earns its keep. Instead of configuring each MCP server individually for Gemma, you register them once at the gateway level. The model gets a unified tool list. You get one place to manage everything.


What I'd Tell Someone Starting This Today

  1. Expect Windows-specific failures. They're not your fault. They're documented failure modes that affect everyone. Know them going in.
  2. Use a gateway layer. Don't wire tools directly to the model. You'll regret it the first time a tool server crashes mid-session.
  3. Start with single-hop tool calls. Verify the plumbing works before building multi-step agents. One successful file read tells you more than a complex workflow that fails mysteriously.
  4. Pick the right model for the task. Gemma 4 is excellent for bounded, local, compliance-sensitive workflows. It is not a replacement for frontier models on complex agentic tasks. Know what you're optimizing for.
  5. The infrastructure is the product. The model is a component.

Closing

Local AI + tools on Windows is solvable. It just requires understanding the failure modes, designing around them, and building an infrastructure layer that the model can rely on.

Gemma 4 made this story possible because it runs on hardware that's actually available in the environments that need this most. That's not a small thing.

If you're building something similar, I'm happy to talk through the architecture in the comments. The failure modes I listed above have documented fixes — no reason to rediscover them the hard way.