I gave Gemma 4 access to my Google Drive without sending a single file to the cloud
Dvorah · 2026-05-18 · via DEV Community

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

Gemma Drive is a local AI assistant for Google Drive that runs entirely on your laptop. Files extracted on-device, summaries generated on-device, questions answered on-device. The only data that crosses the network is the Drive API call that downloads the file you've explicitly chosen to analyze.

The problem it solves is straightforward: every "AI for your files" tool I've tried requires me to upload my data to someone else's server. As a patent agent, that's a non-starter. The Drive I use for patent work is full of client work, invention disclosures, and prosecution files that aren't going to a third-party LLM no matter how good the marketing copy is.

So I built one that doesn't.

What it does:

  • Connects to Google Drive via OAuth with the narrow drive.readonly scope. Tokens stored locally in SQLite.
  • Lets you browse your Drive in a custom file browser, search by name, and add individual files or entire folder trees recursively.
  • Extracts text from PDFs, DOCX, XLSX, PPTX, Google Docs/Sheets/Slides, plain text, and markdown.
  • Generates per-file summaries with Gemma 4 E4B, then rolls those up into per-folder summaries.
  • Answers free-form questions across the entire workspace using the summaries as context — "what are the main themes across these folders," "which folder is most relevant to patent prosecution," that kind of thing.
  • Surfaces every unsupported file (images, audio, archives) in a dedicated transparency panel with its full metadata, so nothing gets silently dropped.
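Adding an entire folder tree means visiting every subfolder. Below is a minimal sketch of that walk. It uses an explicit stack, which has the same effect as recursion. It assumes a hypothetical `list_children(folder_id)` callable that returns each child's `id`, `name`, and `mimeType`. With the Drive v3 Python client, that callable could wrap `files().list()` with a `'<folder_id>' in parents` query and follow `nextPageToken` until it disappears. Drive marks folders with the MIME type `application/vnd.google-apps.folder`.

```python
from __future__ import annotations

from typing import Callable, Iterator

FOLDER_MIME = "application/vnd.google-apps.folder"  # how Drive marks folders

# Hypothetical signature: folder ID -> list of child dicts with "id", "name", "mimeType".
ListChildren = Callable[[str], list]


def walk_folder(root_id: str, root_path: str,
                list_children: ListChildren) -> Iterator[tuple[str, dict]]:
    """Yield (folder_path, file) for every non-folder item below root_id."""
    pending = [(root_id, root_path)]  # explicit stack instead of recursion
    seen = {root_id}                  # defensive guard against visiting a folder twice
    while pending:
        folder_id, folder_path = pending.pop()
        for item in list_children(folder_id):
            if item["mimeType"] == FOLDER_MIME:
                if item["id"] not in seen:
                    seen.add(item["id"])
                    pending.append((item["id"], f"{folder_path}/{item['name']}"))
            else:
                yield folder_path, item
```

The flattened `(path, file)` pairs carry what later stages need: the file for extraction and the folder path for grouping summaries. The explicit stack also sidesteps Python's recursion limit on very deep trees.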

The stack: Ollama running Gemma 4 E4B with CUDA acceleration on an NVIDIA GeForce RTX 4070 Laptop GPU. Django + DRF for the backend (OAuth, Drive API, extraction pipeline, prompt orchestration). React + TypeScript + shadcn/ui for the frontend. SQLite for storage. No vector database, no Celery queue, no Postgres — the simplest stack that does the job, because for a single-user local tool, anything else is overkill.

Demo

The clip shows the full flow: adding a folder from Drive (which walks all subfolders recursively), clicking "Process workspace" to extract and summarize, then asking Gemma a free-form question about the contents. Total time on my hardware: about 14 seconds for this small example folder (featuring non-patent, non-client work related to conferences); a real workspace with 50+ files takes 1-3 minutes to fully process the first time, after which everything is cached.
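Everything is cached after the first pass, which suggests keying each stored summary on something that changes whenever the file does. Below is a minimal sketch using the standard-library `sqlite3` module. It assumes Drive's `modifiedTime` string is stored next to the file ID and compared on every run. The table layout and the `summarize` callable are hypothetical. The real project layers Django over SQLite, which this sketch does not reproduce.

```python
import sqlite3
from typing import Callable

# Hypothetical table: one cached summary per Drive file ID.
SCHEMA = """
CREATE TABLE IF NOT EXISTS summaries (
    file_id       TEXT PRIMARY KEY,
    modified_time TEXT NOT NULL,
    summary       TEXT NOT NULL
)
"""


def summarize_cached(conn: sqlite3.Connection, file_id: str, modified_time: str,
                     text: str, summarize: Callable[[str], str]) -> str:
    """Return the stored summary if the file is unchanged; otherwise regenerate it."""
    row = conn.execute(
        "SELECT summary FROM summaries WHERE file_id = ? AND modified_time = ?",
        (file_id, modified_time),
    ).fetchone()
    if row is not None:
        return row[0]  # cache hit: skip the model call entirely
    summary = summarize(text)  # cache miss: new file, or a newer modifiedTime
    conn.execute(
        "INSERT OR REPLACE INTO summaries (file_id, modified_time, summary) VALUES (?, ?, ?)",
        (file_id, modified_time, summary),
    )
    conn.commit()
    return summary
```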

Code

GitHub: https://github.com/KISSPatent/gemma-drive

Setup is one terminal session plus one Google Cloud project. The short version:

  1. Install Ollama and pull the model
  2. Clone the repo
  3. Create a Google OAuth client with drive.readonly scope (instructions in the README)
  4. Run Django (python manage.py runserver) and Vite (npm run dev)
  5. Open localhost:5173, connect your Drive, start adding folders
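Before connecting Drive, it can help to confirm that the local Ollama server is running and that the model from step 1 has been pulled. Below is a minimal standard-library sketch. It assumes Ollama's default port and its `/api/tags` endpoint, which lists locally available models. The tag `gemma4:e4b` is a hypothetical placeholder; use whatever `ollama list` prints.

```python
import json
import urllib.request

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # Ollama's default local port


def fetch_tags(url: str = OLLAMA_TAGS_URL, timeout: float = 5.0) -> dict:
    """Ask the local Ollama server which models are installed."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def model_available(tags: dict, wanted: str) -> bool:
    """True if the wanted tag (":latest" implied when no tag is given) is installed."""
    if ":" not in wanted:
        wanted += ":latest"
    return any(m.get("name") == wanted or m.get("model") == wanted
               for m in tags.get("models", []))


if __name__ == "__main__":
    print(model_available(fetch_tags(), "gemma4:e4b"))  # hypothetical tag
```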

How I Used Gemma 4

I picked Gemma 4 E4B specifically, and the reasoning matters because it reflects a different kind of model-selection thinking than I'm used to.

When I'm building alpha versions of my RocketSmart platform, my default is to reach for the most powerful model available — get the capability right first, optimize later. For a local-first tool, you flip that. The first question is what can run at all on the target hardware, the second is what runs at conversational speed, and only then do you ask whether it's smart enough.

Eighteen months ago, getting GPT-4-class "chat with my PDFs" quality meant renting a big cloud model or running a 30B–70B model on a workstation-class GPU. Today, a ~4B-parameter model like Gemma 4 E4B can do local PDF QA at conversational speed on a consumer GPU using only single-digit gigabytes of VRAM. My laptop has an NVIDIA GeForce RTX 4070 Laptop GPU: plenty capable, but a long way from the H100s and A100s people benchmark on. I needed a model that gave me real answers about real documents fast, not the smartest model that could grudgingly fit.

Why E4B was the sweet spot:

  • 4.5B effective parameters (8B with embeddings) in a MatFormer architecture, quantized to ~6 GB at Q4_K_M. Fits comfortably in my GPU's VRAM with room for the OS and browser.
  • 128K context window — large enough that I didn't need a RAG layer for v1. Folder summaries fit in context for workspaces under a few hundred files.
  • Native multimodality in the weights (text + vision + audio + function calling). I'm only using text in v1, but the vision and audio paths are available one runtime upgrade away.
  • ~30 tokens/sec generation speed on my hardware — 11 seconds for a 300-token summary, 1-2 minutes to process a folder of 20 mixed-format files.
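As a rough consistency check on the throughput figures above, decode time is approximately the number of output tokens divided by the generation rate:

```latex
t_{\text{decode}} \approx \frac{N_{\text{out}}}{r}
  = \frac{300\ \text{tokens}}{30\ \text{tokens/s}}
  = 10\ \text{s}
```

That leaves roughly 1 s of the reported 11 s for prompt evaluation and other overhead. This split is an inference from the two quoted numbers, not a separate measurement.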

The summaries E4B produces are genuinely useful — not "good enough for a demo" but "good enough that I use the tool for actual work." This is one of those moments in AI where the "best model for the job" stops being a synonym for "the biggest model you can afford" and starts being a real engineering tradeoff between speed, cost, capability, and where the workload runs.

How E4B powers each part of the pipeline:

  1. Per-file summarization. Each picked file's extracted text gets sent to Gemma with a prompt asking for a 2-4 sentence summary focused on what the document is about, its purpose, and distinguishing names/dates/identifiers.

  2. Per-folder roll-up. The file summaries for each folder get fed back to Gemma as a single batch with a prompt asking for a paragraph describing the folder as a whole — common themes, key entities, what someone would use the folder for.

  3. Free-form Q&A. The user's question gets prepended to all folder summaries plus all file summaries in a single prompt. The system message constrains Gemma to use only the provided context and to say so directly when the answer isn't in the workspace.
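The three stages above reduce to three ways of assembling chat messages. Below is a minimal sketch. The prompt wording is a hypothetical paraphrase of the descriptions, not the project's actual prompts. The returned message lists are in the shape Ollama's `/api/chat` endpoint accepts.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical prompt wording paraphrasing the three stages described above.
FILE_PROMPT = (
    "Summarize the document below in 2-4 sentences. Cover what it is about, "
    "its purpose, and any distinguishing names, dates, or identifiers.\n\n{text}"
)
FOLDER_PROMPT = (
    "These are summaries of the files in the folder '{path}'. Write one paragraph "
    "describing the folder as a whole: common themes, key entities, and what "
    "someone would use it for.\n\n{summaries}"
)
QA_SYSTEM = (
    "Answer using only the workspace context provided. If the answer is not in "
    "the context, say so directly."
)


@dataclass
class FolderDigest:
    path: str
    summary: str
    file_summaries: dict[str, str] = field(default_factory=dict)


def file_messages(text: str) -> list[dict]:
    """Stage 1: one request per extracted file."""
    return [{"role": "user", "content": FILE_PROMPT.format(text=text)}]


def folder_messages(path: str, file_summaries: dict[str, str]) -> list[dict]:
    """Stage 2: all of a folder's file summaries in a single batch."""
    joined = "\n".join(f"- {name}: {s}" for name, s in file_summaries.items())
    return [{"role": "user", "content": FOLDER_PROMPT.format(path=path, summaries=joined)}]


def qa_messages(question: str, folders: list[FolderDigest]) -> list[dict]:
    """Stage 3: the question first, then every folder and file summary."""
    lines = []
    for folder in folders:
        lines.append(f"Folder {folder.path}: {folder.summary}")
        lines.extend(f"- {name}: {s}" for name, s in folder.file_summaries.items())
    context = "\n".join(lines)
    return [
        {"role": "system", "content": QA_SYSTEM},
        {"role": "user", "content": f"Question: {question}\n\nWorkspace context:\n{context}"},
    ]
```

The grounding instruction sits in the system message, and the question and context share one user message. This keeps the Q&A stage a single request, with no retrieval step.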

The 128K context window matters here specifically — it means I can dump every summary in the workspace into a single prompt and let Gemma do the synthesis natively, without needing embeddings, vector search, or any retrieval layer. That keeps the architecture radically simple: extraction → summarization → ask. If folder counts grow past where context-stuffing works, I'll add sqlite-vec and embeddings, but I'm not paying that complexity cost until I have to.
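Whether context-stuffing still works can be checked before each request. Below is a minimal sketch that relies on the rough heuristic of about 4 characters per token for English text rather than Gemma's actual tokenizer. The same ratio gives the post's estimate that 2,048 tokens is roughly 8,000 characters. The 600-token output reserve mirrors the `num_predict: 600` option the post uses. The budget to check against is the `num_ctx` actually sent with the request, not the model's 128K maximum.

```python
from __future__ import annotations

import math

CHARS_PER_TOKEN = 4  # rough heuristic for English prose; not Gemma's tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate a token count from character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(texts: list[str], num_ctx: int, reserve_for_output: int = 600) -> bool:
    """True if all texts plus room for the reply fit in the requested window."""
    needed = sum(estimate_tokens(t) for t in texts) + reserve_for_output
    return needed <= num_ctx
```

For example, an 11,000-character document estimates to 2,750 tokens. That is over a 2,048-token window but well inside 32,768. Once this check fails for the whole workspace, a retrieval layer such as sqlite-vec starts to earn its complexity.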

Three things I learned about running Gemma 4 in production (in the loose sense of "production"):

Ollama's API doesn't expose audio yet

E4B has native audio understanding in the weights. I tested it on Day 1, expecting to ship audio support in v1. I tried two request shapes, and both failed:

{"role": "user", "content": "Transcribe this audio.", "audios": ["<base64 audio>"]}
{"type": "input_audio", "input_audio": {"data": "<base64 audio>", "format": "mp3"}}

The second failure is interesting in a frustrating way: Ollama saw the non-text content block, routed it to the only multimodal pipeline it currently exposes (images), and tried to decode my MP3 as a PNG. The GitHub issue requesting audio support (ollama/ollama#11798) is still open. Audio works today through llama.cpp's server directly with the mmproj file; I just didn't want to add a second runtime for v1. The model has the capability; the runtime API doesn't ship it yet.

Ollama silently truncates at 2,048 tokens by default

After my first round of summarization, I noticed the database had empty strings even though the endpoint reported success for every file. The culprit: num_ctx defaults to 2,048 tokens in Ollama unless you explicitly set it. That's roughly 8,000 characters. My 11,000-character PDFs were getting silently truncated during inference, and the model returned empty content rather than partial answers.

The fix is one line:

"options": {"num_predict": 600, "num_ctx": 32768}

The default makes sense for chat-style interactions; it bites the moment you summarize anything longer than a short email.
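Below is a minimal standard-library sketch of a full non-streaming request to Ollama's `/api/chat` with an explicit context window. It adds a check that turns silent truncation into something visible. The model tag is a hypothetical placeholder. The truncation check is a heuristic of my own, not an Ollama feature. It flags empty replies, or prompts whose `prompt_eval_count` (a field Ollama returns) nearly fills the window.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint
MODEL_TAG = "gemma4:e4b"  # hypothetical tag: use the name `ollama list` shows
TRUNCATION_MARGIN = 64    # hypothetical heuristic margin, in tokens


def build_chat_payload(system: str, user: str,
                       num_ctx: int = 32768, num_predict: int = 600) -> dict:
    """Build a non-streaming /api/chat request with an explicit context window."""
    return {
        "model": MODEL_TAG,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "stream": False,
        # Without num_ctx, Ollama falls back to its default window and trims
        # longer prompts; num_predict caps the length of the generated reply.
        "options": {"num_ctx": num_ctx, "num_predict": num_predict},
    }


def chat(payload: dict, url: str = OLLAMA_CHAT_URL, timeout: float = 300.0) -> dict:
    """POST the payload to a local Ollama server and return the parsed JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def looks_truncated(reply: dict, num_ctx: int) -> bool:
    """Heuristic: flag empty answers, or prompts that (nearly) filled the window."""
    content = reply.get("message", {}).get("content", "")
    prompt_tokens = reply.get("prompt_eval_count", 0)
    return not content.strip() or prompt_tokens >= num_ctx - TRUNCATION_MARGIN
```

Storing a summary only when `looks_truncated` returns False is one way to make empty-string rows show up as errors instead of successes.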

Unsupported files belong in a transparency panel

Most AI-on-your-files tools quietly skip what they can't handle. I built a dedicated panel that lists every file in the workspace whose MIME type isn't in the extraction pipeline — with name, folder path, MIME type, modification date, and owner. It changes the implicit contract: instead of "this tool understands your Drive," it's "this tool tells you exactly what it understands and what it doesn't."

For my Drive specifically, the panel surfaces a hundred-plus JPGs of conference photos and a few audio recordings, as this Drive is for non-patent, non-client work. Once Ollama exposes audio and I add image handling for Gemma's vision, those move into the supported list. Until then, they're not hiding.
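The transparency panel amounts to partitioning the files by MIME type and keeping the metadata for everything that falls outside the extraction pipeline. Below is a minimal sketch. It assumes Drive v3 file resources with `name`, `mimeType`, `modifiedTime`, and `owners` fields, plus folder paths computed while walking the tree. The supported set mirrors the formats listed under What I Built. The row layout is hypothetical.

```python
from __future__ import annotations

# MIME types for the formats the post lists as supported. Adjust if your Drive
# reports a different type for .md files.
SUPPORTED_MIME_TYPES = frozenset({
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",    # DOCX
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",          # XLSX
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",  # PPTX
    "application/vnd.google-apps.document",      # Google Docs
    "application/vnd.google-apps.spreadsheet",   # Google Sheets
    "application/vnd.google-apps.presentation",  # Google Slides
    "text/plain",
    "text/markdown",
})


def partition_files(files: list[tuple[str, dict]]) -> tuple[list, list[dict]]:
    """Split (folder_path, drive_file) pairs into extractable files and panel rows."""
    supported, panel_rows = [], []
    for folder_path, f in files:
        if f.get("mimeType") in SUPPORTED_MIME_TYPES:
            supported.append((folder_path, f))
            continue
        # Keep everything the panel shows instead of silently dropping the file.
        owners = ", ".join(o.get("displayName", "") for o in f.get("owners", []))
        panel_rows.append({
            "name": f.get("name", ""),
            "folder_path": folder_path,
            "mime_type": f.get("mimeType", ""),
            "modified_time": f.get("modifiedTime", ""),
            "owner": owners,
        })
    return supported, panel_rows
```

Moving a format into the supported list, such as audio once the runtime exposes it, is then a one-line change to the set.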

What's next: audio and video, image-heavy PDFs (via Gemma 4's vision once I add rasterization), RAG with sqlite-vec if context-stuffing stops working, streaming chat responses, and a "watch this folder" mode that auto-extracts on Drive changes.

The whole project is built on a specific claim: edge-class AI is now good enough that you don't have to send your files to someone else's cloud to get useful work out of an LLM. A week of building, four real engineering gotchas, and one working tool later — I think the claim holds up.