NeuralPocket: Private On-Device AI with Gemma 4 — Android & Web
Prema Ananda · 2026-05-21 · via DEV Community

This is a submission for the Gemma 4 Challenge: Build with Gemma 4


What I Built

NeuralPocket — a private multimodal AI assistant that runs entirely on your device. Available as both an Android app and a web app. No cloud, no subscription, no data leaving your hands.

Honest About My Motivation

I've participated in Google hackathons several times. Each time I built something real, put in the work — and each time walked away with just a participation badge 😄 This time I want to actually place, though I know there are plenty of strong projects out there!

So NeuralPocket is not a demo and not a proof-of-concept. It's a full-featured app with real architecture that solves a real problem.

The problem: modern AI assistants are brilliant — until you lose Wi-Fi. On a plane, in the mountains, roaming abroad, they become useless icons. And every message you type, every photo you send, flies off to someone else's servers.

Google gave me an extra push: the AI Edge Gallery app simply refused to install on my Android 9 phone, even though it runs a 64-bit OS (which matters, since LiteRT-LM only runs on 64-bit). Instead of giving up, I figured it out myself. That became the starting point for NeuralPocket.

I wanted an assistant that:

  • works fully offline — always, everywhere
  • never sends your data anywhere
  • understands text, photos, and audio — in one chat
  • runs on both Android and in the browser

What NeuralPocket Can Do

  • 📷 Photo analysis — snap a menu in Japan → translation and context; photograph a broken part → repair advice; photograph a document → ask questions about it
  • 🎤 Voice input — record up to 30 seconds, converted to WAV, processed on-device
  • 💬 Multiple independent chats with different system prompts — "Translator", "Tech Assistant", "Personal Journal"
  • ⚙️ Configurable context memory — 0–5 conversation pairs to balance coherence and context window
  • 🎨 Markdown rendering — model responses display with full formatting: code, lists, emphasis
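
The context-memory setting can be pictured as a sliding window over the chat history. Here is a minimal TypeScript sketch, assuming a "pair" means one user message plus the assistant reply that follows it, and that the history only holds completed exchanges. The `ChatMessage` type and `buildContext` function are hypothetical names for illustration, not taken from the NeuralPocket source:

```typescript
type Role = "user" | "assistant";

interface ChatMessage {
  role: Role;
  text: string;
}

/**
 * Returns the messages to send along with the next prompt: the last
 * `pairs` user/assistant exchanges from `history`, oldest first.
 * `pairs` is clamped to the 0..5 range offered in the settings.
 */
function buildContext(history: ChatMessage[], pairs: number): ChatMessage[] {
  const clamped = Math.min(5, Math.max(0, Math.floor(pairs)));
  // slice(-0) would return the whole array, so handle "no memory" explicitly.
  if (clamped === 0) return [];
  // Each pair is two messages, so keep at most 2 * clamped trailing messages.
  return history.slice(-2 * clamped);
}
```

With 0 pairs every request is independent and the whole context window is left for the new prompt; with 5 pairs the model sees more of the conversation at the cost of a longer prompt.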

Demo

🎬 Android Demo Video (will come later...)

🌐 Web Version (live)


Code

Both projects are fully open source:


How I Used Gemma 4

Choosing the Model

I chose Gemma 4 E2B IT (2B parameters, ~2.6 GB) as the primary model for three reasons:

  • Native multimodal input — text, image, and audio in a single request, no workarounds needed
  • Compact size — fits on a mid-range Android phone with 4+ GB RAM
  • One model, two platforms: .litertlm for Android LiteRT-LM, .web.task for WebGPU in the browser

For devices with 6+ GB RAM, the app offers Gemma 4 E4B (~3.7 GB) as a more capable option. The 31B Dense model is overkill for on-device use cases for now.


Architecture: Two Platforms, One Model

┌─────────────────────────────────────────────────┐
│                  NeuralPocket                   │
├──────────────────────┬──────────────────────────┤
│     Android App      │        Web App           │
│       Kotlin         │  React 19 + TypeScript   │
├──────────────────────┼──────────────────────────┤
│   LiteRT-LM SDK      │  MediaPipe Tasks GenAI   │
│   (native runtime)   │  Web Worker + WebGPU     │
├──────────────────────┴──────────────────────────┤
│              Gemma 4 E2B IT / E4B IT            │
│            (running locally on device)          │
└─────────────────────────────────────────────────┘



Android: LiteRT-LM

Stack: Kotlin + Google AI Edge LiteRT-LM + CameraX + MVVM

The engine automatically selects the best available backend — GPU via Vulkan or OpenCL, falling back to CPU via XNNPack. Concurrent inference calls are serialized through a Mutex to prevent race conditions.
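
The fallback order can be illustrated with a small sketch. It is written in TypeScript to match the web code, while the Android app itself is Kotlin and the real selection happens inside the LiteRT-LM engine. The `Backend` names, the `tryInit` probe, and the choice to try Vulkan before OpenCL are all assumptions made for illustration:

```typescript
type Backend = "gpu-vulkan" | "gpu-opencl" | "cpu-xnnpack";

// Assumed preference order: GPU backends first, CPU as the last resort.
const PREFERENCE: Backend[] = ["gpu-vulkan", "gpu-opencl", "cpu-xnnpack"];

/**
 * Tries each backend in preference order and returns the first one that
 * initializes. `tryInit` is a hypothetical probe supplied by the caller;
 * it returns true when the backend can be used on this device.
 */
function selectBackend(tryInit: (backend: Backend) => boolean): Backend {
  for (const backend of PREFERENCE) {
    if (tryInit(backend)) return backend;
  }
  throw new Error("No inference backend available");
}
```

On a device without a usable GPU driver, both GPU probes fail and the function ends up on the CPU path, which is the behavior the paragraph above describes.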

Key architectural decisions:

  • A single StateFlow<ChatUiState> as the source of truth — the UI only observes, never mutates directly
  • Chat history is written atomically via a temp file — no data loss on crash
  • The vision encoder loads only when an image is present — saves RAM
  • Preflight check on first launch: RAM, ABI, free storage — the app warns if the device doesn't meet the minimum requirements
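
The preflight check from the last bullet could look roughly like this. It is a TypeScript sketch of the logic only (the Android app is Kotlin). The `DeviceInfo` shape and the function name are hypothetical, the RAM and ABI thresholds come from the requirements stated in this post, and the storage threshold is an assumption based on the ~2.6 GB E2B model size:

```typescript
interface DeviceInfo {
  totalRamGb: number;
  abis: string[]; // e.g. ["arm64-v8a", "armeabi-v7a"]
  freeStorageGb: number;
}

interface PreflightResult {
  ok: boolean;
  warnings: string[];
}

const MIN_RAM_GB = 4; // stated requirement: 4+ GB RAM
const REQUIRED_ABI = "arm64-v8a"; // stated requirement: arm64
const MIN_FREE_STORAGE_GB = 2.6; // assumption: room for the E2B model file

/** Collects one warning per unmet requirement instead of failing fast. */
function preflight(device: DeviceInfo): PreflightResult {
  const warnings: string[] = [];
  if (device.totalRamGb < MIN_RAM_GB) {
    warnings.push(`Only ${device.totalRamGb} GB RAM, ${MIN_RAM_GB}+ GB recommended`);
  }
  if (device.abis.indexOf(REQUIRED_ABI) === -1) {
    warnings.push(`Device does not support ${REQUIRED_ABI}`);
  }
  if (device.freeStorageGb < MIN_FREE_STORAGE_GB) {
    warnings.push(`Only ${device.freeStorageGb} GB free, model needs about ${MIN_FREE_STORAGE_GB} GB`);
  }
  return { ok: warnings.length === 0, warnings };
}
```

Returning a list of warnings rather than a single boolean lets the app tell the user exactly which requirement is not met.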

Performance:

  • GPU (Vulkan/OpenCL): ~15–30 tokens/sec
  • CPU-only (XNNPack): ~5–10 tokens/sec
  • Requirements: Android 8+, arm64, 4+ GB RAM

All three screenshots were taken in airplane mode — no network, everything running locally:

Text chat — fitness trainer persona on Gemma 4 E2B
Photo input — model analyzes an image
Voice input — transcription and response


Web: WebGPU Right in the Browser

Stack: React 19 + TypeScript + Vite + Tailwind CSS v4 + MediaPipe Tasks GenAI

All inference runs inside a Web Worker, so generation never blocks the UI and the interface stays responsive during streaming. Models are cached in OPFS (Origin Private File System): the first launch downloads ~2.6 GB, and every subsequent launch reads the model from the local cache instead of the network.
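
To make the worker-to-UI streaming concrete, here is a small sketch of how streamed tokens might be folded into UI state on the main thread. The message shapes, `GenerationState`, and `applyWorkerMessage` are hypothetical; the actual protocol between NeuralPocket's worker and its React components may differ:

```typescript
// Messages the worker posts back to the UI thread (assumed shapes).
type WorkerMessage =
  | { type: "token"; text: string }
  | { type: "done" }
  | { type: "error"; message: string };

interface GenerationState {
  output: string;
  status: "streaming" | "done" | "error";
  error?: string;
}

const initialState: GenerationState = { output: "", status: "streaming" };

/** Pure reducer: returns the next UI state for one worker message. */
function applyWorkerMessage(state: GenerationState, msg: WorkerMessage): GenerationState {
  switch (msg.type) {
    case "token":
      return { output: state.output + msg.text, status: "streaming" };
    case "done":
      return { output: state.output, status: "done" };
    case "error":
      return { output: state.output, status: "error", error: msg.message };
  }
}

// In a React component this could be wired up roughly as:
//   worker.onmessage = (e) => setState((s) => applyWorkerMessage(s, e.data));
```

Because the main thread only appends small text chunks, the expensive part (running the model) stays entirely inside the worker.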

Three model presets are supported: Gemma 4 E2B, Gemma 4 E4B, and Gemma 3 Multimodal. You can also provide a custom model URL.

The web app is built as a PWA (Progressive Web App) — you can install it on your computer as a standalone app with one click from the browser, just like YouTube or other web services. Once installed, it appears in your app menu and opens in its own window without an address bar.

Installing NeuralPocket as a PWA

Web version in action (all computation happens locally in the browser via WebGPU):

Text chat in the web version

Image analysis in the web version

Audio processing in the web version

Honest caveat about offline: after the first launch the app works without a network. But it's not fully autonomous out of the box: the MediaPipe runtime loads from jsDelivr, and fonts load from Google Fonts. For full offline you'd need to self-host those dependencies.

Honest caveat about multimodal on the web: at the time of development I couldn't find web-optimized multimodal builds of Gemma 4; the available versions only support text. However, I found a fully multimodal model from the previous generation, gemma-3n-E2B-it-int4-Web.litertlm, which accepts text, images, and audio directly in the browser. That became the third preset in the web version.

A note on how fast things move. While building NeuralPocket, Google released Gemini 3.5 Flash — and first impressions suggest it's a notable step up from 3.1. It handles complex multi-step tasks confidently: for example, it wrote a full test suite for the web version of NeuralPocket on the first try, something that used to take several iterations. It's remarkable how fast this space evolves — the world changes while you're still writing the article.

At this pace, in a year you might just need to download the latest Gemma and ask it to build the whole app itself. Probably. Maybe. 😄


Privacy as Architecture, Not Marketing

NeuralPocket sends nothing anywhere — not messages, not photos, not chat history, not analytics. This isn't a setting you toggle. It's a consequence of the architecture: there's no server that could receive anything. Works in airplane mode. No account, no subscription.


Summary: Android vs Web

Two apps, one idea — but different trade-offs:

| | 🤖 Android | 🌐 Web |
|---|---|---|
| Installation | APK (~36 MB) | None, just open in browser |
| Install as app | ✅ native | ✅ PWA |
| Model | Gemma 4 E2B / E4B | Gemma 4 E2B / E4B |
| Text chat | ✅ | ✅ |
| Photo input | ✅ | ⚠️ Gemma 3n only |
| Audio input | ✅ | ⚠️ Gemma 3n only |
| Offline | ✅ after downloading models | ⚠️ after first launch and downloading models |
| Performance | ~15–30 tok/s (GPU) | depends on browser WebGPU |
| Requirements | Android 8+, arm64 | Chrome / Edge with WebGPU |
| Multiple chats | | |
| Custom model | | ✅ by URL |

Need maximum multimodality and full offline? Go Android. Want to try it right now without installing anything? Go Web.

🤖 Download APK

🌐 Web Version (live)


📦 Models: gemma-4-E2B-it-litert-lm · gemma-4-E4B-it-litert-lm on HuggingFace

Built with ❤️ on Gemma 4 + Google AI Edge LiteRT-LM