AccessLens — a blind person's lanyard, powered by Gemma 4 on-device
Hassan Shah · 2026-05-24 · via DEV Community

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

AccessLens is an Android app that turns a Pixel 8 worn on a lanyard into a persistent visual interpreter for blind and low-vision users. Rear camera forward, bone-conduction headphones in, the phone describes the world — and remembers.

The problem with existing visual-assist apps (Be My Eyes, Seeing AI, Envision) is that they are screen-bound, stateless, and cloud-bound. A blind person navigates by sound; an app that needs you to hold up a phone, tap a screen, and wait on a datacenter interrupts that signal stream. AccessLens is different on three axes:

  • Worn, not held. Two physical buttons drive everything. Volume Up → read text in front of me, verbatim. Volume Down → describe this room with memory from earlier today and recent days. A gyroscope-based SettleTrigger also fires a description automatically when the user stops walking.
  • Persistent memory across days/weeks. Every gesture writes a SessionEvent to a SQLCipher database. A nightly Gemma 4 worker compresses each day into a DailySummary; Sundays roll into a WeeklyMemory. LONG-press prompts splice that history into the Gemma call, so the model has a world model of this specific apartment, this specific day.
  • 100% on-device. No image, audio, embedding, or location leaves the phone. SQLCipher + Android KeyStore (AES-256-GCM wrapping a SecureRandom DB key) protect everything at rest. A SelfTest on first launch opens a probe DB with the wrong key and asserts the read fails before the app reports encryption healthy.
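The key-wrapping layer in the last bullet can be sketched on a plain JVM. This is a minimal sketch, not the AccessLens code. It assumes three things:

- a 32-byte SecureRandom database key;
- a 12-byte IV prepended to the wrapped blob;
- a 128-bit GCM tag.

It also replaces the Android KeyStore key with an ordinary in-memory AES-256 key. On a device, that key would be generated inside AndroidKeyStore and never exported. The helper names `newWrappingKey`, `newDbKey`, `wrap`, `unwrap`, and `wrongKeyIsRejected` are hypothetical. The last function follows the idea behind the SelfTest: with GCM, a wrong key causes an authentication failure rather than returning garbage plaintext.

```kotlin
import java.security.SecureRandom
import javax.crypto.AEADBadTagException
import javax.crypto.Cipher
import javax.crypto.KeyGenerator
import javax.crypto.SecretKey
import javax.crypto.spec.GCMParameterSpec

val IV_BYTES = 12    // standard GCM nonce length
val TAG_BITS = 128   // GCM authentication tag length

// Stand-in for the KeyStore wrapping key (assumption: plain in-memory AES-256 key).
fun newWrappingKey(): SecretKey =
    KeyGenerator.getInstance("AES").also { it.init(256) }.generateKey()

// Random 32-byte database key, generated with SecureRandom.
fun newDbKey(): ByteArray = ByteArray(32).also { SecureRandom().nextBytes(it) }

// Wrapped blob layout (hypothetical): IV || ciphertext || tag.
fun wrap(dbKey: ByteArray, wrappingKey: SecretKey): ByteArray {
    val iv = ByteArray(IV_BYTES).also { SecureRandom().nextBytes(it) }
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, wrappingKey, GCMParameterSpec(TAG_BITS, iv))
    return iv + cipher.doFinal(dbKey)
}

// Reverses wrap(); throws AEADBadTagException if the key or blob is wrong.
fun unwrap(blob: ByteArray, wrappingKey: SecretKey): ByteArray {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, wrappingKey, GCMParameterSpec(TAG_BITS, blob, 0, IV_BYTES))
    return cipher.doFinal(blob, IV_BYTES, blob.size - IV_BYTES)
}

// Self-test analogue: passes only if decryption with the wrong key is refused.
fun wrongKeyIsRejected(blob: ByteArray, wrongKey: SecretKey): Boolean =
    try {
        unwrap(blob, wrongKey)
        false
    } catch (e: AEADBadTagException) {
        true
    }
```

GCM authenticates as well as encrypts, so the wrong-key check gives a clear pass/fail signal. That is what an encryption self-test needs.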

Face recognition uses MediaPipe FaceLandmarker to produce a 192-dim L2-normalized landmark vector per enrolled person. At identification time, when a detected face's vector is cosine-similar enough to an enrolled one, only that person's name is injected into the Gemma prompt. Gemma never sees a face crop or an embedding; I verified this in code review.
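The matching step can be sketched as follows. The sketch assumes each enrolled person is stored as one L2-normalized vector and that a single cosine-similarity threshold decides a match. The class `Enrolled`, the functions `identify` and `namesForPrompt`, and the 0.85 threshold are illustrative, not taken from the repo. Because the stored vectors have unit length, cosine similarity reduces to a dot product. The only thing that crosses into prompt text is a name string.

```kotlin
import kotlin.math.sqrt

// Hypothetical enrolled template: a name plus one unit-length vector.
class Enrolled(val name: String, val vector: FloatArray)

// Scales v to unit L2 norm so cosine similarity becomes a dot product.
fun l2Normalize(v: FloatArray): FloatArray {
    val norm = sqrt(v.fold(0f) { acc, x -> acc + x * x })
    require(norm > 0f) { "cannot normalize a zero vector" }
    return FloatArray(v.size) { v[it] / norm }
}

// Dot product; equals cosine similarity when both inputs are unit length.
fun dot(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "dimension mismatch" }
    var sum = 0f
    for (i in a.indices) sum += a[i] * b[i]
    return sum
}

// Best-matching enrolled name for one detected face, or null if below threshold.
// The 0.85 threshold is an illustrative assumption.
fun identify(face: FloatArray, gallery: List<Enrolled>, threshold: Float = 0.85f): String? {
    val q = l2Normalize(face)
    val best = gallery.maxByOrNull { dot(q, it.vector) } ?: return null
    return if (dot(q, best.vector) >= threshold) best.name else null
}

// Only names leave this function; vectors never reach the prompt text.
fun namesForPrompt(faces: List<FloatArray>, gallery: List<Enrolled>): String {
    val names = faces.mapNotNull { identify(it, gallery) }.distinct()
    return if (names.isEmpty()) "" else "People recognized in view: ${names.joinToString(", ")}."
}
```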

Three gestures, three target latencies (Pixel 8, Tensor G3): SINGLE ≤14 s end-to-end, DOUBLE scales with text length, LONG adds memory retrieval. Voice fillers ("I'm looking…", "Still looking…") cover the prefill gap so the user hears acoustic progress, not dead air. Everything runs with airplane mode on after the model is pushed once.
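Below is a minimal sketch of how the three gestures might map to prompt text and how fillers could be scheduled. The gesture mapping is inferred from the paragraphs above:

- SINGLE: a one-sentence scene description;
- DOUBLE: verbatim text reading;
- LONG: a room description augmented with memory.

The prompt wording, the hypothetical `Memory` type, the 1,200-character memory budget, and the filler timings are all assumptions. Memory lines are assumed to be stored oldest first, so the sketch walks them newest first and stops when the budget runs out.

```kotlin
// Gesture names from the post; the task each one maps to is inferred.
enum class Gesture { SINGLE, DOUBLE, LONG }

// Hypothetical memory layers mirroring DailySummary / WeeklyMemory, oldest first.
data class Memory(val weekly: List<String>, val daily: List<String>)

// Builds the text part of the request; the image would be attached ahead of it.
fun buildPrompt(gesture: Gesture, names: List<String>, memory: Memory, maxMemoryChars: Int = 1200): String {
    val task = when (gesture) {
        Gesture.SINGLE -> "Describe the scene in front of the user in one sentence."
        Gesture.DOUBLE -> "Read all visible text verbatim."
        Gesture.LONG -> "Describe this room, using what you remember about it."
    }
    val sb = StringBuilder(task)
    if (names.isNotEmpty()) sb.append("\nPeople in view: ").append(names.joinToString(", ")).append('.')
    if (gesture == Gesture.LONG) {
        // Newest daily facts first, then weekly facts, until the character budget is spent.
        var used = 0
        val lines = mutableListOf<String>()
        for (fact in memory.daily.asReversed() + memory.weekly.asReversed()) {
            if (used + fact.length > maxMemoryChars) break
            lines += fact
            used += fact.length
        }
        if (lines.isNotEmpty()) sb.append("\nMemory:\n").append(lines.joinToString("\n") { "- $it" })
    }
    return sb.toString()
}

// Spoken fillers keyed by milliseconds since the gesture (timings are assumptions).
val fillerSchedule = listOf(0L to "I'm looking…", 5_000L to "Still looking…", 10_000L to "Still looking…")

// Fillers that would have played before the first model token arrived.
fun fillersBefore(firstTokenMs: Long): List<String> =
    fillerSchedule.filter { it.first < firstTokenMs }.map { it.second }
```

Capping the spliced memory matters on a small on-device model. Every extra memory line adds prefill that the user waits through, and that is exactly the gap the fillers are meant to cover.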

Demo

Code

hassaninnovate / AccessLens

A blind person's lanyard, powered by Gemma 4 E2B on a Pixel 8. 100% on-device visual assistant with persistent memory and face recognition.

AccessLens

An always-on, on-device visual interpreter for blind and low-vision users — built for the DEV.to "Build with Gemma 4" challenge.

License: Apache 2.0


Pitch

A phone worn on a lanyard becomes the user's "eyes." The rear camera is always on; the gyroscope watches for motion. When the user stops walking, AccessLens describes what's in front of them. When a friend whose face has been enrolled walks into frame, the phone says their name. When the user wants to read what's in front of them, they press Volume Up; for a richer description of the room, Volume Down. Bluetooth bone-conduction headphones carry the audio — the user's ears stay free for the world.

What separates AccessLens from existing apps like Be My Eyes, Seeing AI, and Envision is persistent on-device memory + 100% on-device inference. Existing tools are stateless and cloud-bound. AccessLens runs Gemma 4 E2B locally via LiteRT-LM, encrypts…
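The pitch says the phone describes the scene when the user stops walking. That behavior can be approximated with a small state machine over gyroscope samples. This sketch is not AccessLens's SettleTrigger. The class name `SettleDetector` is hypothetical, and so are its three thresholds: a 0.15 rad/s stillness cutoff, a 1.5 s dwell time, and a 10 s cooldown. The detector fires after movement followed by a sustained still period. It then waits for new movement before it can fire again.

```kotlin
import kotlin.math.sqrt

// Hypothetical settle detector over gyroscope samples (rad/s, timestamps in ms).
// All three thresholds are illustrative assumptions.
class SettleDetector(
    private val stillThreshold: Float = 0.15f,  // |ω| below this counts as still
    private val dwellMs: Long = 1_500L,         // how long the user must stay still
    private val cooldownMs: Long = 10_000L,     // minimum gap between triggers
) {
    private var stillSince: Long? = null
    private var wasMoving = false
    private var lastFire: Long? = null

    // Returns true at most once per "moved, then stopped" episode.
    fun onSample(tMs: Long, x: Float, y: Float, z: Float): Boolean {
        val magnitude = sqrt(x * x + y * y + z * z)
        if (magnitude >= stillThreshold) {
            wasMoving = true
            stillSince = null
            return false
        }
        val since = stillSince ?: tMs.also { stillSince = it }
        val last = lastFire
        val settled = wasMoving && tMs - since >= dwellMs
        if (settled && (last == null || tMs - last >= cooldownMs)) {
            wasMoving = false  // require fresh movement before the next trigger
            lastFire = tMs
            return true
        }
        return false
    }
}
```

On Android, the samples would come from a `Sensor.TYPE_GYROSCOPE` listener. Feeding the detector synthetic data makes the thresholds easy to tune offline.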

Licensed under Apache 2.0. The repo includes the full Kotlin/Compose source, the encryption self-test, the nightly compression WorkManager job, and a README documenting which file enforces each of the six privacy invariants.

Reference implementation that taught me the LiteRT-LM API: google-ai-edge/gallery — adapted patterns are cited inline in inference/LiteRtLmRuntime.kt.

How I Used Gemma 4

Model: Gemma 4 E2B (litert-community/gemma-4-E2B-it-litert-lm, ~2.59 GB int4), loaded once at service start via LiteRT-LM 0.12.0 with Backend.GPU() for the vision adapter. Three reasons E2B was the right fit:

  1. Multimodal in one model, on-device. Image input goes in as Content.ImageBytes, text as Content.Text, in that order (per the Gallery's "for accurate last token" comment), all through one Engine.generate call. No separate vision encoder + decoder to stitch, no second model to keep resident. That fits the latency budget and the memory budget on Pixel-class 8 GB RAM.

  2. E2B is the smallest competent multimodal Gemma 4. It fits in RAM alongside MediaPipe FaceLandmarker, a CameraX pipeline, and the Compose UI without OOM-ing on a Pixel 8. I prototyped against E4B (the brief's "quality path") and measured the latency lift on one-sentence scene descriptions — not worth doubling the prefill cost for a use case where the user is waiting in real time, lanyard-mounted, with no screen feedback. The architecture is parametric on the model path (InferenceRuntime.load(modelPath, Modality)), so a future LONG-press branch could swap to E4B in one line. I documented the tradeoff in the README and shipped E2B for all three gestures.

  3. Gemma is the only practical way to do nightly memory compression on-device. The 03:00 CompressionWorker calls Gemma in JSON mode to compress the day's SessionEvent rows into a single DailySummary, and on Sundays into a WeeklyMemory. That's a real LLM task — extracting persistent facts, deduplicating recurring observations, distinguishing "the blue mug is mine" from "I saw a blue mug today" — and it has to happen without a network. E2B handles it in under a minute per day on Tensor G3 while the phone is on the charger.
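The scheduling side of the nightly job is simple enough to sketch with java.time. The sketch makes three assumptions:

- the job targets 03:00 local time;
- each run compresses the previous day;
- a run that lands on a Sunday also produces the weekly rollup.

`delayUntilNextRun`, `Job`, and `jobsForRun` are hypothetical names. The computed delay is the kind of value one might pass to a WorkManager request. That wiring is not shown, and neither is the Gemma JSON-mode call.

```kotlin
import java.time.DayOfWeek
import java.time.Duration
import java.time.LocalDateTime
import java.time.LocalTime

// Target run time from the post: 03:00 local time.
val RUN_AT: LocalTime = LocalTime.of(3, 0)

// Time until the next 03:00 strictly after `now`.
fun delayUntilNextRun(now: LocalDateTime): Duration {
    var next = now.toLocalDate().atTime(RUN_AT)
    if (!next.isAfter(now)) next = next.plusDays(1)
    return Duration.between(now, next)
}

// Hypothetical job kinds mirroring DailySummary and WeeklyMemory.
enum class Job { DAILY_SUMMARY, WEEKLY_MEMORY }

// Every run compresses the previous day; a Sunday run also rolls the week up.
fun jobsForRun(runAt: LocalDateTime): List<Job> =
    if (runAt.dayOfWeek == DayOfWeek.SUNDAY) listOf(Job.DAILY_SUMMARY, Job.WEEKLY_MEMORY)
    else listOf(Job.DAILY_SUMMARY)
```

LocalDateTime ignores daylight-saving transitions. A production version would compute the delay with ZonedDateTime in the device's time zone.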

Two production fixes the brief didn't cover, in case they help someone else:

  • The LiteRT-LM Android artifact must be 0.12.0 or later — 0.11.0 fails vision init inside vision_litert_compiled_model_executor.cc:273 on Tensor G3.
  • AndroidManifest needs <uses-native-library> declarations for libOpenCL.so, libOpenCL-car.so, libOpenCL-pixel.so (all android:required="false"). Without them, Android 12+ silently denies GPU OpenCL access and the vision backend fails to initialize. Documented at ai.google.dev/edge/litert-lm/android.

The thing I'm proudest of: when you uninstall AccessLens, the KeyStore wrapping key is destroyed with it. The encrypted DB on disk becomes cryptographically unrecoverable. The user can throw the phone away and their memories — kitchen layout, friends' faces, places they've been — go with it. That's what on-device privacy is supposed to mean, and Gemma 4 + LiteRT-LM made it possible without compromising the assistant on quality.