TharVA : Keeping India's Desert Heritage Alive with Offline AI (Gemma4)
Daathwi Naag · 2026-05-23 · via DEV Community

This is a submission for the Gemma 4 Challenge: Build with Gemma 4


What I Built

TharVA : Thar Virtual Assistant
A mobile-first, fully offline, multilingual AI assistant for camel herders in the Thar Desert.

Not a general assistant. Not a chatbot.
A field tool built specifically for camel herders in rural Rajasthan, who raise camels in one of the world's harshest environments, have no reliable internet, work with their hands, and need answers fast, in Hindi or any other language, when something goes wrong with an animal.

The spark came from time I spent in Bikaner, talking to Ashok Bishnoi, a social entrepreneur near the National Research Centre on Camel (NRCC) in Jorbeer, and to Raika community camel keepers whose generational knowledge of camel behavior, calving, and desert survival isn't written down anywhere accessible. What they lacked wasn't expertise. It was fast access to reliable guidance at the right moment.

One conversation stayed with me:

  • A calf had been rejected by its mother.
  • A time-critical emergency where the first hours determine survival.

The formal channels couldn't give clear enough answers fast enough. What actually helped was a Raika community elder who had seen it before and knew exactly what to do.

TharVA is an attempt to make that knowledge reachable in a field, with no signal, in Hindi or any other language, with one hand free.

The Two Interaction Modes

TharVA has two interaction modes, built around how field users actually work:

Quick Call: Voice in, voice out. Hold a button, speak your question, hear a short, direct answer. Generation is streamed so that TTS begins before the full response finishes. Designed for when you're standing next to a distressed animal and have thirty seconds, not three minutes.

Detailed Chat: Text or voice input, with image support. Attach a photo of a camel's wound or a skin condition and get a thorough, structured response. Same model, different prompt, different temperature, completely different feel.

Answers are grounded in curated camel husbandry reference material from actual veterinary literature and NRCC research, injected into the system instruction at session start. The model isn't improvising from general training data. It knows the domain because it was given the domain.



How I Used Gemma 4

Model chosen: Gemma 4 E2B

Not the 26B.
Not the 31B.
The smallest one in the family — and that was entirely intentional.

The people TharVA is built for don't have high-end phones or reliable connectivity. The rule I held myself to for the entire build was: if it doesn't load and respond reliably on a mid-range Android phone in realistic conditions, nothing else matters.

The E2B, with 2.3 billion effective parameters and running on as little as 4 GB of RAM, is the only model in the Gemma 4 family that makes that possible while still being genuinely capable. I set the context length to 4,096 tokens, a limit that shaped almost every other technical decision I made.

The entire inference stack runs on-device through the flutter_gemma package, wrapping Google AI Edge's LiteRT-LM runtime.

  • No cloud API.
  • No data leaving the phone.
  • No signal required.

For a community where privacy matters and internet is genuinely unreliable, offline-first wasn't a feature preference. It was the baseline.


TharVA's Application Architecture

[Figure: TharVA's architecture]


Multimodal is no longer a premium feature

Ears (<|audio|>): Voice input bypasses device-level speech recognition entirely. I record audio as a raw WAV file (PCM, 16 kHz, 16-bit, mono) and pass the bytes directly to the model. This removed the requirement to pre-install language packs through obscure settings menus that field users would never find. Unexpectedly, the E2B handled local Hindi accents and regional speech patterns from around Bikaner better than device-level ASR did. Voice input that understands your accent is voice input people will actually use.
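To make the audio format concrete, here is a minimal sketch of wrapping 16 kHz, 16-bit, mono PCM samples in a WAV container. It is written in Python with the standard `wave` module purely for illustration; the app itself is built with Flutter, and the `sine_tone` helper is a hypothetical stand-in for microphone input.

```python
import io
import math
import struct
import wave

SAMPLE_RATE_HZ = 16_000   # 16 kHz, as stated in the post
SAMPLE_WIDTH_BYTES = 2    # 16-bit samples
CHANNELS = 1              # mono


def pcm_to_wav_bytes(pcm_samples: list[int]) -> bytes:
    """Wrap signed 16-bit PCM samples in a WAV container and return the bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(SAMPLE_RATE_HZ)
        # "<h" is little-endian signed 16-bit, the sample layout WAV expects.
        wav_file.writeframes(struct.pack(f"<{len(pcm_samples)}h", *pcm_samples))
    return buffer.getvalue()


def sine_tone(duration_s: float, freq_hz: float = 440.0) -> list[int]:
    """Hypothetical stand-in for recorded speech: a quiet sine tone."""
    sample_count = int(SAMPLE_RATE_HZ * duration_s)
    return [
        int(8000 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE_HZ))
        for i in range(sample_count)
    ]


# Half a second of audio: 8,000 samples, 2 bytes each, plus the WAV header.
wav_bytes = pcm_to_wav_bytes(sine_tone(0.5))
```

The resulting bytes are what would be handed to the model in place of a transcript; how they are attached to a prompt depends on the inference library and is not shown here.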

Eyes (<|image|>): Users can photograph a wound, a skin condition, or an animal's posture and include that in their question. I capped image support at one image per turn. That is a deliberate product decision, not a temporary limitation: allowing multiple images per turn caused context overflow failures mid-conversation that were impossible to handle gracefully in the field. One image per turn gives stable, predictable behavior under real conditions.
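One way to enforce a cap like this is to reject extra attachments when the turn is assembled, before anything reaches the model. The sketch below is in Python for illustration only; `Turn` and `build_turn` are hypothetical names, not part of the app or of any library.

```python
from dataclasses import dataclass, field

MAX_IMAGES_PER_TURN = 1  # the cap described in the post


@dataclass
class Turn:
    """One user turn: the question text plus any attached image bytes."""
    text: str
    images: list[bytes] = field(default_factory=list)


def build_turn(text: str, images: list[bytes]) -> Turn:
    """Fail early on extra images instead of overflowing context later."""
    if len(images) > MAX_IMAGES_PER_TURN:
        raise ValueError(f"attach at most {MAX_IMAGES_PER_TURN} image per turn")
    return Turn(text=text, images=list(images))
```

Failing at this point lets the UI show a clear message while the user can still pick a single photo, which is easier to recover from than a failure in the middle of generation.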

Brain (<|think|> + system prompt): Quick Call and Detailed Chat use the same weights but entirely different system prompts and temperatures. Quick Call prompts bias heavily toward short, direct outputs, with a lower sampling temperature. Detailed Chat allows longer, structured responses. The model adapts its behavior completely based on what the prompt asks. Same brain, different mode.
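The "same weights, different configuration" idea can be sketched as a small lookup table. This is an illustrative Python sketch: the prompt wording, temperatures, and token limits are invented placeholder values, not the ones TharVA uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeConfig:
    """Per-mode generation settings; the model weights are shared."""
    system_prompt: str
    temperature: float
    max_output_tokens: int


# All values below are assumptions chosen only to show the contrast.
MODES = {
    "quick_call": ModeConfig(
        system_prompt="Answer in one or two short sentences suitable for speech.",
        temperature=0.3,
        max_output_tokens=96,
    ),
    "detailed_chat": ModeConfig(
        system_prompt="Give a thorough, structured answer with numbered steps.",
        temperature=0.7,
        max_output_tokens=512,
    ),
}


def config_for(mode: str) -> ModeConfig:
    """Look up the settings for a mode, failing loudly on unknown names."""
    if mode not in MODES:
        raise KeyError(f"unknown mode: {mode}")
    return MODES[mode]
```

Keeping the two modes as data rather than separate code paths means a new mode is one more table entry, and the session setup code stays identical for both.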

Mouth / Vocal (streaming TTS): I use generateChatResponseAsync() to feed tokens into text-to-speech as they arrive. The user starts hearing the response before generation finishes. Without streaming, you wait for full generation, then wait for TTS. With streaming, those two processes overlap. The perceived latency difference in Quick Call is the difference between an app that feels usable and one that feels broken.
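A common way to get this overlap is to buffer streamed tokens and hand text to the speech engine at sentence boundaries. The Python sketch below assumes that chunking strategy, which the post does not confirm; `speakable_chunks` and the token list are hypothetical.

```python
from typing import Iterable, Iterator

# "।" is the Devanagari danda, the sentence terminator used in Hindi text.
SENTENCE_ENDINGS = (".", "?", "!", "।")


def speakable_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Yield a chunk as soon as a sentence ends, without waiting for the full reply."""
    pending = ""
    for token in tokens:
        pending += token
        if pending.rstrip().endswith(SENTENCE_ENDINGS):
            yield pending.strip()
            pending = ""
    # Flush whatever is left when the stream ends mid-sentence.
    if pending.strip():
        yield pending.strip()


# Hypothetical token stream standing in for the model's streamed output.
fake_stream = ["The ", "answer ", "starts. ", "More ", "detail ", "follows."]
spoken = list(speakable_chunks(fake_stream))
```

With this shape, the first sentence can be spoken while the model is still generating the second, so the wait before the user hears anything is roughly the time to produce one sentence instead of the whole answer.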

Grounded knowledge (context injection): Curated camel husbandry reference text is loaded into the system instruction at session start, truncated to a fixed character budget. Every per-turn input is kept lean: a language reminder, an optional location/battery prefix, and the actual question. The knowledge base is in context from the start without consuming fresh tokens on every turn. This was forced by the 4,096-token on-device context limit, which is the real constraint that shaped almost every other technical decision.
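The split between a one-time system instruction and a lean per-turn input might look like the following Python sketch. The function names, the bracketed prefix format, and the budget value are all assumptions made for illustration; the post only states that a fixed character budget exists.

```python
from typing import Optional


def truncate_reference(text: str, char_budget: int) -> str:
    """Clip reference text to a character budget, backing up to a word boundary."""
    if len(text) <= char_budget:
        return text
    clipped = text[:char_budget]
    last_space = clipped.rfind(" ")
    return clipped[:last_space] if last_space > 0 else clipped


def build_system_instruction(base_prompt: str, reference_text: str, char_budget: int) -> str:
    """Built once per session: behaviour rules plus the clipped knowledge base."""
    return f"{base_prompt}\n\nReference material:\n{truncate_reference(reference_text, char_budget)}"


def build_turn_input(
    question: str,
    language: str,
    location: Optional[str] = None,
    battery_percent: Optional[int] = None,
) -> str:
    """Built every turn: a language reminder, optional device context, the question."""
    parts = [f"[Reply in {language}]"]
    context = []
    if location is not None:
        context.append(f"location: {location}")
    if battery_percent is not None:
        context.append(f"battery: {battery_percent}%")
    if context:
        parts.append("[" + ", ".join(context) + "]")
    parts.append(question)
    return "\n".join(parts)


reference = "word " * 100  # placeholder for the curated reference text
instruction = build_system_instruction("You are a field assistant.", reference, char_budget=42)
turn = build_turn_input("What should I check first?", language="Hindi", battery_percent=18)
```

A character budget is only a proxy for tokens, so in practice the budget would need to be set conservatively enough that the instruction plus several turns still fits in the context window.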

Multilingual behavior: Any language or mode change triggers a full session reset. The app closes the inference session, rebuilds the system prompt with fresh language reminders, and starts fresh. Without hard resets, KV cache state bleeds between contexts and produces the wrong script, the wrong tone, or the wrong response length, in ways that undermine trust in the app entirely.
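The hard-reset rule can be sketched as a small manager that keys the live session on the (language, mode) pair. This Python sketch uses hypothetical `SessionManager` and `FakeSession` classes; a real inference session would come from the on-device runtime.

```python
from typing import Callable, Optional, Protocol


class Session(Protocol):
    def close(self) -> None: ...


class SessionManager:
    """Keeps one live session and rebuilds it whenever language or mode changes."""

    def __init__(
        self,
        open_session: Callable[[str], Session],
        build_system_prompt: Callable[[str, str], str],
    ) -> None:
        self._open_session = open_session
        self._build_system_prompt = build_system_prompt
        self._session: Optional[Session] = None
        self._key: Optional[tuple[str, str]] = None

    def session_for(self, language: str, mode: str) -> Session:
        key = (language, mode)
        if self._session is None or key != self._key:
            # Hard reset: drop the old session so no cached state carries over.
            if self._session is not None:
                self._session.close()
            self._session = self._open_session(self._build_system_prompt(language, mode))
            self._key = key
        return self._session


class FakeSession:
    """Hypothetical stand-in for an on-device inference session."""

    def __init__(self, system_prompt: str) -> None:
        self.system_prompt = system_prompt
        self.closed = False

    def close(self) -> None:
        self.closed = True


manager = SessionManager(FakeSession, lambda language, mode: f"[{mode}] Reply only in {language}.")
hindi = manager.session_for("Hindi", "quick_call")
same = manager.session_for("Hindi", "quick_call")       # unchanged key: session is reused
english = manager.session_for("English", "quick_call")  # language change: full reset
```

Reusing the session while the key is unchanged keeps conversation history intact within a language and mode, and the reset cost is only paid when the user actually switches.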

The hardest engineering in this project was completely unglamorous: a download recovery system that detects partial model files and restarts cleanly, a runtime compatibility fix for a silent mismatch between the LiteRT-LM version and the updated Hugging Face artifact format, and a turn cap that forces session rotation before context overflow causes silent failures mid-conversation.
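Two of these pieces, partial-download detection and the turn cap, can be illustrated with a short Python sketch. It assumes the expected file size is known in advance and that a size mismatch is enough to call a download partial; the app's actual checks are not described in the post, and every name here is hypothetical.

```python
import tempfile
from pathlib import Path


def discard_if_partial(path: Path, expected_size: int) -> bool:
    """Delete an incomplete model file. Returns True when a fresh download is needed."""
    if not path.exists():
        return True
    if path.stat().st_size != expected_size:
        path.unlink()  # remove the partial file so the next attempt starts clean
        return True
    return False


class TurnCap:
    """Counts turns and signals when the session should be rotated."""

    def __init__(self, max_turns: int) -> None:
        self.max_turns = max_turns
        self.turns = 0

    def record_turn(self) -> bool:
        """Return True when the cap is reached, then start counting again."""
        self.turns += 1
        if self.turns >= self.max_turns:
            self.turns = 0
            return True
        return False


with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.bin"
    model_path.write_bytes(b"\0" * 10)  # simulate an interrupted download
    needs_download = discard_if_partial(model_path, expected_size=20)
    partial_removed = not model_path.exists()

cap = TurnCap(max_turns=3)
rotations = [cap.record_turn() for _ in range(4)]
```

A size check is cheap enough to run on every app start; a checksum comparison would catch corruption that preserves the file size, at the cost of reading the whole file.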

None of that shows up in a demo. All of it determines whether the app actually works in a field in Rajasthan.

Code

https://github.com/daathwi/TharVA

The Raika community have kept camels alive in the Thar Desert for centuries. They don't need an AI to tell them what they already know. What TharVA tries to do is make the knowledge that exists in community memory and veterinary literature reachable at the moment when someone needs it: with no signal, in Hindi, with one hand free.

That's a narrow goal. I think it's the right one.