5 things `flutter_gemma` doesn't tell you about shipping Gemma 4 on Android
Manoj H M · 2026-05-24 · via DEV Community

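This is a submission for the Gemma 4 Challenge: Write About Gemma 4

I built a Gemma 4 assistant for Android in 17 days. Voice, vision, RAG, and eight tool actions, all offline once the model is on the device. The project is called PocketClaw; if that is what you came for, it is covered in the companion post.

This post is about five things I learned the hard way. They are not in the flutter_gemma README. They are not in Google's MediaPipe docs. They are not in any of the six "run Gemma on Android" tutorials I read that week.

If you are about to ship Gemma 4 on Android, I hope this saves you a weekend.

📱 Companion post: How I built PocketClaw, a fully offline AI assistant on Android with Gemma 4 E2B. Demo video, architecture deep dive, and full source code.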

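1. Small models bury facts in the middle. Put the important ones first.

I have been building agents on top of Claude and GPT-4 for 18 months. Both handle long system prompts well. You can mix instructions and facts in any order, and the model works out which parts are facts and which are behavioral rules.

Gemma 4 E2B does not.

My first system prompt for PocketClaw looked like this: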

You are Claw. You run locally and offline. You are talking to Manoj Shetty.
Match your answer length to the question. Prefer plain answers over preambles.
Never restate the question. If unsure, say so briefly.


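Ask "What's my name?" and Claw answers "I don't know your name." Every time. The name was sitting right there in the third sentence.

I stared at this for about an hour. My after-the-fact reasoning: "Never restate the question" is a prominent instruction, and the model generalized it to "never reference the user's context." That is how a model with 2B effective parameters generalizes. Large cloud LLMs do not behave this way.

The fix was structural, not a matter of wording: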

final namePart = (userName != null && userName.trim().isNotEmpty)
    ? "The name of the user is ${userName.trim()}.\n\n"
    : '';
final systemPreamble = '${namePart}You are Claw, ...';


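Same fact. Moved to the first line of the prompt. On its own line. A flat declarative sentence. It worked on the first try.

The broader lesson: with a 2B model, anything you want the model to remember goes at the very top, in a simple sentence, with no competing instructions in the same paragraph. The model's attention is strongest at the start and weaker in the middle. Treat the system prompt as a fill-in-the-blanks template, not a long paragraph.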
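One way to make that habit mechanical is a small prompt builder. This is a minimal sketch, not PocketClaw's code: `buildSystemPrompt` and its `facts` and `rules` parameters are hypothetical names chosen for illustration. It writes each fact on its own line at the top, then a blank line, then the behavioral rules.

```dart
/// Builds a system prompt for a small on-device model.
/// Hypothetical helper: facts come first, one short sentence per line,
/// and behavioral rules follow in a separate paragraph.
String buildSystemPrompt({
  required List<String> facts,
  required List<String> rules,
}) {
  // Drop empty entries so a missing fact never leaves a blank line on top.
  final cleanFacts =
      facts.map((f) => f.trim()).where((f) => f.isNotEmpty).toList();
  final cleanRules =
      rules.map((r) => r.trim()).where((r) => r.isNotEmpty).toList();

  final buffer = StringBuffer();
  for (final fact in cleanFacts) {
    buffer.writeln(fact); // each fact gets its own line
  }
  if (cleanFacts.isNotEmpty && cleanRules.isNotEmpty) {
    buffer.writeln(); // blank line separates facts from rules
  }
  buffer.write(cleanRules.join(' '));
  return buffer.toString();
}
```

Called with `facts: ['The name of the user is Manoj.']` and `rules: ['You are Claw.', 'Never restate the question.']`, it returns the fact line, a blank line, and then the rules, which is the same shape as the fixed prompt above.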

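2. Vanilla RAG breaks on the questions users actually type.

If you have worked with RAG, you know the textbook setup. Split documents into chunks, embed the chunks, store the vectors, embed the query at ask time, and find the nearest matches. It works well for specific, targeted questions.

It does not work for "summarize this document."

I found this out on a Friday. I was two weeks in, heads down, and thought I had a working pipeline. I took a random PDF (a paper on edge LLMs), typed "summarize this document", and sent it. Claw replied "Summarize this document." That was all, like an echo. I rephrased and tried twice more, and got the same response.

Then I typed "summarize llmaiedge.pdf", using the actual filename, and got a real summary.

Once I looked, the failure was obvious. "Summarize this document" has no semantic relationship to the document's content. The PDF does not contain the word "summarize", nor the phrase "this document". The cosine similarity search had nothing to match on, so the list of retrieved chunks came back empty. Gemma received the user's question with no document context attached. Like any LLM starved of context, it made something up, producing a generic answer out of its training data.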
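To see why a generic request scores so poorly, here is a minimal sketch of cosine similarity over toy vectors. The `bagOfWords` "embedding" and the vocabulary are assumptions for illustration only; a real embedder such as Gecko produces dense vectors, so real scores are low rather than exactly zero, but the effect is the same in kind.

```dart
import 'dart:math' as math;

/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 when either vector has zero magnitude.
double cosineSimilarity(List<double> a, List<double> b) {
  if (a.length != b.length) {
    throw ArgumentError('Vectors must have the same length.');
  }
  var dot = 0.0;
  var normA = 0.0;
  var normB = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA == 0.0 || normB == 0.0) return 0.0;
  return dot / (math.sqrt(normA) * math.sqrt(normB));
}

/// Toy bag-of-words vector over a fixed vocabulary (illustration only,
/// not how a real embedding model works).
List<double> bagOfWords(String text, List<String> vocabulary) {
  final words = text.toLowerCase().split(RegExp(r'[^a-z0-9]+'));
  return [
    for (final term in vocabulary)
      words.where((w) => w == term).length.toDouble(),
  ];
}
```

With the vocabulary `['summarize', 'document', 'edge', 'llm', 'latency']`, the query "Summarize this document" and a chunk such as "Edge LLM latency on edge devices" share no terms, so their similarity is 0. A targeted query like "edge llm latency" scores high against the same chunk.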

The patch I shipped is a two-layer heuristic:

final isGenericIntent = hits.length <= 1 && (
    lower.contains('summari') ||
    lower.contains('tldr') ||
    lower.contains('explain') ||
    lower.contains('describe') ||
    lower.contains('the document') ||
    lower.contains('the pdf')
);

if (isGenericIntent) {
    hits = await RagService.instance.getDocStarts(
        conversationId: _conversation.id,
    );
}


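`getDocStarts` is a small fallback, and it works. It runs `searchSimilar` once per indexed document, using that document's filename as the query. A filename is a rare, distinctive identifier, and the metadata of every chunk in the vector store includes it. So this retrieval returns real chunks no matter what the user asked.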
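The post does not show the body of `getDocStarts`, so the following is only a sketch of how such a fallback could look. The `Chunk` class, the `SearchSimilar` signature, and the `indexedFileNames` and `chunksPerDoc` parameters are all assumptions; the real `RagService` API may differ.

```dart
/// Hypothetical chunk record; the real plugin's types may differ.
class Chunk {
  final String fileName;
  final String text;
  const Chunk(this.fileName, this.text);
}

/// Assumed shape of a similarity search: query text in, top-K chunks out.
typedef SearchSimilar = Future<List<Chunk>> Function(String query, int topK);

/// Fallback retrieval: query once per indexed file, using the file name
/// as the query text, and keep only chunks that belong to that file.
Future<List<Chunk>> getDocStarts({
  required List<String> indexedFileNames,
  required SearchSimilar searchSimilar,
  int chunksPerDoc = 3,
}) async {
  final results = <Chunk>[];
  for (final name in indexedFileNames) {
    final hits = await searchSimilar(name, chunksPerDoc);
    // Guard against hits that leaked in from other documents.
    results.addAll(hits.where((c) => c.fileName == name).take(chunksPerDoc));
  }
  return results;
}
```

Passing the search function in as a parameter keeps the sketch independent of any particular vector store, and makes it easy to exercise with a fake search in tests.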

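Two pieces of conditional logic. That is the whole difference between "Claw can summarize PDFs" and "Claw echoes your question back."

If you are building RAG on Gemma 4 (or any small on-device model), test it with generic questions before you ship. Your textbook similarity search will fail in a way that looks like a bug in the model.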

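3. The flutter_gemma plugin bundles about 33 MB of native libraries you probably don't use.

PocketClaw's stock APK was 185 MB, which felt heavy.

I unzipped it and looked at the native libraries (arm64-v8a only; I was already shipping a single-ABI build). This is what I found: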

26 MB  libllm_inference_engine_jni.so       (needed)
24 MB  libLiteRtLm.so                       (needed)
17 MB  libgemma_embedding_model_jni.so      (don't use — using Gecko)
17 MB  libgecko_embedding_model_jni.so      (needed)
14 MB  libmediapipe_tasks_vision_jni.so     (needed — vision input)
14 MB  libmediapipe_tasks_vision_image_generator_jni.so  (NOT USED)
10 MB  libimagegenerator_gpu.so             (NOT USED)
8  MB  libLiteRtGpuAccelerator.so           (needed)
8  MB  libLiteRtWebGpuAccelerator.so        (NOT USED — Android has OpenCL)
9  MB  libtext_chunker_jni.so               (needed)


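The image generator libraries exist for generating images with Gemma. PocketClaw only consumes images (vision input to multimodal Gemma). I never generate them. The WebGPU accelerator is meant for browsers; Android uses OpenCL. None of these does anything useful on my target platform.

Four lines in android/app/build.gradle.kts: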

packaging {
    jniLibs {
        excludes.addAll(listOf(
            "**/libimagegenerator_gpu.so",
            "**/libmediapipe_tasks_vision_image_generator_jni.so",
            "**/libLiteRtWebGpuAccelerator.so",
            "**/libLiteRtTopKWebGpuSampler.so"
        ))
    }
}


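The APK went from 185 MB to 152 MB, a 33 MB reduction. Vision input still works, embeddings still work, inference still works.

If your use case is similar (chat plus vision input plus RAG, no image generation), you can copy these exclusions. If it differs, for example if you really do want Gemma to generate images, the image generation libraries need to stay. Either way, inspect what your plugin pulls in and remove what you don't use. flutter_gemma is built to expose a general capability surface, not to minimize bytes on the device.

There is a bigger second-order point here. MediaPipe is the reason flutter_gemma is so large. It is also the reason it can handle vision and audio. Alternatives based on llama.cpp ship at only 30 to 60 MB on Android, but they give up multimodality entirely. So the real choice is 152 MB with vision, or 60 MB without. There is no free lunch that gets you multimodality at the size of a text-only stack. Choose based on what your product actually needs.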

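4. Don't feed the 128K context. Keep it tight.

Gemma 4 has a 128K-token context window. Nice in theory. In practice it is a footgun.

Every prompt token costs latency at decode time. Every token costs memory. On a phone, both are scarce. If you naively stuff the full conversation history into the context on every turn, turn 20 is noticeably slower than turn 5, and turn 50 may run out of memory and crash the app.

PocketClaw keeps a sliding window of the 24 most recent messages, verbatim. Older messages go through a compression pass:

  1. Extract facts the user stated explicitly (such as "I am X", "remember Y", "my name is Z").
  2. Capture unresolved intents (keywords such as "fix", "todo", "issue").
  3. Roll them into one lightweight summary, prepended at the start as a memory note.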
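That is the conversation side. The aggressive part is image handling.

A typical user-uploaded photo is about 1 MB. Base64-encoded into the prompt, that comes to roughly 30,000 tokens. A single image takes up about a quarter of the entire 128K context window. If a user uploads three images in one conversation, your context budget is in trouble.

So here is what PocketClaw does: when an image message slides past the 24-message boundary, its raw image bytes are dropped from memory. All that remains is the text of the assistant's earlier description of the image: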

String _imageMemoryFromAssistant({
  required String? imageName,
  required String assistantText,
}) {
  final label = imageName ?? 'uploaded image';
  return 'Assistant previously described $label as: '
         '${_shorten(assistantText, 1000)}';
}


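Claw still "remembers" what it saw, but only the description goes into the prompt. A 30,000-token base64 blob shrinks to a summary of about 100 tokens. That is roughly a 300x compression of image memory.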
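Spelling out the arithmetic behind those figures, taking the post's own estimates at face value (about 30,000 tokens per base64 image, about 100 tokens per description) and reading 128K as 128,000 tokens:

```latex
\frac{30{,}000}{128{,}000} \approx 0.23, \qquad
\frac{3 \times 30{,}000}{128{,}000} \approx 0.70, \qquad
\frac{30{,}000}{100} = 300
```

One raw image uses a little under a quarter of the window, three use about 70 percent of it, and replacing an image with its description shrinks it by a factor of about 300.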

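The nice part is that this works well for the follow-up questions users actually ask. "What was in that image?" can be answered from the description alone. The model rarely needs the pixels. If it does, the user can upload the image again.

The general principle: do not treat context as "free space up to the limit." Treat it as a budget. Spend it on what the model needs for the current turn. Compress everything else into text summaries.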

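5. Native audio in flutter_gemma is limited to Gemma 3n, not Gemma 4.

This is the one I most want you to know about, so you don't lose the time I did.

The Gemma 4 E2B model card lists native audio as a supported modality. I thought: great, I will drop the speech-to-text plugin entirely and feed raw audio bytes straight to Gemma, a single multimodal call instead of an STT-then-LLM pipeline. Cleaner.

I dug into the flutter_gemma v0.15.1 source looking for the audio API, and found this comment on the interface: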

/// [supportAudio] — whether the model supports audio (Gemma 3n E4B only).
bool supportAudio = false,


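The plugin's audio code path is tied to Gemma 3n. If you set `supportAudio: true` when loading Gemma 4 E2B, you get either a load error or a silent failure at inference time. The native side does support audio (the C++ engine handles it), but the check on the Dart side rejects any model that is not 3n.

So PocketClaw uses Android's system STT (the speech_to_text package, which wraps RecognizerIntent). The upside is live transcription as you speak: text appears word by word in the input field while you hold the mic. That beats the alternative of pressing a button to talk, releasing it, waiting three seconds for the audio to upload and process, and only then seeing your own words and the AI's response.

If and when flutter_gemma exposes audio for E2B, the STT detour goes away. Until then, system STT plus text-mode Gemma is the right architecture.

The takeaway is not "audio is broken." It is this: before you trust a capability flag, read the plugin's source. That applies especially to multimodal features across Gemma versions. The model being able to do something does not mean the plugin has wrapped it for your model.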

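Those are the five. None of them is in the README. None of them is in Google's docs. I learned every one by building the thing and watching it fail in a different way each time.

If you are building on Gemma 4 on Android, these five will save you time. To see all five working together in a real app, PocketClaw is fully open source under the MIT license.

After 17 days with Gemma 4 E2B on a mid-range Android phone, what stays with me is how capable a 2B model is when it runs on the user's own device. The latency feels different from the cloud. There is no "AI is thinking" delay, because there is no network. It just answers, as fast as your phone allows.

That is something worth optimizing for.


Edited with help from Claude. All code, decisions, bugs, and engineering choices in this post are my own.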