Building a local-first AI tutor for my daughter (and 10–14 year-olds in Austrian schools) with Gemma 4

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

My daughter is 13. Like most students her age in Austria, she has an iPad.
Like most parents, I'm uncomfortable with her typing homework into
ChatGPT — not because the answers are wrong, but because everything she
types disappears into a cloud I don't control. Names, schools, half-formed
thoughts, the topics where she's struggling. Stuff I want to stay on her
device.

So I built Lernbuddy. It's a study companion for ages 10–14 that runs
entirely on-device with Gemma 4 E4B. No network calls. No telemetry.
The model lives next to the kid's flashcards and her review history, on
the same disk, and that's it.

What it does

Three things, all driven by Gemma 4 E4B:

Socratic chat. She asks a question; the model gives a hint, never the answer. The system prompt is built around concrete praise ("nice, you spotted that fractions need a common denominator") and a refusal to solve problems directly.
Flashcards. Paste a school text → cards. Or just type a topic (irregular verbs, European capitals) and the model draws on its general knowledge to create a deck. Every card lands in a preview where you can edit, delete, or add cards before they go into the database.
Quiz with model-validated answers. Typed answers, not multiple-choice. The model reads the correct answer and the kid's answer and decides whether it is correct, almost, or incorrect; spelling errors and paraphrases are tolerated. It even writes a short personalized summary at the end: "You handled past participles well. 'choose' and 'speak' got mixed up. They'll come back tomorrow."
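A three-way verdict is only useful if the app can read it reliably. Here is a minimal Python sketch of that step (the app itself is C#), assuming the model is asked to start its reply with exactly one of the words correct, almost, or incorrect. The prompt wording, `build_grading_prompt`, `parse_verdict`, and the fallback to "incorrect" are hypothetical, not Lernbuddy's actual code:

```python
import re
from enum import Enum


class Verdict(Enum):
    CORRECT = "correct"
    ALMOST = "almost"
    INCORRECT = "incorrect"


def build_grading_prompt(question: str, expected: str, given: str) -> str:
    # Hypothetical wording; the post does not show the real grading prompt.
    return (
        "Grade a student's answer. Tolerate spelling mistakes and paraphrases.\n"
        f"Question: {question}\n"
        f"Correct answer: {expected}\n"
        f"Student answer: {given}\n"
        "Start your reply with exactly one word: correct, almost, or incorrect."
    )


# Only the leading word counts (after optional punctuation such as "**"),
# so a reply like "That's not correct" is never read as CORRECT.
_LEADING_VERDICT = re.compile(r"\W*(correct|almost|incorrect)\b", re.IGNORECASE)


def parse_verdict(reply: str) -> Verdict:
    match = _LEADING_VERDICT.match(reply)
    if match is None:
        # Unparseable reply: treat it as incorrect so the card is practised again soon.
        return Verdict.INCORRECT
    return Verdict(match.group(1).lower())
```

Restricting the verdict to a single leading word is a design choice. It keeps the reply trivially checkable, and anything the parser can't read errs on the side of more practice.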

Behind the quiz sits a small SM-2 spaced-repetition scheduler. Cards she
gets wrong come back the next day; cards she gets right keep slipping
further out. A streak counter and a per-topic "✓ 12 solid · … 4 to
practise" badge make progress visible without dashboards.
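The scheduler described above is classic SM-2. This compact Python sketch follows the commonly published SM-2 update rule (the version given on Wikipedia). The mapping of the quiz's three verdicts onto SM-2's 0 to 5 grades is a hypothetical choice, not necessarily the one Lernbuddy uses:

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical mapping from the quiz verdict to an SM-2 grade (0-5).
GRADE = {"correct": 5, "almost": 3, "incorrect": 1}


@dataclass(frozen=True)
class CardState:
    repetitions: int = 0    # consecutive successful reviews
    easiness: float = 2.5   # SM-2 "E-Factor", never below 1.3
    interval_days: int = 0  # days until the next review


def review(state: CardState, verdict: str) -> CardState:
    q = GRADE[verdict]
    if q >= 3:
        # Successful recall: 1 day, then 6 days, then grow by the easiness factor.
        if state.repetitions == 0:
            interval = 1
        elif state.repetitions == 1:
            interval = 6
        else:
            interval = round(state.interval_days * state.easiness)
        repetitions = state.repetitions + 1
    else:
        # Failed recall: start the sequence over; the card is due again tomorrow.
        interval = 1
        repetitions = 0
    # Standard SM-2 easiness update, clamped at 1.3.
    easiness = state.easiness + (0.1 - (5 - q) * (0.08 + (5 - q) * 0.02))
    return CardState(repetitions, max(easiness, 1.3), interval)


def next_due(last_review: date, state: CardState) -> date:
    return last_review + timedelta(days=state.interval_days)
```

With this mapping, an "almost" still advances the card but lowers its easiness by 0.14, so the card spaces out more slowly than it would after a clean "correct".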

Architecture

The whole thing is a .NET 9 MAUI app — one codebase, four targets
(Windows, Android, iOS, macCatalyst), bilingual from day one (German is
the primary audience, English the default for international demos).

Inference goes through Microsoft.Extensions.AI's IChatClient
interface. The concrete implementation wraps LLamaSharp running
unsloth/gemma-4-E4B-it-GGUF (Q4_K_M quantization, ~4.6 GB). Gemma 4's
chat template, <|turn>role\n...<turn|>, is built by hand because the
Jinja template embedded in the GGUF carries tool-calling logic the kid
will never need.
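Building the prompt by hand comes down to a few lines of string formatting. The following Python sketch illustrates the idea. It assumes the turn markers are exactly as quoted above, that a newline follows each closing marker, that the role names are `system`, `user`, and `model`, and that the tokenizer adds the BOS token. Check all of these against the model card before relying on them. `format_gemma4_prompt` and the sample system prompt are hypothetical:

```python
# Hypothetical helper; marker layout taken from the post, separators assumed.
TURN_OPEN = "<|turn>"
TURN_CLOSE = "<turn|>"


def format_gemma4_prompt(messages: list[tuple[str, str]]) -> str:
    """Render (role, text) pairs and leave an open model turn for generation."""
    parts = [f"{TURN_OPEN}{role}\n{text}{TURN_CLOSE}\n" for role, text in messages]
    # The trailing open turn cues the model to answer next.
    parts.append(f"{TURN_OPEN}model\n")
    return "".join(parts)


# Hypothetical Socratic system prompt in the spirit described earlier.
SOCRATIC_SYSTEM = (
    "You are a patient tutor for 10-14 year olds. Never give the final answer. "
    "Give one hint at a time and praise concrete steps."
)

prompt = format_gemma4_prompt([
    ("system", SOCRATIC_SYSTEM),
    ("user", "What is 1/2 + 1/3?"),
])
```

Under the same assumptions, the closing marker should also work as a stop sequence for generation.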

A singleton InferenceStatus service publishes "Loading model… 4.6 GB",
"Generating… 23 tokens (4.1/s)" to a small banner that's visible on every
page. When prompt processing takes 30 seconds before the first token (CPU
is slow on long prompts), the banner still ticks. No dead spinners —
that matters more than raw throughput when you're 11 and waiting.
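A banner that always has something to say is mostly bookkeeping. This Python sketch assumes a monotonic clock (injected so it can be tested) and a hypothetical "Reading prompt… N s" message for the prompt-processing phase. The class and method names are illustrative and do not reflect the app's actual `InferenceStatus` API:

```python
import time
from typing import Callable, Optional


class StatusTracker:
    """Hypothetical stand-in for the text shown by the InferenceStatus banner."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._phase = "idle"
        self._size_gb = 0.0
        self._started = 0.0
        self._first_token_at: Optional[float] = None
        self._tokens = 0

    def loading_model(self, size_gb: float) -> None:
        self._phase, self._size_gb = "loading", size_gb

    def prompt_started(self) -> None:
        self._phase, self._started = "prompt", self._clock()
        self._first_token_at, self._tokens = None, 0

    def token_generated(self) -> None:
        if self._first_token_at is None:
            self._first_token_at = self._clock()
            self._phase = "generating"
        self._tokens += 1

    def text(self) -> str:
        now = self._clock()
        if self._phase == "loading":
            return f"Loading model… {self._size_gb:.1f} GB"
        if self._phase == "prompt":
            # No tokens yet: show elapsed time so the banner keeps ticking.
            return f"Reading prompt… {now - self._started:.0f} s"
        if self._phase == "generating":
            # Rate is measured from the first token, not from prompt start.
            elapsed = max(now - self._first_token_at, 1e-6)
            return f"Generating… {self._tokens} tokens ({self._tokens / elapsed:.1f}/s)"
        return ""
```

Measuring the rate from the first token keeps a 30-second prompt phase from dragging down the tokens-per-second figure.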

Long source texts get split by paragraph and sentence into
~500-character chunks. Cards appear rolling instead of waiting two
minutes for one mega-prompt. Total throughput is slightly worse;
perceived time is much better.
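The chunking step can be sketched as a greedy packer. This Python version assumes that blank lines separate paragraphs, uses a naive regex sentence splitter (abbreviations such as "Dr." will fool it), and treats 500 characters as a soft limit. `split_into_chunks` is a hypothetical name:

```python
import re

MAX_CHARS = 500  # target chunk size from the post, treated as a soft limit

# Split after ., ! or ? followed by whitespace (naive sentence boundary).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_into_chunks(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Greedily pack paragraphs, then sentences, into chunks of at most max_chars.

    A single sentence longer than max_chars becomes its own oversized chunk
    rather than being cut mid-sentence.
    """
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    pieces: list[str] = []
    for para in paragraphs:
        if len(para) <= max_chars:
            pieces.append(para)
        else:
            pieces.extend(s for s in _SENTENCE_END.split(para) if s)

    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own card-generation request, which lets cards appear while later chunks are still waiting.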

The privacy story

The project's privacy contract starts with "local first": no PII in
prompts, a parent-PIN gate on any future cloud mode, and model download
only over Wi-Fi with explicit consent. Once the model is on the device,
the app makes zero outbound network calls. The only way anything could
leave the device would be a form that sends text to a cloud API such as
Gemini, and the app has no such form. By design.

That's the differentiator. There are dozens of "ChatGPT for kids" apps;
almost none of them can credibly say "your child's homework never leaves
their device."

What I learned

I went down three runtime rabbit-holes before things stuck. ONNX Runtime
GenAI doesn't support the Gemma 4 multi-file ONNX split yet.
transformers.js + WebGPU inside a MAUI HybridWebView is theoretically
beautiful and practically a stack of four bleeding-edge components
silently failing in ways nothing surfaces. LLamaSharp + GGUF works,
except that LLamaSharp doesn't ship iOS-arm64 native binaries. So I built
llama.cpp myself for iOS on a Mac, statically linked the app against the
resulting .a files via MAUI's NativeReference mechanism, and pointed
[DllImport("llama")] at __Internal via a runtime resolver.

The lesson: for a multi-platform local-LLM .NET project in 2026, pick the
runtime that has the deepest community, not the one with the prettiest
abstraction. llama.cpp has a community measured in thousands; the .NET
wrappers around it inherit that depth almost for free.

Try it

Code: https://github.com/gpiwonka/Lernbuddy

More: https://piwonka.cc/Lernbuddy

License: MIT. Built solo, on the side, between a day job and a tired
father's evenings. PRs welcome — especially from teachers who can point
out what's missing from a learning-science perspective.

The submission is for the Gemma 4 Challenge. The motivation is more
personal.
