Memoria — A Local AI Reading Companion Powered by Gemma 4
This is a submission for the Gemma 4 Challenge: Build with Gemma 4
What I Built
Reading long books can be difficult even for people who love reading.
Readers forget characters, lose track of earlier events, struggle with dense prose, or return to a book after a break and feel disconnected from the story. For readers with ADHD, memory difficulties, cognitive fatigue, or accessibility needs, this becomes even harder.
Memoria is a local AI reading companion powered by Gemma 4 that helps readers stay connected to books through spoiler-safe recaps, contextual Q&A, character memory, speaker attribution, and text simplification — all while running locally on the user’s machine.
The app combines an EPUB reader with AI-powered reading support features including:
- Spoiler-safe chapter recaps
- Character memory tracking
- Speaker attribution for dialogue
- Contextual book Q&A
- Passage explanations
- Text simplification for difficult prose
- Retrieval-based memory of earlier chapters
Everything runs locally using Gemma 4 through llama.cpp, so readers do not need a paid AI subscription or constant internet access.
Demo
Features shown in the demo
- Uploading and processing EPUB books
- AI-generated chapter recaps
- Character tracking across chapters
- Context-aware Q&A
- Highlight-to-explain workflow
- Text simplification for difficult passages
- Spoiler-safe retrieval limited to completed chapters
Code
GitHub Repository: https://github.com/Santhoshl2312/Gemma_book_reader
Main technologies used
- Gemma 4 E2B
- llama.cpp
- FastAPI
- SQLite
- ChromaDB
- Vanilla JavaScript
- HTML/CSS
How I Used Gemma 4
Memoria uses Gemma 4 as the core local reasoning engine for the entire reading experience.
I used the Gemma 4 E2B model through a local llama.cpp OpenAI-compatible server, allowing the application to run fully offline without relying on cloud APIs.
Why Gemma 4 E2B?
I specifically chose Gemma 4 E2B because it was the best fit for a responsive local reading assistant.
The project needed:
- Fast inference speeds
- Low VRAM usage
- Good reasoning quality
- Reliable structured outputs
- Practical local deployment on consumer hardware
Gemma 4 E2B delivered the right balance between speed and capability, making it possible to provide near real-time responses for recaps, contextual Q&A, text simplification, and chapter processing while still running locally through llama.cpp.
This was especially important because the app performs many smaller AI tasks continuously in the background while the user reads.
What Gemma 4 Powers
Spoiler-Safe Recaps
Gemma summarizes chapter chunks into structured summaries and key events that help readers quickly reconnect with the story.
Character Memory
The model updates persistent character descriptions and remembers important events tied to each character across chapters.
Speaker Attribution
Gemma helps identify ambiguous dialogue speakers when rule-based systems fail.
Contextual Q&A
Readers can ask questions about the story, and Gemma answers using chapter-aware retrieval that avoids future spoilers.
Text Simplification
Selected passages can be rewritten into clearer modern English while preserving meaning and tone.
Technical Architecture
The frontend is a lightweight EPUB reader built with vanilla HTML, CSS, and JavaScript. It handles book uploads, chapter navigation, reading controls, themes, typography settings, and the AI interaction panel.
The backend is built with FastAPI and SQLite. It manages books, chapters, summaries, embeddings, character memory, retrieval, and streaming responses.
The AI stack runs fully locally using llama.cpp:
- Gemma 4 E2B runs as the local chat and reasoning model
- Nomic embeddings power semantic retrieval
- ChromaDB stores vector embeddings per book
- Background processing pipelines analyze chapters incrementally
The app processes books chapter-by-chapter instead of trying to load entire novels into context at once. Intermediate artifacts like summaries, character memory, embeddings, and speaker metadata are stored and reused throughout the reading experience.
This pipeline-first design makes the system faster, more grounded, and more practical for long-form reading.
Spoiler-Safe Retrieval
One of the biggest design goals was preventing accidental spoilers.
When a reader asks a question, Memoria retrieves only information from chapters the user has already completed. The retrieval system filters vector search results using reading progress before sending context to Gemma 4.
This allows the app to help readers remember earlier story details without revealing future events.
Challenges
Handling Long Books
Full novels are too large to send directly into a local model context window. I solved this by chunking chapters into smaller sections while carrying forward rolling summaries and character memory.
Structured Output Reliability
Local models sometimes wrap JSON outputs in extra formatting or explanations. To make the pipeline reliable, prompts were heavily constrained and the backend extracts valid JSON blocks safely before processing.
Speaker Attribution
Dialogue attribution in fiction is difficult because speakers are often implied instead of explicitly named. I used a hybrid approach where rules handle obvious cases while Gemma handles ambiguous dialogue using broader context.
Fully Local Deployment
The project depends on multiple services including Gemma 4, embedding models, Python environments, and vector databases. I automated the setup process using launcher scripts so the app can be started locally with minimal manual configuration.
Why Local AI Matters
One of the main goals of this project was accessibility and digital equity.
Readers should not need:
- expensive subscriptions
- cloud AI services
- constant internet access
- external data collection
By combining Gemma 4 with llama.cpp and local retrieval, Memoria creates a fully local AI reading companion that respects reader privacy while remaining accessible on consumer hardware.
This makes the project useful not only for individual readers, but also for classrooms, libraries, care settings, and offline learning environments.
Conclusion
Memoria demonstrates how Gemma 4 can power practical, privacy-friendly accessibility tools beyond chatbots.
Instead of replacing reading, the goal is to support readers — helping them stay connected to stories, remember context, and reduce cognitive load while preserving the experience of reading itself.
By combining Gemma 4 E2B, llama.cpp, retrieval, and structured processing pipelines, Memoria turns static EPUB books into adaptive reading experiences that can run entirely offline.




















