Synaptic: A Local-First AI Dev Companion That Remembers How You Think

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

Three weeks into learning Rust, I had Copilot open and Stack Overflow in the next tab. I was making progress — or so I thought. I'd ask Copilot how to handle an error, paste the answer in, the compiler would go green, and I'd move on.

Six days in, I realised I had no idea what I was doing.

I could produce Rust. I couldn't think in Rust. Copilot was making me a faster copy-paster. It wasn't making me a developer.

What I needed wasn't more autocomplete. I needed something to stop me before I typed and ask: "Do you actually understand what you're about to do?"

That question became Synaptic.

Synaptic is a local-first AI dev companion powered entirely by Gemma 4. It watches your entire development environment — files, terminal, errors, shell history — and builds a persistent model of how you specifically think and solve problems. When you're stuck or learning a new language, it surfaces your own past solutions rather than generic documentation pulled from the internet.

The centrepiece is the Socratic gate.

When you open a code file, Synaptic reads your last few hours of activity, identifies what concept is most at stake in that file, and streams a targeted question into a HUD overlay within seconds:

"You've been writing JavaScript closures recently. Before you edit this Rust file: how are you thinking about ownership and when values go out of scope?"

The question isn't generic. It's built from your actual history. Gemma 4 evaluates your answer and either lets you through or asks a sharper follow-up targeting exactly what you glossed over. Over time, your explanations get sharper — because the bar doesn't lower.
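The pass-or-follow-up decision can be sketched roughly as below. This is a minimal illustration, not Synaptic's actual code: the `AnswerVerdict` shape, the `decideGate` function, and the follow-up wording are all hypothetical, and in the real tool Gemma 4 generates the follow-up question itself.

```typescript
// Hypothetical shape of the model's verdict on a developer's answer.
interface AnswerVerdict {
  understood: boolean;    // true when the explanation covers the concept
  missedPoints: string[]; // what the answer glossed over
}

type GateDecision =
  | { kind: "pass" }
  | { kind: "follow_up"; question: string };

// Decide whether to open the gate or ask a sharper follow-up.
// There is deliberately no "pass after N attempts" branch: the bar stays put.
function decideGate(concept: string, verdict: AnswerVerdict): GateDecision {
  if (verdict.understood && verdict.missedPoints.length === 0) {
    return { kind: "pass" };
  }
  const question =
    verdict.missedPoints.length > 0
      ? `Your answer skipped ${verdict.missedPoints[0]}. How does that affect ${concept} in this file?`
      : `Walk through ${concept} again, step by step, for this file.`;
  return { kind: "follow_up", question };
}
```

The follow-up targets the first missed point, which mirrors the idea of asking about exactly what was glossed over.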

Beyond the Socratic gate:

Ambient memory pipeline: every file save and terminal command is compressed by Gemma 4 into a structured memory (summary, concepts, significance score, verbatim error text) every 3 seconds, stored locally in SQLite and indexed for semantic search

Vision error pipeline: on macOS, when a terminal error fires, Gemma 4 reads a screenshot of your actual screen to extract the full stack trace, not the truncated shell history

Four query modes: Translate (JS patterns to Rust), Explain (grounded in your own history), Map Concept (to what you already know), Find Solution (have I solved this before?)

Habit mismatch detection: runs continuously, warns when you apply patterns from your old language that will break in your new one

Stuck detection: watches for compound signals (repeated errors, thrashing, excessive app switches) and auto-surfaces the HUD with relevant context before you ask

Electron HUD overlay: always-on-top, appears uninvited when it has something worth saying

Everything runs on your machine. No API keys required. No data leaves without your permission.
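To make "compound signals" in the stuck detection item concrete, here is a minimal sketch. The counter names, the thresholds, and the "at least two signals" rule are assumptions made for illustration. They are not taken from the Synaptic source.

```typescript
// Hypothetical activity counters over a recent time window.
interface ActivityWindow {
  repeatedErrors: number; // the same error seen again
  fileThrashes: number;   // rapid edit/undo cycles on one file
  appSwitches: number;    // focus changes between applications
}

// Illustrative thresholds only, not Synaptic's real values.
const STUCK_THRESHOLDS: ActivityWindow = {
  repeatedErrors: 3,
  fileThrashes: 5,
  appSwitches: 10,
};

// "Compound" is interpreted here as: no single signal is enough,
// at least two must cross their threshold in the same window.
function isStuck(
  activity: ActivityWindow,
  thresholds: ActivityWindow = STUCK_THRESHOLDS
): boolean {
  const tripped = [
    activity.repeatedErrors >= thresholds.repeatedErrors,
    activity.fileThrashes >= thresholds.fileThrashes,
    activity.appSwitches >= thresholds.appSwitches,
  ].filter(Boolean).length;
  return tripped >= 2;
}
```

Requiring two signals keeps a single noisy metric, such as frequent app switching during normal work, from summoning the HUD on its own.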

Demo

Quick start to try it yourself:

git clone https://github.com/cybort360/synaptic
cd synaptic && npm install
ollama pull gemma4:e4b
ollama pull nomic-embed-text
cp synaptic.config.example.json synaptic.config.json
# Edit synaptic.config.json and add a directory to watchPaths
npm run launch

Open a .rs, .ts, .py, or .go file in a watched directory. The HUD will appear within seconds with a question grounded in your recent activity.

Code

GitHub: https://github.com/cybort360/synaptic

The full pipeline in one view:

Files / Terminal / Shell history
        ↓
    Observer (chokidar + history polling + stuck detector)
        ↓
    Archivist
      ├─ Compressor: gemma4:e4b (every 3s, with vision on errors)
      ├─ Embedder: nomic-embed-text (semantic search vectors)
      └─ SQLite (local, persistent, yours)
        ↓
    Connector
      ├─ Semantic search over history
      ├─ Prompt builder (4 modes)
      └─ Reasoner: gemma4:e4b (streaming)
        ↓
    Socratic Engine
      ├─ Fires on file open for recognised code files
      ├─ Streams question word-by-word to HUD
      └─ Evaluates answer, asks follow-up or passes
        ↓
    Dashboard (http://localhost:3777) + HUD Overlay (Electron)
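
As a rough illustration of what flows from the Compressor into SQLite, the sketch below validates a compressed memory before it is stored. The field names follow the description above (summary, concepts, significance score, verbatim error text), but the exact JSON keys, the 0 to 1 significance range, and the `parseMemoryRecord` helper are assumptions.

```typescript
// Assumed shape of one compressed memory.
interface MemoryRecord {
  summary: string;
  concepts: string[];
  significance: number;     // assumed range: 0 (trivial) to 1 (important)
  errorText: string | null; // verbatim error text, if the event had one
}

// Parse the model's JSON output and reject anything with the wrong shape,
// so a malformed generation never reaches the database.
function parseMemoryRecord(raw: string): MemoryRecord | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // the model produced invalid JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const d = data as Record<string, unknown>;
  if (typeof d.summary !== "string") return null;
  if (!Array.isArray(d.concepts) || !d.concepts.every((c) => typeof c === "string")) {
    return null;
  }
  if (typeof d.significance !== "number" || !Number.isFinite(d.significance)) {
    return null;
  }
  return {
    summary: d.summary,
    concepts: d.concepts as string[],
    significance: Math.min(1, Math.max(0, d.significance)), // clamp to 0..1
    errorText: typeof d.errorText === "string" ? d.errorText : null,
  };
}
```

Small models occasionally emit malformed JSON, so returning `null` and skipping the event is a safer default than storing a partial record.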

Stack: Node.js 20, TypeScript, Express, WebSocket, Electron, chokidar, sql.js (SQLite), Ollama. No frameworks. No bundler. Around 5,700 lines of original code.

How I Used Gemma 4

I used gemma4:e4b, the variant with 4B effective parameters, and the choice was deliberate at every level.

Why E4B specifically

Speed is the constraint. Synaptic compresses every file save and terminal command into a structured memory on a 3-second batch cycle. That cycle is a constraint, not a goal — if compression falls behind, the memories become stale. The Socratic question about your Rust ownership file would be grounded in something you did yesterday rather than what you were doing five minutes ago.

I tested gemma4:26b for this. The batches piled up. The tool became a liability. At 4B effective parameters, gemma4:e4b compresses an event in under 2 seconds on a MacBook. The 3-second cycle stays clean. Memories are always fresh.
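The arithmetic behind "the batches piled up" can be sketched with a small queue model. The function below is illustrative only: it assumes a steady arrival rate and a fixed compression time per event, and the 30-second figure used in the note afterwards is a stand-in for a slow model, not a measurement.

```typescript
// Events still waiting after `cycles` batch cycles.
// cycleSeconds:    length of one batch cycle
// secondsPerEvent: time the model needs to compress one event
// eventsPerCycle:  new events arriving during each cycle
function backlogAfter(
  cycles: number,
  cycleSeconds: number,
  secondsPerEvent: number,
  eventsPerCycle: number
): number {
  // How many events the model can compress within one cycle.
  const capacityPerCycle = cycleSeconds / secondsPerEvent;
  let backlog = 0;
  for (let i = 0; i < cycles; i++) {
    // New arrivals join the queue, capacity drains it, never below zero.
    backlog = Math.max(0, backlog + eventsPerCycle - capacityPerCycle);
  }
  return backlog;
}
```

With one event per 3-second cycle, a 2-second compression keeps the queue at zero. A hypothetical 30-second compression leaves roughly 18 events waiting after 20 cycles (one minute), and the gap keeps widening.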

Multimodal was the unlock. When you hit a terminal error, your shell history often truncates it. The important part — the line number, the variable name, the exact constraint violated — is three screens down.

Gemma 4 is natively multimodal. When Synaptic detects a terminal error on macOS, it captures a screenshot and sends it to Gemma before compression. Gemma reads the actual stack trace off your screen, not the truncated history. This is only possible with a vision-capable model. A text-only 4B model cannot do it. A vision-capable model too large to run locally cannot do it either. gemma4:e4b is the exact intersection: small enough to run constantly, fast enough for real-time use, and genuinely multimodal.
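A request like this could be built as follows. The sketch assumes Ollama's `/api/generate` endpoint, which accepts a list of base64-encoded images alongside the prompt. The prompt wording and the `buildErrorVisionRequest` helper are hypothetical, and capturing the screenshot is left out.

```typescript
// Subset of the JSON body accepted by Ollama's POST /api/generate.
interface OllamaGenerateRequest {
  model: string;
  prompt: string;
  images?: string[]; // base64-encoded image data, without a data: URL prefix
  stream: boolean;
}

// Build the request for an error event. When no screenshot is available
// (for example, off macOS), fall back to the truncated text alone.
function buildErrorVisionRequest(
  truncatedError: string,
  screenshotBase64: string | null
): OllamaGenerateRequest {
  const request: OllamaGenerateRequest = {
    model: "gemma4:e4b",
    prompt:
      "Extract the full stack trace visible in the terminal. " +
      `Shell history recorded only this much: ${truncatedError}`,
    stream: false, // compression wants one complete response, not tokens
  };
  if (screenshotBase64 !== null) {
    request.images = [screenshotBase64];
  }
  return request;
}
```

The resulting object would be sent as the JSON body of a POST to the local Ollama server, by default `http://localhost:11434/api/generate`.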

The Socratic gate depends on first-token latency. The gate fires when you open a code file. The HUD slides up. The question starts streaming in word by word while Gemma is still generating. With gemma4:e4b, the first token arrives in under 3 seconds on a MacBook M-series. The question types itself in as the developer reads it. With a larger model, the developer is already ten lines deep before the question finishes generating — and the gate becomes noise.
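Streaming the question as it is generated means consuming the model's output incrementally. When streaming is enabled, Ollama's `/api/generate` returns newline-delimited JSON objects, each carrying a `response` fragment and a `done` flag. The sketch below handles only the parsing step. The `StreamState` type and `feedChunk` function are hypothetical names, and forwarding tokens to the HUD over WebSocket is omitted.

```typescript
// Accumulated state while reading a streamed response.
interface StreamState {
  buffer: string;   // trailing partial line not yet parseable
  tokens: string[]; // text fragments received so far
  done: boolean;    // true once the model signals completion
}

// Feed one network chunk into the state. Chunks can end mid-line,
// so the last (possibly partial) line is held back in the buffer.
function feedChunk(state: StreamState, chunk: string): StreamState {
  const lines = (state.buffer + chunk).split("\n");
  const buffer = lines.pop() ?? "";
  const tokens = state.tokens.slice();
  let done = state.done;
  for (const line of lines) {
    if (line.trim() === "") continue;
    const message = JSON.parse(line) as { response?: string; done?: boolean };
    if (message.response) tokens.push(message.response);
    if (message.done) done = true;
  }
  return { buffer, tokens, done };
}
```

Each new entry in `tokens` can be pushed to the overlay immediately, which is what lets the question appear while generation is still in progress.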

What breaks with a different model

Alternative	What fails
A 26B local model	Compression batches pile up. The 3-second cycle becomes 30+ seconds and memories fall behind real activity
A non-multimodal 4B	Vision pipeline silently degrades. Errors are compressed without reading the actual screen output
A cloud-only model	The entire privacy guarantee breaks. Code, errors, and history leave your machine
No local model at all	Socratic gate cannot fire on every file open. Latency makes it unusable as a real-time feature

The three tiers Gemma 4 powers

Task	When	Why Gemma 4
Event compression	Every 3 seconds	Speed. Each batch needs to finish before the next 3-second cycle starts.
Vision error analysis	On terminal errors (macOS)	Multimodal. Reads the actual stack trace off the screen.
Reasoning + Socratic evaluation	On every query and file open	Quality. Generates personalised questions and evaluates answers.

A separate nomic-embed-text model handles embeddings for semantic search — it outperforms a generalist 4B model at this specific task, so I kept it.
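Once memories have embedding vectors, semantic search reduces to ranking stored vectors by similarity to the query vector. The sketch below uses cosine similarity, a common choice for this. Whether Synaptic uses exactly this metric is an assumption, and `StoredVector` and `topMatches` are hypothetical names.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / Math.sqrt(normA * normB);
}

// A stored memory's id and its embedding.
interface StoredVector {
  id: number;
  vector: number[];
}

// Return the ids of the k memories most similar to the query, best first.
function topMatches(query: number[], stored: StoredVector[], k: number): number[] {
  return stored
    .map((m) => ({ id: m.id, score: cosineSimilarity(query, m.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((m) => m.id);
}
```

A linear scan like this is usually adequate for a single developer's local history. A dedicated vector index would only matter at much larger scale.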

What Gemma 4 made possible that wasn't before

The core thesis of Synaptic is that local AI is now capable enough to be the primary intelligence of a real product — not just a demo. gemma4:e4b can read screenshots, generate coherent structured analysis, evaluate the quality of a developer's reasoning, and do all of this on a consumer laptop in real time.

That's new. Six months ago you had to choose between capable and local. With Gemma 4 you don't.
