This is a submission for the Gemma 4 Challenge: Build with Gemma 4
What I Built
You know the charts that look dramatic but are actually showing a 3% change? The Y-axis that conveniently starts at 95 instead of 0. The 3D pie chart whose slices somehow add up to 108%. The stock line that’s “up 59.5%!” — over a five-month window hand-picked from a bad year.
I see these constantly — news, earnings decks, social posts — and it bugs me every time. So I built DataDetective: drop in any chart image and Gemma 4 gives you a forensic breakdown — what manipulation tricks are in play, an integrity score from 0–100, what the chart actually shows vs. what it wants you to think, and how to fix it.
The whole thing runs locally through Ollama. No API keys, no cloud, nothing leaves your machine — which matters when the thing you’re analyzing is an internal financial chart or a competitor’s deck.
It ships with three intentionally misleading sample charts (a truncated bar chart, a cherry-picked line, and that impossible 108% pie) so you can see it work in one click, plus drag-and-drop for your own images.
Demo
Repo: github.com/kumarsparkz/datadetective
# grab Gemma 4 through Ollama (e4b runs comfortably on a laptop)
ollama pull gemma4:e4b # ~9.6 GB; or gemma4:26b if you have the RAM
ollama serve
# serve the app — it's just static files
python3 -m http.server 8080
# open http://localhost:8080
Green dot = Ollama connected and a Gemma 4 model detected. Click a sample or upload a chart.
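For anyone curious how a status check like that green dot can work, here is a minimal sketch. It assumes Ollama's `GET /api/tags` endpoint (which lists locally installed models) and that the model names start with `gemma4`, as they do in this post. The helper names `pickGemmaModel` and `detectGemma` are made up for illustration and are not necessarily what the repo uses.

```javascript
// Pure helper: find the first installed model whose name starts with "gemma4".
// Assumption: the payload has the shape { models: [{ name: "gemma4:e4b", ... }] }.
function pickGemmaModel(tagsPayload) {
  const models = Array.isArray(tagsPayload?.models) ? tagsPayload.models : [];
  const match = models.find(
    (m) => typeof m.name === 'string' && m.name.startsWith('gemma4')
  );
  return match ? match.name : null;
}

// Hypothetical wrapper: resolves to a model name, or null when Ollama is
// unreachable or no Gemma 4 model is installed.
async function detectGemma(baseUrl = 'http://localhost:11434') {
  try {
    const res = await fetch(`${baseUrl}/api/tags`);
    if (!res.ok) return null;
    return pickGemmaModel(await res.json());
  } catch {
    return null; // connection refused, blocked request, malformed JSON, etc.
  }
}
```

A UI could call `detectGemma()` on page load and turn the dot green whenever the result is not `null`.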
Here’s what it actually returns (measured on gemma4:e4b, not aspirational):
| Chart | Trust score | What Gemma 4 flagged |
|---|---|---|
| 108% pie chart | 35 / 100 | [high] Inconsistent totals (sum > 100%): “the parts sum to 108%, not 100%” |
| Cherry-picked stock line | 35 / 100 | [high] Cherry-picked time range + promotional language |
| An honest bar chart (control) | 95 / 100 | nothing — “a highly effective and honest visualization” |
That last row is the one I’m proudest of. A tool that flags everything is useless. The honest chart scoring 95 next to the pie scoring 35 is what makes this feel like it’s reasoning, not pattern-matching for keywords.
How I Used Gemma 4
Why local, why Gemma 4
Privacy is the real reason. If you’re analyzing your company’s revenue charts or a competitor’s investor deck, shipping those images to a cloud API feels wrong. Local means the data literally never leaves the machine. Gemma 4’s open weights make that possible, and it handles multimodal input natively: you POST to localhost:11434/api/chat with the model and your messages, and attach the chart as an images: [base64] array on the user message. No separate vision encoder, no plumbing.
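One detail that is easy to trip over: in the browser, `FileReader.readAsDataURL` returns a `data:` URL, while Ollama's `images` array expects the bare base64 string. Below is a small sketch of that conversion. The function names are hypothetical, and `fileToBase64` only runs in a browser.

```javascript
// Strip the "data:<mime>;base64," prefix that FileReader.readAsDataURL adds.
function dataUrlToBase64(dataUrl) {
  const comma = dataUrl.indexOf(',');
  if (!dataUrl.startsWith('data:') || comma === -1) {
    throw new Error('Expected a data: URL');
  }
  return dataUrl.slice(comma + 1);
}

// Browser-only: read a File (from an <input> or a drop event) as bare base64.
function fileToBase64(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(dataUrlToBase64(reader.result));
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(file);
  });
}
```

The resolved string is what goes into `images: [base64Data]` on the user message.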
The thing that actually made it work: let the model think first
Here’s the part worth reading if you build on local models.
My first version used Ollama’s format: 'json' flag. It felt great — guaranteed parseable JSON, no regex-ing it out of markdown. But the analysis quality was quietly terrible on the subtle cases. I fed it the classic truncated-axis bar chart (Y-axis starting at $95M so a 5% rise looks enormous) and it returned a trust score of 90–95 and didn’t flag the axis at all — three times in a row. It would read the axis labels correctly and then conclude the chart “accurately represents the increase.”
The problem wasn’t the prompt. It was that format: 'json' forces the model to emit the JSON object immediately, with no room to reason first. A small model like e4b needs to work through “the axis starts at 95, not 0, therefore the bars exaggerate a 5% change” in plain text. JSON mode amputates exactly that step.
So I dropped format: 'json' and restructured the prompt into an explicit procedure: reason out loud through axis baseline, pie totals, time window, and language, then output the final answer in a fenced json code block. Same model, same chart:
- 108% pie: caught the bad total as a high-severity flag, score dropped to 35.
- Truncated axis: started naming the non-zero baseline instead of waving it through.
- Honest chart: still scored 95 — so the extra scrutiny didn’t make it paranoid.
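To make "explicit procedure" concrete, here is one way such a prompt could be assembled. This is an illustrative sketch only: the wording of each step and the names `PROCEDURE_STEPS` and `buildForensicsPrompt` are my assumptions, not the actual prompt from the repo. The four checks themselves come from the list of things the model reasons through (axis baseline, pie totals, time window, language).

```javascript
// Three backticks, built at runtime so this snippet can sit inside a fence.
const FENCE = '`'.repeat(3);

// Hypothetical wording for the four checks named in the post.
const PROCEDURE_STEPS = [
  'Axis baseline: read the Y-axis minimum. If it is not zero, say how much that exaggerates the change.',
  'Pie totals: add up every slice and state the sum.',
  'Time window: note the start and end dates and whether the range looks hand-picked.',
  'Language: quote any promotional or loaded wording in titles and labels.',
];

// Build a system prompt that asks for plain-text reasoning first, JSON last.
function buildForensicsPrompt(steps = PROCEDURE_STEPS) {
  const numbered = steps.map((s, i) => `${i + 1}. ${s}`).join('\n');
  return [
    'You are a chart forensics analyst. Work through these checks in plain text first:',
    numbered,
    `Only after finishing all checks, output the final answer inside a ${FENCE}json fenced block.`,
  ].join('\n\n');
}
```

The point of the ordering is that the JSON comes after the reasoning, so the score can depend on what the model just worked out.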
The core call now looks like this — note the absence of format:
```jsx
const response = await fetch('http://localhost:11434/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'gemma4:e4b',
    messages: [
      { role: 'system', content: FORENSICS_PROMPT }, // includes a step-by-step procedure
      { role: 'user', content: 'Reason step by step, then return JSON...', images: [base64Data] }
    ],
    stream: false,
    options: { temperature: 0.3, num_predict: 4096 } // low temp = consistent forensics
  })
});
```
Then I parse the JSON out of the fenced block — and keep the reasoning text before it.
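A minimal sketch of that parsing step, assuming the model ends its reply with a json-fenced block. The function name `splitReasoningAndJson` and the `score` field in the usage example are placeholders, since the real result schema is not shown here.

```javascript
// Three backticks, built at runtime so this snippet can sit inside a fence.
const TICKS = '`'.repeat(3);

// Split a model reply into the free-text reasoning and the parsed JSON result.
// Uses the LAST json fence, in case the model quotes JSON while reasoning.
function splitReasoningAndJson(text) {
  const fenceRe = new RegExp(TICKS + 'json\\s*([\\s\\S]*?)' + TICKS, 'g');
  let last = null;
  for (const m of text.matchAll(fenceRe)) last = m;

  // No fence found: keep everything as reasoning so nothing is lost.
  if (!last) return { reasoning: text.trim(), result: null };

  let result = null;
  try {
    result = JSON.parse(last[1]);
  } catch {
    result = null; // malformed JSON; caller can retry or show the raw text
  }
  return { reasoning: text.slice(0, last.index).trim(), result };
}
```

With a non-streaming `/api/chat` call, the text to pass in is `data.message.content` from the response body, for example `const { reasoning, result } = splitReasoningAndJson(data.message.content);`.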
Bonus: the reasoning became a feature
Because the model now thinks in plain text before answering, I had its actual forensic reasoning sitting right there. So I surface it in a collapsible “Gemma 4 Reasoning” panel. You can watch it add up the pie slices and catch the 108% itself. That transparency — showing the work, not just a verdict — turned out to be the most compelling thing in the whole app.
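A collapsible panel like this needs no framework: the native `<details>` element does the collapsing. The sketch below is an assumption about how it could be done, not the app's actual markup, and the class name and function names are invented. The one thing worth getting right is escaping the model's text before it goes into the page.

```javascript
// Escape text so model output cannot inject markup into the page.
function escapeHtml(s) {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Return the HTML for a collapsed-by-default reasoning panel.
function renderReasoningPanel(reasoning) {
  return (
    '<details class="reasoning">' +
    '<summary>Gemma 4 Reasoning</summary>' +
    `<pre>${escapeHtml(reasoning)}</pre>` +
    '</details>'
  );
}
```

In the browser, the returned string can be assigned to a container's `innerHTML`, since the reasoning text has already been escaped.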
Being honest about the limits
e4b is the small variant, and it shows. It nails cherry-picking and impossible pie totals every time, but it catches the truncated-axis case in maybe 3 runs out of 4, and then only as a medium issue rather than a high one. gemma4:26b (26B params, only ~3.8B active per pass thanks to the MoE design) handles it far more decisively; the architecture scales cleanly, and you just trade RAM and a few seconds of latency. I built and tuned everything against e4b specifically to prove the concept works on hardware people actually have.
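A figure like "3 runs out of 4" is easy to measure rather than eyeball. Here is a rough sketch that assumes each parsed result carries a `flags` array whose entries have a `title` string. That shape is my guess at a plausible schema, not the app's documented output.

```javascript
// Fraction of runs in which at least one flag title matches the pattern.
// Assumed result shape: { flags: [{ severity: 'medium', title: 'Truncated Y-axis' }] }
// Pass a regex WITHOUT the global flag, since /g makes .test() stateful.
function catchRate(runs, pattern) {
  if (runs.length === 0) return 0;
  const hits = runs.filter((run) =>
    (run?.flags ?? []).some((flag) => pattern.test(flag.title ?? ''))
  );
  return hits.length / runs.length;
}
```

Running the same chart through the model several times and calling `catchRate(results, /truncat|baseline/i)` gives a number that can be compared across model sizes or prompt revisions.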
The frontend
Zero dependencies — HTML, CSS, vanilla JS. Dark glassmorphism theme, an animated SVG trust gauge (stroke-dasharray), staggered result cards, and system/light/dark themes. The three sample misleading charts are drawn with the Canvas API so there are no external image assets.
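The stroke-dasharray trick for a circular gauge comes down to a little arithmetic: the circle's circumference is 2πr, and the filled arc is the score's share of it. A sketch, with `gaugeDash` as a hypothetical helper name and the radius as whatever the SVG circle uses:

```javascript
// Compute the stroke-dasharray for a circular gauge showing `score` out of 100.
// The first number is the visible arc length, the second is the gap.
function gaugeDash(score, radius) {
  const clamped = Math.min(100, Math.max(0, score));
  const circumference = 2 * Math.PI * radius;
  const filled = (clamped / 100) * circumference;
  return {
    circumference,
    filled,
    dasharray: `${filled} ${circumference - filled}`,
  };
}

// Browser usage (assumes an SVG <circle id="gauge" r="54"> exists):
//   const { dasharray } = gaugeDash(35, 54);
//   document.getElementById('gauge').setAttribute('stroke-dasharray', dasharray);
```

Adding a CSS transition on `stroke-dasharray` is one way to get the animated sweep when the score arrives.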
What I Learned
The headline lesson: on local models, JSON mode is a trap for any task that needs reasoning. Convenience at the parsing layer cost me the model’s entire analytical capacity. Letting Gemma 4 think out loud first — and parsing the JSON out of the tail — was the difference between a tool that rubber-stamps misleading charts and one that actually catches them. And it handed me a transparency feature for free.
Team
Solo project — just me and an unreasonable number of misleading charts I’ve been annoyed by over the years.




















