This is a submission for the Gemma 4 Challenge: Build with Gemma 4
What I Built
UXRay — drop a screenshot or paste a URL, get a full UX audit in seconds.
Most designers and developers ship UIs without a systematic critique. Hiring a UX consultant is expensive. Running a full user study takes weeks. UXRay closes that gap: it gives you the kind of structured, heuristic-based analysis a senior UX professional would produce, in about a minute, locally, and for free.
You give UXRay a UI (file upload or live URL) and it returns:
- Overall UX score (0–100)
- Cognitive load analysis — is the interface overwhelming users?
- Trust score — what signals build or erode credibility?
- Friction points — specific elements causing drop-off, each mapped to a Nielsen heuristic and rated critical / warning / info
- Prioritized recommendations — actionable fixes sorted by urgency with effort and impact ratings
- Accessibility flags — WCAG 2.1 violations visible in the screenshot
- Layout analysis — fold content, visual hierarchy strength, whitespace quality, and scan pattern (Z vs F)
The analysis is grounded in established UX theory: Nielsen's 10 Usability Heuristics, Gestalt principles, Fogg's trust heuristics, Sweller's cognitive load theory, and WCAG 2.1. Every friction point cites the exact heuristic it violates so you know why something is a problem, not just that it is.
Stack: Next.js 16 (App Router, TypeScript) · Tailwind v4 · Framer Motion · Gemma 4 E4B via Ollama · Playwright microservice for URL screenshots · Zod for structured output validation
Demo
Live test: I pointed UXRay at dev.to. It captured a full-page screenshot, ran the Gemma 4 analysis, and returned a structured result — 85 overall score, 3 friction points, 3 prioritized recommendations — in about 56 seconds on CPU, no GPU required.
Code
UXRay — AI-Powered UX Analysis
X-ray your interface through AI. Powered by Gemma 4 E4B.
UXRay analyzes any UI screenshot like a behavioral psychologist — detecting cognitive load, trust signals, friction points, and actionable redesign recommendations. It uses Gemma 4's native multimodal vision to see the interface directly, not just process text descriptions.
Built for the Google Gemma 2026 Hackathon on dev.to.
Demo
Upload a screenshot or paste a URL → Gemma 4 analyzes it → structured UX critique appears:
- Overall UX Score (0–100)
- Cognitive Load gauge with specific issues
- Trust Score with positive/negative signals
- Friction Points with heuristic references (Nielsen, Gestalt, WCAG)
- Recommendations sorted by priority with effort/impact ratings
- Accessibility Flags and Layout Analysis
Prerequisites
- Ollama installed and running:
  brew install ollama
  brew services start ollama
- Gemma 4 E4B pulled:
  ollama pull gemma4:e4b
- Node.js 18+
Setup
# Clone the repo
git clone <repo-url>
cd uxray
# Install
…
The two key pieces of the pipeline:
1. Gemma 4 client (web/lib/gemma.ts)
Sends the screenshot as a raw base64 image to Ollama's /api/generate endpoint with format: "json" enforced, streams the NDJSON response token-by-token, and validates the output against a strict Zod schema. If JSON parsing fails on the first pass, it automatically retries at a lower temperature (0.1) to coax a clean response.
const response = await fetch(`${OLLAMA_BASE_URL}/api/generate`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "gemma4:e4b",
    prompt: SYSTEM_PROMPT + "\n\n" + USER_PROMPT,
    images: [base64Image], // raw base64, no data URI prefix
    format: "json", // enforces valid JSON output
    stream: true,
    options: {
      temperature: 0.3,
      num_ctx: 8192,
    },
  }),
});
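The request above only opens the stream; the NDJSON body still has to be reassembled into one string before it can be parsed. Here is a minimal sketch of that step, assuming each streamed line is a JSON object carrying a `response` text fragment and a `done` flag, which is how Ollama's /api/generate streams. The names `NdjsonAccumulator` and `readGenerateStream` are hypothetical and not taken from the UXRay repo.

```typescript
// Shape of one streamed line (only the fields this sketch reads).
interface GenerateChunk {
  response?: string;
  done?: boolean;
}

// Collects NDJSON text that may arrive split at arbitrary boundaries.
class NdjsonAccumulator {
  private buffer = "";
  private text = "";
  done = false;

  // Feed one decoded network chunk: complete lines are parsed,
  // and any trailing partial line is kept for the next call.
  push(chunk: string): void {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? "";
    for (const line of lines) {
      if (line.trim() === "") continue;
      const parsed = JSON.parse(line) as GenerateChunk;
      if (parsed.response) this.text += parsed.response;
      if (parsed.done) this.done = true;
    }
  }

  // The model output assembled so far.
  get output(): string {
    return this.text;
  }
}

// Reads a fetch Response body to the end and returns the concatenated model text.
async function readGenerateStream(response: Response): Promise<string> {
  if (!response.ok || !response.body) {
    throw new Error(`Ollama request failed with status ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  const acc = new NdjsonAccumulator();
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    acc.push(decoder.decode(value, { stream: true }));
  }
  // Flush pending bytes and a final line that lacks a trailing newline.
  acc.push(decoder.decode() + "\n");
  return acc.output;
}
```

Buffering the trailing partial line matters because a network chunk can end in the middle of a JSON object; parsing each chunk on its own would throw intermittently.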
2. Playwright screenshot service (playwright-service/server.js)
A small Express server that accepts a URL, spins up Chromium, captures a full-page screenshot, and returns it as base64. This lets UXRay analyze any live site without leaving the local pipeline.
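A rough sketch of the request handling such a service needs, written so the browser step is injected and the logic stays testable without Chromium. Everything named here (`Capture`, `parseTargetUrl`, `handleScreenshot`, the response shape and status codes) is an assumption for illustration, not the contents of server.js.

```typescript
// Hypothetical capture function: takes a URL, resolves to PNG bytes of the full page.
type Capture = (url: string) => Promise<Uint8Array>;

interface ScreenshotResult {
  status: number;
  body: { image: string } | { error: string };
}

// Accept only absolute http(s) URLs so the service cannot be pointed
// at file:// paths or other schemes.
function parseTargetUrl(raw: unknown): URL | null {
  if (typeof raw !== "string") return null;
  try {
    const url = new URL(raw);
    return url.protocol === "http:" || url.protocol === "https:" ? url : null;
  } catch {
    return null; // new URL() throws on malformed input
  }
}

// Validates the input, runs the capture, and returns the image as raw base64
// (no data URI prefix, matching what the Ollama request expects).
async function handleScreenshot(raw: unknown, capture: Capture): Promise<ScreenshotResult> {
  const url = parseTargetUrl(raw);
  if (!url) {
    return { status: 400, body: { error: "Expected an absolute http(s) URL" } };
  }
  try {
    const png = await capture(url.href);
    return { status: 200, body: { image: Buffer.from(png).toString("base64") } };
  } catch (err) {
    return { status: 502, body: { error: `Capture failed: ${(err as Error).message}` } };
  }
}
```

With Playwright, the injected `capture` would typically launch Chromium, call `page.goto(url)`, and return the result of `page.screenshot({ fullPage: true })`; an Express route would then forward `status` and `body` to the client.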
To run it yourself:
# Pull the model first
ollama pull gemma4:e4b
# Start both services (Next.js on :3000, Playwright on :3001)
npm install && npm run dev
How I Used Gemma 4
I chose Gemma 4 E4B (the 4-billion-parameter multimodal variant) for three reasons:
1. Multimodal vision is load-bearing, not decorative
UXRay's entire value proposition requires seeing the UI. The model has to identify specific elements — button labels, color contrast, spacing, typography — and reason about them in relation to UX principles. Gemma 4's vision capability handles this natively. There's no separate OCR step, no layout parsing pipeline, no element segmentation — the model just looks at the screenshot and reasons.
2. E4B runs on CPU in a reasonable time
The 4B parameter count was a deliberate choice. I wanted UXRay to work on a developer's laptop without requiring a GPU. At ~56 seconds for a full audit on CPU, E4B hits the sweet spot: thorough enough to produce genuinely useful output, fast enough to feel interactive. The 31B Dense model would have been overkill for a local-first tool, and E2B felt too thin for the reasoning depth the structured output requires.
3. JSON mode + structured output validation
Setting format: "json" in the Ollama request pushes Gemma 4 to emit valid JSON directly, which I then validate with a Zod schema. The system prompt defines the exact schema (frictionPoints, cognitiveLoad, trustScore, layoutAnalysis), and the model follows it reliably in my testing. Once a response passes validation, it renders in the UI with no further post-processing.
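To make the validate-then-retry flow concrete, here is a minimal sketch. It swaps Zod for a plain TypeScript type guard so the example has no dependencies, and it checks only the frictionPoints field, using the field names from the sample friction point in this post. `parseFrictionPoints`, `analyzeWithRetry`, and the injected `generate` function are hypothetical names, not the actual code in web/lib/gemma.ts.

```typescript
type Severity = "critical" | "warning" | "info";

interface FrictionPoint {
  id: string;
  location: string;
  description: string;
  severity: Severity;
  heuristic: string;
}

const SEVERITIES: readonly string[] = ["critical", "warning", "info"];

// Runtime check standing in for a Zod schema (assumption: these five string fields).
function isFrictionPoint(value: unknown): value is FrictionPoint {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "string" &&
    typeof v.location === "string" &&
    typeof v.description === "string" &&
    typeof v.heuristic === "string" &&
    typeof v.severity === "string" &&
    SEVERITIES.includes(v.severity)
  );
}

// Returns the friction points, or null if the text is not valid JSON
// or does not match the expected shape.
function parseFrictionPoints(raw: string): FrictionPoint[] | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const points = (data as Record<string, unknown>).frictionPoints;
  return Array.isArray(points) && points.every(isFrictionPoint) ? points : null;
}

// `generate` is a hypothetical wrapper around the model call.
// First attempt at temperature 0.3, then one retry at 0.1.
async function analyzeWithRetry(
  generate: (temperature: number) => Promise<string>,
): Promise<FrictionPoint[]> {
  for (const temperature of [0.3, 0.1]) {
    const parsed = parseFrictionPoints(await generate(temperature));
    if (parsed) return parsed;
  }
  throw new Error("Model output failed validation at both temperatures");
}
```

Keeping the parser separate from the retry loop means the same check can run on both attempts, and a schema mismatch is treated the same way as malformed JSON.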
The system prompt grounds every analysis in specific UX frameworks so the model doesn't just describe what it sees — it diagnoses why it's a problem and cites the principle being violated:
You are UXRay, an expert UX analyst with deep knowledge of:
- Nielsen's 10 Usability Heuristics
- Gestalt principles of visual design
- WCAG 2.1 accessibility guidelines
- Cognitive load theory (Sweller)
- Trust and credibility heuristics (Fogg's Persuasive Technology)
- Conversion rate optimization (CRO)
A real friction point from the dev.to analysis looks like this:
{
  "id": "fp-1",
  "location": "Primary CTA button",
  "description": "Button label 'Get started' is generic — users cannot predict what commitment they're making, increasing hesitation at the conversion moment.",
  "severity": "warning",
  "heuristic": "Nielsen #6 — Recognition over recall"
}
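Because every friction point carries one of three severities, a UI can put the most urgent items first with a small ranking table. This is a sketch of one way to do that; `SEVERITY_RANK` and `sortBySeverity` are hypothetical, and whether UXRay orders friction points this way is an assumption.

```typescript
type Severity = "critical" | "warning" | "info";

// Lower rank sorts first.
const SEVERITY_RANK: Record<Severity, number> = {
  critical: 0,
  warning: 1,
  info: 2,
};

// Returns a new array ordered critical, then warning, then info.
// Array.prototype.sort is stable, so items with equal severity
// keep the order the model emitted them in.
function sortBySeverity<T extends { severity: Severity }>(points: readonly T[]): T[] {
  return [...points].sort(
    (a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity],
  );
}
```

Copying the array before sorting leaves the validated model output untouched, so other views can still show it in its original order.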
Gemma 4's ability to follow a complex, multi-section JSON schema while simultaneously reasoning about visual design principles across a real screenshot is what makes this whole approach viable. Swap it for a text-only model and UXRay doesn't exist.
Built with Gemma 4 E4B + Ollama + Next.js 16. Runs fully local — your screenshots never leave your machine.