UXRay: I Built an AI That Roasts Your UI Like a Senior Designer Would

pulkitgovran · 2026-05-23 · via DEV Community

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

UXRay — drop a screenshot or paste a URL, get a full UX audit in seconds.

Most designers and developers ship UIs without a systematic critique. Hiring a UX consultant is expensive, and running a full user study takes weeks. UXRay aims to close that gap: it produces the kind of structured, heuristic-based analysis a senior UX professional would, running locally, for free, and in about a minute on a laptop CPU.

You give UXRay a UI (file upload or live URL) and it returns:

Overall UX score (0–100)
Cognitive load analysis — is the interface overwhelming users?
Trust score — what signals build or erode credibility?
Friction points — specific elements causing drop-off, each mapped to a Nielsen heuristic and rated critical / warning / info
Prioritized recommendations — actionable fixes sorted by urgency with effort and impact ratings
Accessibility flags — WCAG 2.1 violations visible in the screenshot
Layout analysis — above-the-fold content, visual hierarchy strength, whitespace quality, and scan pattern (Z vs F)
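One way to read "sorted by urgency with effort and impact ratings" is: priority first, then impact, then effort. The TypeScript sketch below shows that ordering. The `Recommendation` shape, its field names, and the three-level ratings are assumptions for illustration, since the post does not show UXRay's recommendation schema.

```typescript
// Hypothetical shape: the post only says recommendations carry a priority
// plus effort and impact ratings; these field names are illustrative.
type Level = "high" | "medium" | "low";

interface Recommendation {
  title: string;
  priority: Level;
  effort: Level;
  impact: Level;
}

// Lower rank sorts first.
const PRIORITY_RANK: Record<Level, number> = { high: 0, medium: 1, low: 2 };
const IMPACT_RANK: Record<Level, number> = { high: 0, medium: 1, low: 2 };
const EFFORT_RANK: Record<Level, number> = { low: 0, medium: 1, high: 2 };

// Order by priority, then by impact (higher first), then by effort (lower
// first), so cheap high-impact fixes surface within each priority band.
function sortRecommendations(recs: Recommendation[]): Recommendation[] {
  return [...recs].sort(
    (a, b) =>
      PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority] ||
      IMPACT_RANK[a.impact] - IMPACT_RANK[b.impact] ||
      EFFORT_RANK[a.effort] - EFFORT_RANK[b.effort],
  );
}
```

Sorting a copy (`[...recs]`) leaves the model's original array untouched, which keeps the raw response available for debugging.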

The analysis is grounded in established UX theory: Nielsen's 10 Usability Heuristics, Gestalt principles, Fogg's trust heuristics, Sweller's cognitive load theory, and WCAG 2.1. Every friction point cites the exact heuristic it violates so you know why something is a problem, not just that it is.

Stack: Next.js 16 (App Router, TypeScript) · Tailwind v4 · Framer Motion · Gemma 4 E4B via Ollama · Playwright microservice for URL screenshots · Zod for structured output validation

Demo

Live test: I pointed UXRay at dev.to. It captured a full-page screenshot, ran the Gemma 4 analysis, and returned a structured result — 85 overall score, 3 friction points, 3 prioritized recommendations — in about 56 seconds on CPU, no GPU required.

Code

UXRay — AI-Powered UX Analysis

X-ray your interface through AI. Powered by Gemma 4 E4B.

UXRay analyzes any UI screenshot like a behavioral psychologist — detecting cognitive load, trust signals, friction points, and actionable redesign recommendations. It uses Gemma 4's native multimodal vision to see the interface directly, not just process text descriptions.

Built for the Google Gemma 2026 Hackathon on dev.to.

Demo

Upload a screenshot or paste a URL → Gemma 4 analyzes it → structured UX critique appears:

Overall UX Score (0–100)
Cognitive Load gauge with specific issues
Trust Score with positive/negative signals
Friction Points with heuristic references (Nielsen, Gestalt, WCAG)
Recommendations sorted by priority with effort/impact ratings
Accessibility Flags and Layout Analysis

Prerequisites

Ollama installed and running:
```
brew install ollama
brew services start ollama
```
Gemma 4 E4B pulled:
```
ollama pull gemma4:e4b
```
Node.js 20.9+ (Next.js 16 no longer supports Node.js 18)
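Before running anything, it can help to confirm that Ollama is up and the model is actually pulled. Here is a minimal sketch, assuming Ollama's default address (`http://localhost:11434`) and its `GET /api/tags` endpoint, which lists locally available models. `hasModel` and `checkOllama` are hypothetical helper names, not part of UXRay.

```typescript
// Subset of the /api/tags response this sketch relies on.
interface TagsResponse {
  models: { name: string }[];
}

// Pure check, kept separate so it can be tested without a running server.
function hasModel(tags: TagsResponse, model: string): boolean {
  return tags.models.some((m) => m.name === model);
}

// Queries the local Ollama server and reports whether `model` is available.
async function checkOllama(
  model = "gemma4:e4b",
  baseUrl = "http://localhost:11434",
): Promise<boolean> {
  try {
    const res = await fetch(`${baseUrl}/api/tags`);
    if (!res.ok) return false;
    const tags = (await res.json()) as TagsResponse;
    return hasModel(tags, model);
  } catch {
    // A refused connection usually means the Ollama service is not running.
    return false;
  }
}
```

For example: `checkOllama().then((ok) => console.log(ok ? "ready" : "run: ollama pull gemma4:e4b"))`.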

Setup

```
# Clone the repo
git clone <repo-url>
cd uxray
# Install
```

…

The two key pieces of the pipeline:

1. Gemma 4 client (web/lib/gemma.ts)

Sends the screenshot as a raw base64 image to Ollama's /api/generate endpoint with format: "json" enforced, streams the NDJSON response token-by-token, and validates the output against a strict Zod schema. If JSON parsing fails on the first pass, it automatically retries at a lower temperature (0.1) to coax a clean response.
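The lower-temperature retry can be sketched roughly as follows. This is not the code from `web/lib/gemma.ts`: `generate` is a hypothetical stand-in for the Ollama request (returning the full response text for a given temperature), and `validate` stands in for the Zod parse step. Following the description above, only JSON parse failures trigger a retry; schema mismatches surface as errors.

```typescript
// Hypothetical stand-in for the Ollama call: full response text at a given temperature.
type Generate = (temperature: number) => Promise<string>;

async function analyzeWithRetry<T>(
  generate: Generate,
  validate: (data: unknown) => T, // e.g. a Zod schema's parse; throws on mismatch
  temperatures: number[] = [0.3, 0.1],
): Promise<T> {
  let lastError: unknown;
  for (const temperature of temperatures) {
    const raw = await generate(temperature);
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch (err) {
      lastError = err; // malformed JSON: try again at the next, lower temperature
      continue;
    }
    return validate(parsed); // schema mismatches are not retried in this sketch
  }
  throw new Error(
    `No parseable JSON after ${temperatures.length} attempts: ${String(lastError)}`,
  );
}
```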

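```typescript
// Excerpt from web/lib/gemma.ts. OLLAMA_BASE_URL, SYSTEM_PROMPT, USER_PROMPT
// and base64Image are defined or imported elsewhere in the module.
const response = await fetch(`${OLLAMA_BASE_URL}/api/generate`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "gemma4:e4b",
    prompt: SYSTEM_PROMPT + "\n\n" + USER_PROMPT,
    images: [base64Image],   // raw base64, no data URI prefix
    format: "json",          // enforces valid JSON output
    stream: true,            // response arrives as NDJSON, one object per line
    options: {
      temperature: 0.3,      // first pass; the retry drops this to 0.1
      num_ctx: 8192,         // context window size in tokens
    },
  }),
});
```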
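Because `stream: true` is set, Ollama returns newline-delimited JSON: each line is an object whose `response` field holds the next text fragment, and the last one has `done: true`. A minimal sketch of reassembling the full output follows. `NdjsonAccumulator` and `readOllamaStream` are hypothetical names, and the sketch buffers partial lines because a network chunk can end mid-line.

```typescript
class NdjsonAccumulator {
  private buffer = "";
  text = "";
  done = false;

  push(chunk: string): void {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is "" or a partial line; keep it for the next chunk.
    this.buffer = lines.pop() ?? "";
    for (const line of lines) this.handleLine(line);
  }

  // Call once the stream ends, in case the final line had no trailing newline.
  flush(): void {
    if (this.buffer.trim()) this.handleLine(this.buffer);
    this.buffer = "";
  }

  private handleLine(line: string): void {
    if (!line.trim()) return;
    const msg = JSON.parse(line) as { response?: string; done?: boolean; error?: string };
    // Defensive: surface an `error` field if the server sends one.
    if (msg.error) throw new Error(`Ollama error: ${msg.error}`);
    this.text += msg.response ?? "";
    if (msg.done) this.done = true;
  }
}

// Reads a fetch() Response body (a web ReadableStream of bytes) to completion.
async function readOllamaStream(response: Response): Promise<string> {
  if (!response.ok || !response.body) {
    throw new Error(`Ollama returned HTTP ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  const acc = new NdjsonAccumulator();
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    acc.push(decoder.decode(value, { stream: true }));
  }
  acc.push(decoder.decode()); // flush any bytes the decoder is still holding
  acc.flush();
  return acc.text;
}
```

The accumulated `text` is the model's JSON document, ready for `JSON.parse` and schema validation.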

2. Playwright screenshot service (playwright-service/server.js)

A small Express server that accepts a URL, spins up Chromium, captures a full-page screenshot, and returns it as base64. This lets UXRay analyze any live site without leaving the local pipeline.
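The core of such a service can be sketched without Chromium by injecting the capture step. In this sketch the `{ url }` request body, the `{ image }` response field, the status codes, and the restriction to http(s) URLs are all assumptions. The Playwright calls in the comment (`chromium.launch`, `page.goto`, `page.screenshot({ fullPage: true })`) show roughly what a real `capture` would wrap.

```typescript
// In the real service, `capture` would wrap Playwright, roughly:
//   const browser = await chromium.launch();
//   const page = await browser.newPage();
//   await page.goto(url, { waitUntil: "networkidle" });
//   const png = await page.screenshot({ fullPage: true });
//   await browser.close();
type Capture = (url: string) => Promise<Uint8Array>;

interface ScreenshotResult {
  status: number;
  body: { image?: string; error?: string };
}

async function handleScreenshot(input: unknown, capture: Capture): Promise<ScreenshotResult> {
  const url = (input as { url?: unknown } | null)?.url;
  if (typeof url !== "string") return { status: 400, body: { error: "Missing url" } };

  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    return { status: 400, body: { error: "Invalid url" } };
  }
  // Only web pages: rejects file:, data:, javascript: and similar schemes.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { status: 400, body: { error: "Only http(s) URLs are supported" } };
  }

  try {
    const png = await capture(parsed.href);
    // Raw base64 with no data-URI prefix, matching what Ollama's `images` field takes.
    return { status: 200, body: { image: Buffer.from(png).toString("base64") } };
  } catch (err) {
    return { status: 502, body: { error: `Capture failed: ${String(err)}` } };
  }
}
```

Keeping the browser work behind a function parameter also makes it easy to reuse one Chromium instance across requests instead of launching a new one each time.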

To run it yourself:

```
# Pull the model first
ollama pull gemma4:e4b

# Start both services (Next.js on :3000, Playwright on :3001)
npm install && npm run dev
```

How I Used Gemma 4

I chose Gemma 4 E4B (the multimodal variant with roughly 4 billion effective parameters) for three reasons:

1. Multimodal vision is load-bearing, not decorative

UXRay's entire value proposition requires seeing the UI. The model has to identify specific elements — button labels, color contrast, spacing, typography — and reason about them in relation to UX principles. Gemma 4's vision capability handles this natively. There's no separate OCR step, no layout parsing pipeline, no element segmentation — the model just looks at the screenshot and reasons.

2. E4B runs on CPU in a reasonable time

The 4B parameter count was a deliberate choice. I wanted UXRay to work on a developer's laptop without requiring a GPU. At ~56 seconds for a full audit on CPU, E4B hits the sweet spot: thorough enough to produce genuinely useful output, fast enough to feel interactive. The 31B Dense model would have been overkill for a local-first tool, and E2B felt too thin for the reasoning depth the structured output requires.

3. JSON mode + structured output validation

Setting format: "json" in the Ollama request pushes Gemma 4 to emit valid JSON directly, which I then validate with a Zod schema. The system prompt defines the exact schema (frictionPoints, cognitiveLoad, trustScore, layoutAnalysis), and in my testing the model followed it reliably. Once a response passes validation, it renders directly in the UI with no further post-processing.

The system prompt grounds every analysis in specific UX frameworks so the model doesn't just describe what it sees — it diagnoses why it's a problem and cites the principle being violated:

```
You are UXRay, an expert UX analyst with deep knowledge of:
- Nielsen's 10 Usability Heuristics
- Gestalt principles of visual design
- WCAG 2.1 accessibility guidelines
- Cognitive load theory (Sweller)
- Trust and credibility heuristics (Fogg's Persuasive Technology)
- Conversion rate optimization (CRO)
```

A real friction point from the dev.to analysis looks like this:

```json
{
  "id": "fp-1",
  "location": "Primary CTA button",
  "description": "Button label 'Get started' is generic — users cannot predict what commitment they're making, increasing hesitation at the conversion moment.",
  "severity": "warning",
  "heuristic": "Nielsen #6 — Recognition over recall"
}
```
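The post validates output with Zod. As a dependency-free illustration of the same idea, here is a hand-written type guard for the friction-point shape in the example above. It is a sketch, not UXRay's actual schema: field names come from the example, the allowed severities from the critical / warning / info ratings listed earlier, and the non-empty `heuristic` check reflects the rule that every friction point cites a principle.

```typescript
type Severity = "critical" | "warning" | "info";

interface FrictionPoint {
  id: string;
  location: string;
  description: string;
  severity: Severity;
  heuristic: string;
}

const SEVERITIES: readonly string[] = ["critical", "warning", "info"];

// Narrows an unknown JSON value to FrictionPoint, or returns false.
function isFrictionPoint(value: unknown): value is FrictionPoint {
  if (typeof value !== "object" || value === null) return false;
  const { id, location, description, severity, heuristic } = value as Record<string, unknown>;
  return (
    typeof id === "string" &&
    typeof location === "string" &&
    typeof description === "string" &&
    typeof severity === "string" &&
    SEVERITIES.includes(severity) &&
    typeof heuristic === "string" &&
    heuristic.length > 0 // every friction point must cite a principle
  );
}
```

With Zod, the equivalent would be an object schema whose `severity` is an enum of the same three values; the guard above just makes the checks explicit.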

Gemma 4's ability to follow a complex, multi-section JSON schema while simultaneously reasoning about visual design principles across a real screenshot is what makes this whole approach viable. Swap it for a text-only model and UXRay doesn't exist.

Built with Gemma 4 E4B + Ollama + Next.js 16. Runs fully local — your screenshots never leave your machine.
