I Built an AI Receipt Scanner with Gemma 4 — As an SDET with No Dev Background
Hemalatha Na · 2026-05-19 · via DEV Community

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

My Honest Starting Point

I want to be upfront about something before diving in: I am an SDET — a Software Development Engineer in Test. My world is test automation, quality assurance, and finding bugs, not building apps from scratch. I have spent the last few weeks learning AI fundamentals, experimenting with different models, trying to understand how this whole ecosystem actually works beneath the surface.
When I came across this hackathon, my first instinct was to scroll past it. This is for developers. But then I thought — why not? The worst that happens is I learn something.
What followed was equal parts confusion, accidental discovery, and a working app I am genuinely proud of.

The Gemini vs Gemma Confusion (I Suspect I'm Not Alone)

My first stop was Google AI Studio. And honestly? It was fantastic for getting ideas off the ground quickly. I built a small app there, got a feel for prompt engineering, and started to understand how multimodal models work.
But there was a problem: every time I tried to use Gemma 4, Google AI Studio kept routing me to Gemini Flash Preview — the latest hosted model. No matter what I selected, it defaulted back to Gemini.
I spent an embarrassing amount of time thinking I was using Gemma 4 when I wasn't.
That confusion forced me to actually sit down and research the difference. And that is when it clicked:

Gemma is not a smaller Gemini. They share research lineage, but the deployment story is completely different. Once I understood that, everything else fell into place.

What I Built: ReceiptMind

ReceiptMind is an AI-powered receipt scanner that extracts structured data from receipt photos and builds an expense dashboard automatically.
You take a photo of a receipt — any receipt, any store — upload it, and Gemma 4 reads the image and returns:

  • Merchant name
  • Total amount
  • Date
  • Expense category (Food & Dining, Groceries, Transport, Healthcare, Entertainment)
  • Tax amount

No manual entry. No OCR pipeline. No template matching. Just Gemma 4 looking at the image and understanding it.
This started as a feature I wanted to add to a personal finance app I have been quietly building on the side. The hackathon gave me the deadline I needed to actually ship something.

Why Gemma 4 26B MoE — Not the Other Models

This is the question I care most about answering, because I made this choice deliberately.
The Gemma 4 family has four models: E2B, E4B, the 26B MoE (A4B), and the 31B dense.

I chose the 26B MoE (A4B) for two specific reasons:

  1. It is the only model in the family with native image input.
    ReceiptMind's entire value is reading receipt photos. Without multimodal vision, there is no product. The E2B and E4B are text-only. The 31B dense is text-only. Only the 26B MoE can receive an image and reason about what it sees.

  2. Despite 26B total parameters, only 4B activate per token.
    This is the Mixture-of-Experts efficiency. The model routes each token through only the most relevant expert layers — so I get near-31B quality visual reasoning at a fraction of the compute cost. For a hobby project running on a free API tier, this matters enormously.

I also used the 256K context window to pass multiple receipts in a single prompt when generating monthly spending insights — no chunking, no retrieval, just the full history in one shot.

The Tech Stack

Frontend → HTML + Vanilla JavaScript
Backend → Node.js + Express
AI → Gemma 4 26B MoE via OpenRouter (free tier)
Database → Neon Postgres (serverless)
File Upload → Multer (in-memory buffer → base64)
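As a rough sketch of the "in-memory buffer → base64" step: multer's memory storage exposes the upload as `req.file.buffer` along with `req.file.mimetype`, so the conversion could look something like the following. The helper name `toDataUrl` is hypothetical and not taken from the ReceiptMind repo.

```javascript
// Hypothetical helper: turns an uploaded file buffer into a data URL
// suitable for an image_url content block.
function toDataUrl(buffer, mimeType) {
  if (!Buffer.isBuffer(buffer)) {
    throw new TypeError("expected a Buffer");
  }
  const base64Image = buffer.toString("base64");
  return `data:${mimeType};base64,${base64Image}`;
}

// With multer's memory storage this might be called as:
//   toDataUrl(req.file.buffer, req.file.mimetype)
```

Keeping the file in memory avoids writing temp files to disk, at the cost of holding the whole image in RAM for the duration of the request.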

Why OpenRouter?
Google AI Studio kept routing me to Gemini Flash. OpenRouter gave me direct access to google/gemma-4-26b-a4b-it:free with no credit card and no routing surprises. Once I found it, the API worked on the first try.

How It Works — The Architecture

  1. User uploads a receipt image
  2. Express backend receives the file via multer
  3. Image is converted to base64
  4. Request is sent to OpenRouter → Gemma 4 26B MoE (multimodal)
  5. Gemma reads the image and returns structured JSON
  6. JSON is saved to Neon Postgres
  7. Dashboard updates with the new receipt and running totals
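The "JSON saved to Neon Postgres" step could be sketched roughly as below. This assumes a client object with a `query(text, values)` method, as exposed by the `pg` package. The function name `saveReceipt`, the table name `receipts`, and its columns are all hypothetical, since the post does not show the schema.

```javascript
// Assumption: `db` is anything with a pg-style query(text, values) method.
// Table and column names are illustrative, not from the ReceiptMind repo.
async function saveReceipt(db, receipt) {
  const text =
    "INSERT INTO receipts (merchant, amount, date, category, tax) " +
    "VALUES ($1, $2, $3, $4, $5)";
  const values = [
    receipt.merchant,
    receipt.amount,
    receipt.date,
    receipt.category,
    receipt.tax,
  ];
  // Parameterized values keep model output out of the SQL string itself.
  return db.query(text, values);
}
```

Because the values come from a model reading an arbitrary image, passing them as parameters rather than concatenating them into the SQL string seems the safer default.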

The Core API Call

Here is the exact call that makes ReceiptMind work. The key is the image_url content block — this is what tells Gemma 4 to look at the receipt image:

const response = await fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.OPENROUTER_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    model: "google/gemma-4-26b-a4b-it:free",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "image_url",
            image_url: { url: `data:${mimeType};base64,${base64Image}` }
          },
          {
            type: "text",
            text: `Look at this receipt and extract the data. 
            Reply ONLY with JSON:
            {
              "merchant": "store name",
              "amount": 12.50,
              "date": "2026-05-14",
              "category": "Food & Dining",
              "tax": 1.10
            }`
          }
        ]
      }
    ]
  })
});


No OCR library. No preprocessing. No regex parsing of receipt text. Gemma reads the image exactly like a human would and returns clean structured data.
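Since the prompt pins down the expected output shape, the parsed reply can be checked before it goes anywhere near the database. Here is a minimal sketch of such a check. The function name `validateReceipt` and its messages are hypothetical, and the category list is simply the one given earlier in this post.

```javascript
// Categories as listed in the post; adjust if your prompt allows others.
const CATEGORIES = [
  "Food & Dining",
  "Groceries",
  "Transport",
  "Healthcare",
  "Entertainment",
];

// Hypothetical validator: returns a list of problems (empty means OK).
function validateReceipt(receipt) {
  if (receipt === null || typeof receipt !== "object") {
    return ["receipt is not an object"];
  }
  const problems = [];
  if (typeof receipt.merchant !== "string" || receipt.merchant.trim() === "") {
    problems.push("merchant must be a non-empty string");
  }
  for (const field of ["amount", "tax"]) {
    const value = receipt[field];
    if (typeof value !== "number" || !Number.isFinite(value) || value < 0) {
      problems.push(`${field} must be a non-negative number`);
    }
  }
  if (typeof receipt.date !== "string" || !/^\d{4}-\d{2}-\d{2}$/.test(receipt.date)) {
    problems.push("date must look like YYYY-MM-DD");
  }
  if (!CATEGORIES.includes(receipt.category)) {
    problems.push("category is not one of the expected values");
  }
  return problems;
}
```

Returning a list of problems instead of throwing on the first one makes it easier to log everything that was off about a given reply.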

Test Receipts — What Gemma 4 Had to Handle

I tested with 5 real-world receipt types, each designed to stress-test different extraction challenges:

The gas receipt was the toughest — 42.45L @ $1.649/L = $69.96. Gemma extracted both the unit price and total correctly without any hints.

The entertainment receipt had a discount applied before tax, which changes the subtotal calculation. Gemma handled it correctly.

What Broke (And How I Fixed It)

Problem 1: The request would hang indefinitely
The free-tier model on OpenRouter can be slow during peak hours. I added a 30-second AbortController timeout so the frontend shows a proper error instead of spinning forever.

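const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 30000); // abort after 30 seconds
// Pass `signal: controller.signal` in the fetch options so the abort actually cancels the request,
// then call clearTimeout(timeout) once the response arrives.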
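For the timeout to take effect, the controller's signal has to be handed to `fetch`, and the timer should be cleared once the call settles. A minimal sketch of that wiring follows. The name `fetchWithTimeout` and the optional `fetchImpl` parameter (there only so the helper can be exercised without a network) are hypothetical, and the default relies on the global `fetch` available in Node.js 18+.

```javascript
// Hypothetical helper: wraps a fetch call with an abort-based timeout.
async function fetchWithTimeout(url, options = {}, timeoutMs = 30000, fetchImpl = fetch) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // The signal is what lets controller.abort() cancel the request.
    return await fetchImpl(url, { ...options, signal: controller.signal });
  } finally {
    // Always clear the timer so it cannot fire after the call settles.
    clearTimeout(timer);
  }
}
```

On the Express side, a rejected call can then be caught and turned into an error response, so the frontend shows a message instead of a spinner.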

Problem 2: Gemma sometimes wraps JSON in markdown code fences
The model would return the payload wrapped in a Markdown fence (```json { ... } ```) instead of raw JSON. Fixed with a one-liner:

const clean = rawText.replace(/```json|```/g, "").trim();
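If the model ever adds stray text around the JSON as well, a slightly more defensive variant is to take everything between the first `{` and the last `}`. This is only a sketch with a hypothetical name, `extractJson`, and it assumes the reply contains a single JSON object.

```javascript
// Hypothetical helper: pulls one JSON object out of a model reply,
// ignoring code fences or commentary around it.
function extractJson(rawText) {
  const start = rawText.indexOf("{");
  const end = rawText.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("no JSON object found in model reply");
  }
  // JSON.parse still throws if the extracted span is not valid JSON.
  return JSON.parse(rawText.slice(start, end + 1));
}
```

Either approach should sit inside a try/catch, since `JSON.parse` throws on anything malformed.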

Problem 3: I had no idea what was failing
As an SDET, my instinct was to add logging everywhere. I added console.log checkpoints at every step (file received → base64 converted → API called → response received → JSON parsed → DB saved). This immediately showed me exactly where things were failing during development.
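The checkpoints can be kept tidy with a tiny helper that numbers each step and tags it with a request ID. This is an illustrative sketch, not the code from the repo. The name `createCheckpoints` and the log format are made up.

```javascript
// Hypothetical helper: numbered, timed checkpoints for one request.
function createCheckpoints(requestId, log = console.log) {
  let step = 0;
  const startedAt = Date.now();
  return function checkpoint(label) {
    step += 1;
    const elapsedMs = Date.now() - startedAt;
    log(`[${requestId}] step ${step}: ${label} (+${elapsedMs}ms)`);
    return step;
  };
}

// Example usage inside a route handler:
//   const checkpoint = createCheckpoints("upload-1");
//   checkpoint("file received");
//   checkpoint("base64 converted");
```

The last checkpoint printed for a request is the last step that succeeded, which narrows down where a failure happened.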

The Expense Dashboard

After scanning receipts, the dashboard shows:

  • Running total across all receipts
  • Breakdown by category (Food & Dining, Groceries, Transport, etc.)
  • Full receipt log with merchant, amount, date, and category
  • Per-receipt tax tracking (useful for expense reports)

After scanning all 5 test receipts, the dashboard showed a combined $369.70 across 5 categories — exactly matching the manual totals.
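The running total and category breakdown boil down to a small aggregation. Below is a sketch of one way to do it. The function name `summarizeReceipts` is hypothetical, and it sums in integer cents so that floating-point drift does not creep into the totals.

```javascript
// Hypothetical aggregation; sums in integer cents to avoid float drift.
function summarizeReceipts(receipts) {
  let totalCents = 0;
  const centsByCategory = {};
  for (const r of receipts) {
    const cents = Math.round(r.amount * 100);
    totalCents += cents;
    centsByCategory[r.category] = (centsByCategory[r.category] || 0) + cents;
  }
  // Convert back to currency units only at the end.
  const byCategory = {};
  for (const [category, cents] of Object.entries(centsByCategory)) {
    byCategory[category] = cents / 100;
  }
  return { total: totalCents / 100, byCategory };
}
```

The same result could come from a `SUM ... GROUP BY category` query in Postgres. Doing it in JavaScript just keeps the example self-contained.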

What This Means for My Pet Project

ReceiptMind started as one feature of a larger personal finance app I have been building. The plan is to integrate it so users can:

  1. Scan receipts throughout the month
  2. Get AI-generated spending summaries ("You spent 40% more on dining this month")
  3. Set budget alerts by category
  4. Export expense reports for tax season

The 256K context window is what makes the spending insight feature viable — I can pass the entire month's worth of receipts in one prompt and ask Gemma to reason across all of them at once.
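To make the "whole month in one prompt" idea concrete, here is a rough sketch of how the stored receipts might be flattened into a single text prompt. The function name `buildInsightsPrompt`, the line format, and the instruction wording are all assumptions for illustration, not the actual ReceiptMind prompt.

```javascript
// Hypothetical prompt builder: one line per receipt, whole month in one prompt.
function buildInsightsPrompt(receipts, monthLabel) {
  const lines = receipts.map(
    (r) =>
      `${r.date} | ${r.merchant} | ${r.category} | ${r.amount.toFixed(2)} | tax ${r.tax.toFixed(2)}`
  );
  return [
    `Here are all ${receipts.length} receipts for ${monthLabel}:`,
    ...lines,
    "Summarize spending by category and point out notable changes.",
  ].join("\n");
}
```

A month of receipts in this one-line-per-receipt form is small compared with a 256K-token window, which is why no chunking or retrieval layer is needed here.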

What I Learned as an SDET Doing This

A few things surprised me:

1. Prompt engineering is just test case design.
Writing a good prompt felt exactly like writing a good test spec — be precise, cover the edge cases, define the expected output format. The skills transferred more than I expected.

2. The model choice matters more than I thought.
I initially assumed any capable model would work. But switching from text-only to multimodal was the difference between having a product and not having one.

3. The confusion between Gemini and Gemma is real.
If you are just getting started, burn this into your memory: Gemma = open weights you can run yourself (or reach through a third-party host such as OpenRouter, as I did). Gemini = Google's hosted API. They are different products built from related research.

4. Ship something small and real.
I could have tried to build the full personal finance app. Instead I picked one feature, made it work end-to-end, and learned more in a week than I had in the previous month of reading documentation.

GitHub Repository
🔗 https://github.com/Hema-Nambi/ReceiptMind

Try It Yourself

Requirements:

  • Node.js 18+
  • Free OpenRouter account → openrouter.ai
  • Free Neon database → neon.tech

git clone https://github.com/Hema-Nambi/receiptmind
cd receiptmind
npm install
# Add your keys to .env
node server.js
# Open http://localhost:3000


Built during the Gemma 4 Challenge — May 2026