Gemma 4 E4B caught three planted fabrications in 50 seconds — on a laptop, no cloud
Arthur · 2026-05-25 · via DEV Community

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

scribe-check is a local-first command-line tool that reads a Markdown article and a folder of source documents, and reports every concrete claim in the article that isn't corroborated by the sources you handed it. It checks six categories of fabrication risk: quoted strings that drifted a word, named entities the sources never mention (a coauthor that shouldn't be on a paper), numeric specifics that don't match (off-by-2× rod-cell counts), italicized terminology that drifted (the article italicizes X where the source italicizes Y), orthographic drift (British spelling leaking into a US-English piece, or vice versa), and temporal-marker leaks (today, this morning, weekday names sneaking into evergreen prose).

It's the kind of pass an editor would do on every draft, if every writer had an editor on every draft. Instead, it runs on Gemma 4 E4B via Ollama. Locally. On a laptop. In about a minute on a ~2,000-word article.

I built it because I'd been doing this review by hand on my own articles, assembling a citations.md file and scanning the article line by line against the citations. It's exactly the kind of repetitive, structural check a small local model can do consistently and cheaply.

Demo

Three planted fabrications in a real published article: a drifted italicized term (*simple cells* → *elementary cells*), a fake coauthor (Ahmed, Natarajan, Rao, and Petrova), and a doubled count (120 million rod cells → 240 million rod cells). scribe-check catches all three on a single pass against the article's citations.md. The CLI shows a live spinner with elapsed seconds on stderr during the ~50-second model call (auto-suppressed when piped), so the wait never feels hung:

(raw transcript and JSON live in examples/transcript-fabrications.txt and examples/output-fabrications.json.)

⚑ scribe-check: 5 finding(s)

QUOTES FLAGGED  (1)
  1. *elementary cells*
     at: They discovered that individual neurons in the primary visual cortex, the structures they later called *elementary cells…
     concern: The article italicizes *elementary cells*, but the source uses the term *simple cells* when describing the structures Hubel and Wiesel found. This is terminology drift.
     closest: structures they later called *simple cells*, fired most strongly in response to oriented bars and edges at specific spatial frequencies

NAMES FLAGGED  (1)
  1. Petrova
     concern: The article claims the DCT was introduced by Ahmed, Natarajan, Rao, and Petrova. The source only lists Ahmed, Natarajan, and Rao as the authors of the 1974 paper. 'Petrova' is a fabricated coauthor.

SPECIFICS FLAGGED  (3)
  1. The human eye contains roughly 240 million rod cells
     concern: The source provides a canonical figure of 'roughly 120 million rod cells' (Claim 7). The article's figure of 240 million is twice the value provided in the source.
  2. The human eye contains roughly six million cone cells
     concern: The source provides a canonical figure of 'roughly six million cone cells' (Claim 7). This specific claim is corroborated, but the context of the 240 million rod cells makes the overall claim suspect.
  3. The DCT decomposes the block into sixty-four spatial-frequency components
     concern: The source confirms the block size (8x8) and the resulting number of coefficients (64), but the article's phrasing is slightly redundant and less precise than the source's description of the process.


Code

github.com/arthurpro/scribe-check

The whole thing is ~500 lines of Go split across six files:

  • main.go: CLI, flag parsing, dispatch
  • loader.go: article + sources loader, token estimation
  • prompt.go: system prompt + per-call user prompt
  • ollama.go: HTTP client for /api/chat with structured-JSON output and one-shot retry on malformed JSON
  • render.go: color-coded terminal table
  • spinner.go: stderr progress spinner with elapsed timer, auto-suppressed when stderr isn't a TTY

Single dependency: stdlib. No vendored model, no embeddings, no RAG. The whole article and all sources go into one Ollama call.
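As a rough sketch of what "one Ollama call" with a one-shot retry could look like, the example below posts a single non-streaming request to Ollama's /api/chat endpoint and retries once if the reply is not valid JSON. The type and function names, the seed value, the timeout, and the ARTICLE/SOURCES layout of the user message are all assumptions for illustration; the repository's ollama.go may be organized differently.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"time"
)

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string         `json:"model"`
	Messages []message      `json:"messages"`
	Stream   bool           `json:"stream"`
	Format   string         `json:"format"`
	Options  map[string]any `json:"options"`
}

// buildRequest packs the article and every source into a single user message.
func buildRequest(system, article, sources string) chatRequest {
	user := "ARTICLE:\n" + article + "\n\nSOURCES:\n" + sources
	return chatRequest{
		Model: "gemma4:latest",
		Messages: []message{
			{Role: "system", Content: system},
			{Role: "user", Content: user},
		},
		Stream: false,  // one complete reply instead of a token stream
		Format: "json", // ask Ollama to constrain the reply to JSON
		// temperature comes from the article; the seed value is hypothetical.
		Options: map[string]any{"temperature": 0.1, "seed": 42},
	}
}

// postChat sends one request and returns the assistant message content.
func postChat(url string, req chatRequest) (string, error) {
	body, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	client := &http.Client{Timeout: 5 * time.Minute}
	resp, err := client.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("ollama: unexpected status %s", resp.Status)
	}
	var out struct {
		Message message `json:"message"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Message.Content, nil
}

// withRetry runs call and retries exactly once if the reply is not valid JSON.
func withRetry(call func() (string, error)) (json.RawMessage, error) {
	for attempt := 0; attempt < 2; attempt++ {
		content, err := call()
		if err != nil {
			return nil, err
		}
		if json.Valid([]byte(content)) {
			return json.RawMessage(content), nil
		}
	}
	return nil, errors.New("model returned malformed JSON twice")
}

func main() {
	req := buildRequest("Check only against SOURCES.", "article text", "source text")
	raw, err := withRetry(func() (string, error) {
		return postChat("http://localhost:11434/api/chat", req)
	})
	if err != nil {
		fmt.Println("check failed:", err) // e.g. no Ollama server running locally
		return
	}
	fmt.Println(string(raw))
}
```

Passing the call as a function keeps the retry logic testable without a running server.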

How I Used Gemma 4

I chose Gemma 4 E4B (the "effective 4B" edge variant, ~9.6 GB on disk at Q4_K_M, served as gemma4:latest on Ollama) because the job needs three things simultaneously and only E4B has all three:

  • Structural reasoning that the E2B (2B effective) variant doesn't reliably produce. Catching *elementary cells* as drift from *simple cells* requires comparing terminology across the article and the source, not just spotting a wrong word. The smaller variant over-flagged or under-flagged inconsistently in my tests. E4B handled this reliably across multiple runs with temperature=0.1 and a fixed seed.
  • 128K context. The whole article (~2,100 words) plus the citations file (~25 verified claims with notes) plus the system prompt fits in ~6.5K tokens, comfortably inside the window. For larger source sets scribe-check auto-sizes num_ctx up to the full 131072 without re-architecting. No RAG, no chunking, no embedding store.
  • Local execution. This tool runs between drafts. If it cost a cloud API call every time, I'd skip it half the time. Free + ~50s per pass on consumer hardware is the cadence at which I actually use it.
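The context auto-sizing mentioned in the list above might work along these lines. This is a sketch under stated assumptions: the four-bytes-per-token estimate, the 8192 starting window, the power-of-two growth, and the names `estimateTokens` and `pickNumCtx` are mine, not taken from the repository. Only the 131072 ceiling comes from the article.

```go
package main

import "fmt"

const maxCtx = 131072 // full window named in the article

// estimateTokens is a rough heuristic (assumption): about four bytes per
// token for English prose, rounded up.
func estimateTokens(text string) int {
	return (len(text) + 3) / 4
}

// pickNumCtx returns the smallest power-of-two window, starting at 8192,
// that holds the prompt plus a reserve for the reply. It fails when the
// total exceeds maxCtx instead of silently truncating.
func pickNumCtx(promptTokens, reserve int) (int, error) {
	need := promptTokens + reserve
	if need > maxCtx {
		return 0, fmt.Errorf("need %d tokens, window is %d", need, maxCtx)
	}
	ctx := 8192
	for ctx < need {
		ctx *= 2
	}
	return ctx, nil
}

func main() {
	// ~6.5K prompt tokens, as in the article, plus a hypothetical 1K reply reserve.
	ctx, err := pickNumCtx(6500, 1024)
	fmt.Println(ctx, err) // prints: 8192 <nil>
}
```

Returning an error at the ceiling is a design choice for the sketch: a fact-checker that quietly drops sources would report false "uncorroborated" findings.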

I also weighed the 26B MoE and 31B dense variants for the same workload, on paper rather than in benchmarks. They would be sharper, but at 5–10× the latency I'd be tempted to batch the pass to "once before publish" instead of running it on every revision. The whole point of putting the model in the writer's loop is to make the check cheap enough that it always runs. E4B sits at that intersection.

What I learned about prompting an E4B model

One real engineering discovery worth flagging for anyone else building on E4B: the prompt design is the entire product. My first prompt ("find every concrete claim in the article that isn't corroborated by the sources") caught zero of three planted fabrications. The model agreed with the article because it sounded plausible against its own world knowledge.

Adding an explicit "ignore your own world knowledge; check only against the SOURCES block" rule moved the catch rate to 1/3. Adding short positive examples of the pattern (Petrova → flag this; *elementary cells* vs *simple cells* → flag this) moved it to 3/3.

The cost is precision. On a clean article, the same prompt over-flags 5–7 borderline items: derived ratios, soft-language paraphrases, slightly-rephrased corroborated claims. A human dismisses these in seconds while skimming, and the cost of that skim is much cheaper than the cost of a missed real fabrication. That's the design trade-off scribe-check makes deliberately: high recall, modest precision.
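The trade-off can be put in numbers using the demo run. The sketch below assumes my own reading of the transcript: five findings, of which three are the planted fabrications and the other two are borderline over-flags. The article itself does not label them that way.

```go
package main

import "fmt"

// precisionRecall computes both metrics from true positives, false
// positives, and false negatives. A zero denominator yields 0.
func precisionRecall(tp, fp, fn int) (precision, recall float64) {
	if tp+fp > 0 {
		precision = float64(tp) / float64(tp+fp)
	}
	if tp+fn > 0 {
		recall = float64(tp) / float64(tp+fn)
	}
	return precision, recall
}

func main() {
	// Demo run: 3 planted fabrications caught, 2 extra findings, 0 missed.
	p, r := precisionRecall(3, 2, 0)
	fmt.Printf("precision=%.2f recall=%.2f\n", p, r) // prints: precision=0.60 recall=1.00
}
```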

If you're building anything fact-checking-shaped on a small local model, lean into recall. Trust the human to filter.