慣性聚合 高效追讀感興趣之博客、新聞、科技資訊
閱原文 以慣性聚合開啟

推薦訂閱源

博客园 - 司徒正美
V
V2EX
T
Tailwind CSS Blog
有赞技术团队
有赞技术团队
aimingoo的专栏
aimingoo的专栏
Apple Machine Learning Research
Apple Machine Learning Research
IT之家
IT之家
Blog — PlanetScale
Blog — PlanetScale
A
About on SuperTechFans
月光博客
月光博客
T
The Blog of Author Tim Ferriss
宝玉的分享
宝玉的分享
Martin Fowler
Martin Fowler
博客园 - 聂微东
The GitHub Blog
The GitHub Blog
V
Visual Studio Blog
WordPress大学
WordPress大学
酷 壳 – CoolShell
酷 壳 – CoolShell
Engineering at Meta
Engineering at Meta
GbyAI
GbyAI

DEV Community

Authentication Security Deep Dive: From Brute Force to Salted Hashing (With Java Examples) Why AI Systems Don’t Fail — They Drift Spilling beans for how i learn for exam😁"Reinforcement Learning Cheat Sheet" I Replaced Chrome with Safari for AI Browser Automation. Here's What Broke (and What Finally Worked) How Python Borrows Other People's Work The $40 Architecture: Processing 1 Billion API Requests with 99.99% Uptime Vibe Coding: A Workflow Guide (From Zero to SaaS) Most webhook security guides protect the wrong side. The scary part is delivery. Headless CMS for TanStack Start: Build a Blog with Cosmic EU Age Verification App "Hacked in 2 Minutes" — What Actually Happened Comfy Cloud’s delete function does not actually remove files Running AI Models on GPU Cloud Servers: A Beginner Guide Event-driven media intelligence with AWS Step Functions and Bedrock I scored 500 AI prompts across 8 quality dimensions — here's what broke How to Call Google Gemini API from Next.js (Free Tier, No Backend Needed) The Portal Protocol: Reclaiming Human Connection in the Age of AI How to Fix Your Team's Scattered Knowledge Problem With a Self-Hosted Forum Intro to tc Cloud Functors: A Graph-First Mental Model for the Modern Cloud Designing Multi-Tenant Backends With Both Ownership and Team Access I Built a Neumorphic CSS Library with 77+ Components — Here's What I Learned PostgreSQL Performance Optimization: Why Connection Pooling Is Critical at Scale Cómo construí un SaaS multi-rubro para gestionar expensas en Argentina con FastAPI + Vue 3 🚀 I Built an Ethical Hacking Scanner Tool – Open Source Project I Replaced /usage and /context in Claude Code With a Single Statusline A Pythonic Way to Handle Emails (IMAP/SMTP) with Auto-Discovery and AI-Ready Design I Collected 8.9 Million Polymarket Price Points — Here's What I Found About How Markets Really Move EcoTrack AI — Carbon Footprint Tracker & Dashboard Everyone's Using AI. No One Agrees How. 5 self-hosted ebook managers worth trying in 2026 Building Your First AI Agent with LangChain: From Chatbot to Autonomous Assistant Common SOC 2 Failures (Real World) Stop Vibe-Checking Your AI App: A Practical Guide to Evals How to Use SonarQube and SonarScanner Locally to Level Up Your Code Quality Your Next To-Do App Is Dead — I Replaced Mine with an OpenClaw AI Sign a Nostr event in 60 lines of Python using coincurve — no nostr-sdk, no nbxplorer, no rust toolchain ITGC Audit Explained Like You’re in Big 4 Patch Tuesday abril 2026: Microsoft parcha 163 vulnerabilidades y un zero-day en SharePoint Stop scraping everything: a better way to track competitor price changes Listing on MCPize + the Official MCP Registry while routing payments OUTSIDE the marketplace — how I kept 100% of my x402 revenue Building an AI-Powered Risk Intelligence System Using Serverless Architecture Why We Ripped Function Overloading Out of Our AI Toolchain Testing AI-Generated Code: How to Actually Know If It Works SaaS Churn Is Killing Your Business. Here Is What to Do About It (Without a Support Team) The Speed of AI Is No Longer Linear - And Self-Improving Models Are Why How to Implement RBAC for MCP Tools: A Practical Guide for Engineering Teams From Standard Quote to Persuasive Proposal: AI Automation for Arborists I built a CLI that scaffolds complete multi-tenant SaaS apps Axios CVE-2025–62718: The Silent SSRF Bug That Could Be Hiding in Your Node.js App Right Now The dashboard that ended our friendship Data Pipelines Explained Simply (and How to Build Them with Python)
吾于家之实验室,遍试Gemma 4之诸模。E4B胜E2B。此为数据。
Shane Castil · 2026-05-24 · via DEV Community

此乃投献于Gemma四挑战:论Gemma四


谷歌推出四款Gemma 4变体。众人皆以合成基准测试之,然此等测试,实无人真正关切。吾已试运行四款。吾家实验室之硬件实务之务余心为之惊。

測試機器: Ryzen 7 5700X, RTX 1060 6GB, 32GB RAM. LM Studio, 4-bit quantization.


模型

模型 有效参数 4-bit大小 架构
E2B ~2.3B 1.5GB 密集
E4B ~4.5B 2.1GB 稠密
26B 模块化专家 ~4B 激活 / 26B 总量 13GB 专家混合
31B ~31B 16GB 稠密

测试一:视觉——书脊阅读

以相机对准书架。能否辨识书名?

模型 时间 书得 至善
E2B 八十三秒 — 返"无" 不可识书脊
E4B 廿五秒 六题,确然识之 ✅ 可靠
二十六B模因 内存溢出于12GB 不合适
三一B 内存溢出于12GB 不协

此乃全事也。多模态之务,E2B者,非也E4B之小版也——其能实远逊,乃根本不足之视模型。竟不能识一书之脊。E4B得六。

若以图像为要,则E2B非所之选,固也。


测试二:文—技术之释

"以三言释TCP与UDP之异。"

模型 答之质
E2B 九十三秒 二百五十六(至限) 2.8 t/s 中庸——漫衍
E4B 20s 113 5.7 t/s 简明精准

E4B则4.6倍迅疾且以更少之符文得佳答。此反"小者速"之想——E4B之思辨更效,故速毕之。


测试三:结构化输出—JSON生成

"返回一个包含10种编程语言及其创建年份与创造者的JSON数组。"

模型 有效JSON? 字段正确? 时间
E2B ✅ 是 ❌ 3/10年份错误 45秒
E4B ✅ 是 ✅ 皆正确 12秒

E2B虚幻之创期。E4B尽得无误.


测试四:视觉推理书架系统

真试也。运行吾之书架系统——自图像识书籍,增益元数据,生成推荐.

模型 检测 增益之谓也 全也 可行乎?
E2B 未得书册
E4B 十六卷,百六十六刻 两批,二百八十秒 约八分辰
二六B/三一B 内存溢出

惟E4B能成全于民用之器。八分钟成全一整架之书目,虽非立时,然费无钱,且存于地。


记忆之障

“运行于民用之器”者,实谓吾RTX 1060 6GB诸模之用:

所需虚拟内存(4位) 可容12GB乎? 何须言境?
E2B 一又五分之壹千五百兆字节 ✅ 肯定 天地辽阔
E4B 二千一百兆字节 ✅ 是也 绰绰有余
二十六B模因 ~13千兆字节 ❌ 无
31B ~16GB ❌ 无

二大模组实不配于3200级GPU。欲用31B,须3090(24GB)为下限,纵然如此,亦仅余微弱之情境窗。

参考而言,31B密实模组需多耗VRAM约800MB。每百万个符号之语境。彼二十四GB 3090者?可容模型并或三十K语境。非所宣之二百五十六K.


我所愿之决策树

依序自问诸此:

一。需处理图像乎?

  • 是则E4B为最低。E2B之视,不可用也.
  • 非则续问二。

可容于六吉字节VRAM乎?

  • 然则E4B四位(约二点一吉字节)犹有余地以容境。
  • 不然则E2B或需更巨之GPU。

3. 此乃一次性之务抑或反复之劳?

  • 一次性则云API(OpenRouter免费层有E4B)。
  • 反复则本地E4B。无每标记之费。

4. 尔需至极之推理质乎?

  • 是也 → 密三十有一,然需二十四格以上之VRAM.
  • 非也 → 四十四格已足。吾实不能辨其异于书脊之识.

残酷之实

二格者,市井之辞也。"行于尔之机!"噫,然不能识书脊。二格与四格于多模之务,非渐进之别——乃"可作"与"不可作"之异也。

E4B者,使地之智工实有用也。容于3060,视事可恃,生结构之出,且于E2B,盖因思理更效也。

26B MoE与31B者,为有伺器GPU之人设也。若持4090或A100,则诚非凡。若惟有博弈之GPU,则如废纸耳。

吾择E4B以供Shelfie,实为得计。十六卷书,元数据备,个性之荐,皆于吾家之实验室,无偿运行。

E4B乃Gemma 4家族之隐功臣。格物之验,非此不能道。实用方显其能。


试之Shelfie:github.com/scastile/shelfie