I Fine-Tuned Gemma 4 on an Emotion Dataset with a Single GPU
Sujan Koiral · 2026-05-24 · via DEV Community

This is a submission for the Gemma 4 Challenge: Write About Gemma 4


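When Google released Gemma 4 in April 2025, my first reaction was skepticism. I had seen plenty of so-called "open" models whose restrictions made them nearly unusable. Then I read the license: Apache 2.0. Download it, fine-tune it, use it commercially, no strings attached. That got my attention.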

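I spent a weekend putting it through its paces, and I want to share what I learned, especially for developers trying to figure out which member of the Gemma 4 family is worth their time.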

So what exactly is Gemma 4?

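Gemma 4 is the fourth generation of Google DeepMind's open-weight models. It comes in four sizes: E2B, E4B, 12B, and 27B (the 27B is actually a 26B Mixture-of-Experts model). The "E" prefix stands for "edge": these two small sizes are built specifically to run on phones and laptops.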

A few things make this generation genuinely different from its predecessors:

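Natively multimodal. Every model in the family handles text and images, and the two edge models (E2B, E4B) also understand audio natively. This isn't a plugin or an adapter bolted on after training; it's built into the core model.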

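Context window. The small models support 128K tokens, and the mid-sized models go up to 256K. For reference, that's longer than most novels.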

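Thinking mode. All four models support a configurable reasoning mode, similar to chain-of-thought but built into the model and switched on through the prompt. You can control how much the model deliberates before it answers.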

Function calling. Supported from day one rather than added later. If you're building agents, this matters a lot.

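The 27B model uses a Mixture-of-Experts (MoE) architecture: it has roughly 26 billion parameters in total but activates only about 3.8 billion per token. In practice its compute cost is close to that of a 4B model, while it retains the knowledge capacity of a much larger one.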
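To make "only a fraction of the parameters runs per token" concrete, here is a toy top-k Mixture-of-Experts layer in plain Python. It is a sketch under stated assumptions: eight experts, `k=2`, and a linear gate are arbitrary illustrative choices, not Gemma 4's actual router. Every expert gets a cheap routing score, but only the `k` winners are ever executed. With the post's figures (3.8B active out of about 26B), roughly 15% of the weights do work on any given token.

```python
import math

def top_k_moe(x, experts, gate_weights, k=2):
    """Toy top-k Mixture-of-Experts layer (illustrative; not Gemma 4's router).

    x            -- input vector (list of floats)
    experts      -- callables mapping a vector to a vector of the same length
    gate_weights -- one gating vector per expert
    Returns the mixed output and the indices of the experts that actually ran.
    """
    # Router: score every expert with a cheap dot product
    scores = [sum(g * xi for g, xi in zip(w, x)) for w in gate_weights]
    # Keep the k best experts; all others are skipped entirely
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the chosen scores gives the mixing weights
    top = max(scores[i] for i in chosen)
    exps = [math.exp(scores[i] - top) for i in chosen]
    total = sum(exps)
    out = [0.0] * len(x)
    for e, i in zip(exps, chosen):
        y = experts[i](x)  # only k expert forward passes per token
        out = [o + (e / total) * yi for o, yi in zip(out, y)]
    return out, chosen

calls = []  # records which experts were executed

def make_expert(scale, idx):
    """Hypothetical expert: scales its input and logs that it ran."""
    def expert(v):
        calls.append(idx)
        return [scale * vi for vi in v]
    return expert

experts = [make_expert(float(i), i) for i in range(8)]
gates = [[float(i), 0.0] for i in range(8)]  # expert i scores i * x[0]
out, chosen = top_k_moe([1.0, 2.0], experts, gates, k=2)
print(chosen, calls)  # [7, 6] [7, 6]
```

Six of the eight experts never run, which is where the compute savings come from; their weights still have to sit in memory, so an MoE model saves compute rather than storage.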

Here's how I think about choosing a model:

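| Model | Best for |
|---|---|
| E2B | Mobile apps, edge devices, fast batch inference |
| E4B | Richer on-device features, local development |
| 12B | Most fine-tuning tasks, single-GPU research |
| 27B (MoE) | Production applications, complex reasoning, agentic workflows |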

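For my experiment I went with E4B, the sweet spot for fine-tuning on consumer hardware: small enough to load on a 16GB GPU with 4-bit quantization, yet capable enough to actually learn something.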

The experiment: teaching it to recognize emotions

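I fine-tuned Gemma 4 E4B-it on the dair-ai/emotion dataset from Hugging Face. The task: classify a piece of text into one of six emotions (sadness, joy, love, anger, fear, surprise).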

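This is a classic NLP task that looks simple but is genuinely hard, because emotion is subtle and context-dependent. A sentence like "I can't believe this happened" could be joy, anger, or surprise depending on the situation.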

Setup

I used Google Colab with a T4 GPU, bitsandbytes for 4-bit NF4 quantization, and LoRA for parameter-efficient fine-tuning. The full set of dependencies:

```bash
pip install transformers accelerate datasets trl peft bitsandbytes scikit-learn
```

Why 4-bit quantization?

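Loading a 4B-parameter model in full 16-bit precision takes about 8GB of VRAM for the weights alone, before counting the activations, optimizer states, and gradients needed during training. On a T4 with 16GB in total, that leaves no practical room for training.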
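The weight-memory figures are plain arithmetic: parameter count times bits per parameter. The small helper below makes the comparison explicit. The 4.5-bit case is an assumption added for illustration: it models 4-bit codes plus one 32-bit scale per block of 64 weights, one reason real 4-bit checkpoints come out larger than the naive 2 GB.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 4e9  # the post's "4B-parameter" model

fp16_gb = weight_memory_gb(N_PARAMS, 16)       # 8.0
nf4_naive_gb = weight_memory_gb(N_PARAMS, 4)   # 2.0
# Assumption: 4-bit codes + one fp32 absmax scale per 64-weight block
nf4_with_scales_gb = weight_memory_gb(N_PARAMS, 4 + 32 / 64)  # 2.25

for name, gb in [("16-bit", fp16_gb), ("4-bit", nf4_naive_gb),
                 ("4-bit + scales", nf4_with_scales_gb)]:
    print(f"{name:>15}: {gb:.2f} GB")
```

Even 2.25 GB is a lower bound: bitsandbytes only replaces linear layers, so modules such as the token embeddings typically stay in 16-bit, which is consistent with the roughly 2.5 GB quoted in the post.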

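4-bit NF4 (NormalFloat4) quantization compresses the weights to about 2.5GB. The NormalFloat format is designed for the distribution of neural network weights, which tends to be bell-shaped rather than uniform, so at the same bit width NF4 is more accurate than plain 4-bit quantization. During the forward and backward passes, weights are temporarily dequantized to bfloat16 for computation, so you get the memory savings of 4-bit storage with most of the precision of 16-bit math.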
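The core NF4 idea (4-bit codes that index a codebook spaced like a normal distribution, plus one absmax scale per block) fits in a few lines of plain Python. This is a simplified sketch, not the bitsandbytes implementation: the levels here are evenly spaced quantiles of N(0, 1) rescaled to [-1, 1], whereas the real NF4 codebook is built asymmetrically so that it contains an exact zero, and the real kernels operate on packed GPU tensors. The block size of 64 is an assumption for this sketch.

```python
import random
from statistics import NormalDist

BLOCK_SIZE = 64  # assumption for this sketch

def normal_quantile_levels(n_levels=16):
    """Codebook: evenly spaced quantiles of N(0, 1), rescaled to span [-1, 1].

    Simplified NF4-style construction; the real NF4 table includes an exact zero.
    """
    dist = NormalDist()
    raw = [dist.inv_cdf((i + 0.5) / n_levels) for i in range(n_levels)]
    scale = max(abs(v) for v in raw)
    return [v / scale for v in raw]

LEVELS = normal_quantile_levels()  # 16 levels -> each weight stored as a 4-bit index

def quantize_block(block):
    """Scale a block by its absolute maximum, then snap each value to the nearest level.

    Returns 4-bit codes (0..15) and the per-block scale stored alongside them.
    """
    absmax = max(abs(w) for w in block) or 1.0  # guard against an all-zero block
    codes = [min(range(len(LEVELS)), key=lambda i: abs(LEVELS[i] - w / absmax))
             for w in block]
    return codes, absmax

def dequantize_block(codes, absmax):
    """Recover approximate weights, conceptually what happens before each matmul."""
    return [LEVELS[c] * absmax for c in codes]

random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(BLOCK_SIZE)]  # bell-shaped, like real weights
codes, absmax = quantize_block(weights)
restored = dequantize_block(codes, absmax)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"levels: {[round(v, 3) for v in LEVELS]}")
print(f"max abs error: {max_error:.5f} (absmax {absmax:.5f})")
```

Printing `LEVELS` shows the levels bunched near zero and spread out toward ±1, which is exactly where bell-shaped weights do and don't live.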

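The trade-off is a slight loss compared to full precision, but for a focused task like this one the difference is negligible in practice.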

Why LoRA?

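Full fine-tuning updates every weight in the model. For a 4-billion-parameter model, that means storing complete optimizer states for all 4 billion weights, which is impractical on a single consumer GPU.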
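A common rule of thumb makes "impractical" concrete. Assuming mixed-precision training with AdamW, each trainable parameter costs roughly 16 bytes: 2 for the 16-bit weight, 2 for its gradient, 4 for an fp32 master copy, and 8 for Adam's two fp32 moment estimates. These byte counts are the assumption here (the post doesn't give them), and activations come on top.

```python
# Rule-of-thumb bytes per *trainable* parameter under mixed-precision AdamW
BYTES_PER_TRAINABLE_PARAM = {
    "weights_fp16": 2,
    "grads_fp16": 2,
    "master_weights_fp32": 4,
    "adam_moments_fp32": 8,  # first and second moment, 4 bytes each
}

def full_finetune_gb(n_params: float) -> float:
    """Approximate training-state memory (excluding activations) in GB."""
    return n_params * sum(BYTES_PER_TRAINABLE_PARAM.values()) / 1e9

T4_VRAM_GB = 16
need = full_finetune_gb(4e9)
print(f"Full fine-tuning a 4B model: ~{need:.0f} GB vs {T4_VRAM_GB} GB on a T4")
```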

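LoRA (Low-Rank Adaptation) takes a different approach. Instead of modifying the original weights, it freezes the entire model and adds small trainable matrices alongside specific layers. These matrices are low-rank: they capture the most important directions of change without having to represent the full weight space. During training only these adapter weights are updated, typically less than 1% of the total parameter count.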
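Here is the arithmetic behind "less than 1%", plus the LoRA forward pass, in plain Python. For a frozen weight matrix W of shape d × k, LoRA trains B (d × r) and A (r × k) and computes y = Wx + (alpha / r)·B(Ax). B starts at zero, so the adapted layer initially behaves exactly like the base layer. The 4096 × 4096 layer shape is a hypothetical example, not Gemma 4's actual dimensions; rank 16 and alpha / r = 2 mirror the LoraConfig used in this post.

```python
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x); W stays frozen, only A and B are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]

def lora_param_fraction(d, k, r):
    """Trainable LoRA params r*(d+k) as a fraction of the full d*k matrix."""
    return r * (d + k) / (d * k)

# Hypothetical 4096 x 4096 projection with rank 16: ~0.78% trainable
frac = lora_param_fraction(4096, 4096, 16)

# Tiny numeric check with B initialised to zero (the standard LoRA init)
random.seed(0)
d, k, r, alpha = 3, 4, 2, 4  # alpha / r = 2, same ratio as r=16, alpha=32
W = [[random.uniform(-1, 1) for _ in range(k)] for _ in range(d)]
A = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(r)]
B = [[0.0] * r for _ in range(d)]
x = [1.0, -2.0, 0.5, 3.0]
y_lora = lora_forward(W, A, B, x, alpha, r)
print(f"trainable fraction: {frac:.4%}")
```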

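The key insight is that fine-tuning a model for a specific task doesn't require changing every weight. Most of the base model's knowledge stays intact. You are teaching it a narrow new skill, not retraining it from scratch.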

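After training, the LoRA adapter can either be kept separate (loaded on top of the base model at inference time) or merged permanently into the base weights. Keeping it separate is useful when you want to serve several fine-tuned variants of the same base model without storing a full copy of each.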
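Merging works because the LoRA update is itself just a matrix: W_merged = W + (alpha / r)·B·A. The plain-Python check below confirms that the merged layer gives the same output as base-plus-adapter; sizes and random values are toy choices for illustration. In PEFT, `merge_and_unload()` on a LoRA `PeftModel` performs this folding for you.

```python
import random

def matmul(X, Y):
    """Matrix product of two lists-of-rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def merge_lora(W, A, B, alpha, r):
    """Fold the adapter into the base weight: W + (alpha / r) * (B @ A)."""
    delta = matmul(B, A)
    s = alpha / r
    return [[w + s * dw for w, dw in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]

random.seed(1)
d, k, r, alpha = 3, 4, 2, 4
W = [[random.uniform(-1, 1) for _ in range(k)] for _ in range(d)]
A = [[random.uniform(-1, 1) for _ in range(k)] for _ in range(r)]
B = [[random.uniform(-1, 1) for _ in range(r)] for _ in range(d)]  # pretend B was trained
x = [0.5, -1.0, 2.0, 1.5]

# Kept separate: base output plus scaled adapter output, computed independently
separate = [b + (alpha / r) * u for b, u in zip(matvec(W, x), matvec(B, matvec(A, x)))]
# Merged: a single matrix, so inference costs the same as the original model
merged = matvec(merge_lora(W, A, B, alpha, r), x)
print(separate, merged)
```

The trade-off is flexibility: once merged, you can no longer swap adapters on the same base weights.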

Data formatting

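Gemma 4 expects a conversational format: a system message, a user message, and an assistant response. I wrapped each emotion example in that structure: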

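```python
from datasets import load_dataset

# dair-ai/emotion: a "text" column plus an integer "label" (ClassLabel with six names)
dataset = load_dataset("dair-ai/emotion")
label_names = dataset["train"].features["label"].names  # index -> emotion name

SYSTEM_PROMPT = """You are an emotion classification assistant.
Read the user's text and answer with exactly one label.
Only choose from: sadness, joy, love, anger, fear, surprise.
Return only the label and nothing else."""

def to_prompt_completion(example):
    """Turn one dataset row into the prompt/completion chat format."""
    text = example["text"]
    label = label_names[example["label"]]
    return {
        "prompt": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Classify the emotion:\n\n{text}"},
        ],
        "completion": [
            {"role": "assistant", "content": label}
        ],
    }

# Replace the raw columns with the conversational fields
dataset = dataset.map(to_prompt_completion, remove_columns=["text", "label"])
```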

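The system prompt does real work here. Stating the output format explicitly ("Return only the label and nothing else") stops the model from answering with a full sentence such as "The emotion expressed is joy." That kind of verbosity is natural for instruction-tuned models, and it breaks downstream parsing. A precise system prompt is easier to get right than heuristic post-processing after the fact.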
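A strict parser shows why the single-label instruction matters downstream. `parse_label` is a hypothetical helper, not code from the post: it lowercases the reply, strips whitespace and trailing punctuation, and accepts it only if what remains is exactly one of the six labels. Anything else, including a full sentence, counts as an invalid prediction. The post doesn't say exactly how it classified invalid outputs, so treat this rule as an assumption.

```python
LABELS = ("sadness", "joy", "love", "anger", "fear", "surprise")

def parse_label(reply: str):
    """Return the emotion label if the reply is exactly one label, else None (invalid)."""
    cleaned = reply.strip().lower().rstrip(".!")
    return cleaned if cleaned in LABELS else None

examples = ["joy", "  Fear.\n", "The emotion expressed is joy."]
parsed = [parse_label(e) for e in examples]
print(parsed)  # ['joy', 'fear', None]
```

A looser parser could search for a label anywhere in the reply, but that hides exactly the formatting failures the system prompt is meant to prevent.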

LoRA configuration

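I used rank 16 and applied LoRA to all linear layers. This adds only a small number of new trainable parameters while the base model stays frozen.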

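```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                         # rank of the low-rank update matrices
    lora_alpha=32,                # updates are scaled by lora_alpha / r = 2
    lora_dropout=0.05,            # dropout on the adapter path during training
    bias="none",                  # leave bias terms frozen
    task_type="CAUSAL_LM",
    target_modules="all-linear",  # every linear layer (PEFT excludes the output head)
)
```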

Training

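The training configuration uses gradient accumulation to simulate a larger effective batch size without exceeding memory, gradient checkpointing to trade extra compute for lower memory during backpropagation, and an 8-bit paged optimizer so that optimizer states don't eat up the remaining VRAM.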
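Gradient accumulation relies on a simple identity: if the loss is a mean over examples, averaging the gradients of several equal-sized micro-batches gives the same gradient as one large batch. The check below uses a one-parameter least-squares model, a toy chosen only to make the arithmetic visible. In Hugging Face `TrainingArguments`, the three techniques above correspond to `gradient_accumulation_steps`, `gradient_checkpointing=True`, and `optim="paged_adamw_8bit"`; the post doesn't list its exact values.

```python
def grad_mse(w, batch):
    """d/dw of mean((w*x - y)^2) over a batch of (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

data = [(1.0, 2.0), (2.0, 3.5), (3.0, 6.5), (4.0, 7.0),
        (5.0, 10.5), (6.0, 11.0), (7.0, 14.5), (8.0, 15.0)]
w = 0.5

# One big batch of 8: what we'd like, but may not fit in memory
full_grad = grad_mse(w, data)

# Micro-batches of 2, accumulated over 4 steps before a single optimizer update
micro_batch_size, accumulation_steps = 2, 4
accumulated = 0.0
for step in range(accumulation_steps):
    micro = data[step * micro_batch_size:(step + 1) * micro_batch_size]
    # divide by the number of steps so the running sum becomes an average
    accumulated += grad_mse(w, micro) / accumulation_steps

effective_batch = micro_batch_size * accumulation_steps  # 8
print(full_grad, accumulated)
```

Only one micro-batch's activations are alive at a time, which is the whole memory saving; the price is more forward/backward passes per optimizer step.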

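I also ran a baseline evaluation on the test set before training, so the comparison is fair rather than based on a gut feeling about whether the model improved.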

Results

| Metric | Before fine-tuning | After fine-tuning |
|---|---|---|
| Accuracy | 58.25% | 91.15% |
| Macro F1 | 0.421 | 0.893 |
| Invalid predictions | 33 | 2 |

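Accuracy went from about 58% to over 91% after a single epoch on a T4, in under ten minutes, training on only 4,000 examples. The model also almost completely stopped producing invalid outputs, which shows it learned the boundaries of the task, not just the format of the labels.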
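For reference, the three metrics in the table can be computed without any libraries. In this minimal sketch, accuracy counts exact matches, an invalid prediction (`None`) simply counts as wrong, and macro F1 averages per-class F1 so that rare classes such as surprise weigh as much as common ones. The label lists are made up for illustration. When every prediction is a valid label, scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same macro F1; how the post handled invalid outputs there isn't stated.

```python
LABELS = ["sadness", "joy", "love", "anger", "fear", "surprise"]

def evaluate(y_true, y_pred):
    """Accuracy, macro F1 and invalid count; None in y_pred marks an invalid output."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    invalid = sum(p is None for p in y_pred)
    f1s = []
    for label in LABELS:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return accuracy, sum(f1s) / len(f1s), invalid

# Made-up example: one invalid output and one fear -> sadness confusion
y_true = ["joy", "joy", "sadness", "anger", "fear", "love", "surprise", "sadness"]
y_pred = ["joy", None, "sadness", "anger", "sadness", "love", "surprise", "sadness"]
acc, macro_f1, invalid = evaluate(y_true, y_pred)
print(f"accuracy={acc:.2%} macro_f1={macro_f1:.3f} invalid={invalid}")
```

Here accuracy is 75%, but macro F1 drops lower because the fear class scores zero, which is why the post reports both.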

Inference on custom examples

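Before fine-tuning, the base model would sometimes reply "The emotion in this text is fear" instead of just "fear". After training, it returned a clean single-word label every time.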

What surprised me

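The base model is already decent. Without any fine-tuning it reached 58% accuracy on a six-way classification task, far above the roughly 17% (1 in 6) you'd expect from random guessing. The instruction-tuned base model grasps the task reasonably well even without task-specific training.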

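LoRA on all linear layers works much better. I first targeted only the attention layers, and the results were underwhelming. Applying LoRA to every linear layer, including the MLP blocks, made a big difference for classification.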

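4-bit quantization holds up. I worried that quantization would degrade fine-tuning quality, but the final model performed well. NF4 matches the weight distribution of transformers better than plain 4-bit quantization, and it shows in the results.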
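A rough count shows why targeting every linear layer gives the adapter much more capacity than targeting attention alone. The block shape (hidden size 2560, MLP size 10240, full-size q/k/v/o projections, and a gated MLP with gate/up/down projections) is a hypothetical stand-in, not Gemma 4's published configuration, and the layer names follow the convention common in Hugging Face decoder models. The ratio, not the absolute numbers, is the point.

```python
HIDDEN, MLP = 2560, 10240  # hypothetical dimensions, not Gemma 4's real config
RANK = 16                  # matches r=16 in the post's LoraConfig

# (in_features, out_features) for each linear layer in one decoder block
ATTENTION = {"q_proj": (HIDDEN, HIDDEN), "k_proj": (HIDDEN, HIDDEN),
             "v_proj": (HIDDEN, HIDDEN), "o_proj": (HIDDEN, HIDDEN)}
MLP_LAYERS = {"gate_proj": (HIDDEN, MLP), "up_proj": (HIDDEN, MLP),
              "down_proj": (MLP, HIDDEN)}

def lora_params(layers, r=RANK):
    """LoRA adds A (r x in) and B (out x r): r * (in + out) params per layer."""
    return sum(r * (fan_in + fan_out) for fan_in, fan_out in layers.values())

attn_only = lora_params(ATTENTION)
all_linear = lora_params({**ATTENTION, **MLP_LAYERS})
print(f"attention only: {attn_only:,}  all linear: {all_linear:,} per block")
```

With these shapes the MLP projections account for about 65% of the adapter's parameters, capacity that an attention-only setup leaves on the table.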

Should you actually use Gemma 4?

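If you need quick API calls and don't care about control or cost at scale, a hosted model is probably easier. But if any of the following apply to you, Gemma 4 deserves a serious look: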

Data privacy matters. The model runs on your hardware, and nothing leaves your environment.

You want to fine-tune. Apache 2.0 means no legal gray areas, and you own the fine-tuned weights.

You're building for the edge. E2B can be quantized to about 1.3GB. That's phone territory, and it still handles vision and audio.

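You need long context. The mid-sized models' 256K-token window is genuinely useful for document processing, long code analysis, and retrieval-augmented generation (RAG) setups.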

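One thing stuck with me: the Hugging Face team said they "struggled to find good fine-tuning examples because the models are so good out of the box." It's an odd problem to have, but it reflects reality. These models are capable from day one. Fine-tuning can still sharpen them for specialized tasks, and it is now cheap enough to be worth trying.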

Run it yourself

The complete notebook I used is available here:

Gemma 4 emotion fine-tuning notebook on GitHub

It runs on a free Colab T4. You'll need a Hugging Face account, an access token for the gated model weights, and about 30 minutes.

The notebook detects your GPU and adjusts the batch size automatically, so it works whether you're on a T4, a 3090, or an A100.
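The notebook's GPU detection isn't reproduced in the post, so `pick_batch_config` below is a hypothetical sketch of the idea: choose a per-device batch size from available VRAM, then compensate with gradient accumulation so the effective batch size stays fixed. The thresholds and the target effective batch size of 16 are made-up illustrative values. On a real machine the VRAM figure could come from `torch.cuda.get_device_properties(0).total_memory` (in bytes); the function takes gigabytes directly so it runs without a GPU.

```python
TARGET_EFFECTIVE_BATCH = 16  # hypothetical; kept constant across GPUs

def pick_batch_config(vram_gb: float):
    """Return (per_device_batch_size, gradient_accumulation_steps) for a VRAM size.

    Thresholds are illustrative guesses, not values from the post's notebook.
    """
    if vram_gb >= 40:      # e.g. A100 (40/80 GB)
        per_device = 16
    elif vram_gb >= 20:    # e.g. RTX 3090 (24 GB)
        per_device = 8
    else:                  # e.g. T4 (16 GB, ~15 GB usable)
        per_device = 2
    accumulation = max(1, TARGET_EFFECTIVE_BATCH // per_device)
    return per_device, accumulation

configs = {name: pick_batch_config(gb)
           for name, gb in [("T4", 15.0), ("3090", 24.0), ("A100", 80.0)]}
print(configs)
```

Holding the effective batch constant keeps the optimization dynamics comparable across GPUs, so results from a T4 run and an A100 run stay comparable.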

Final thoughts

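Open-weight models at this level of capability change the math for many tasks. Work that used to require a proprietary API, a vendor relationship, and ongoing fees can now run locally, be adapted to your own data, and belong entirely to you. Gemma 4 is the clearest example of that shift I've seen.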

Give it a try. Worst case, you spend a weekend learning something new.


Questions about the fine-tuning setup, or want to compare results on a different dataset? Drop them in the comments.