Fine-Tuning Open-Source LLMs on a Budget with Sealos
Sealos · 2025-09-02 · via Sealos Blog

You don’t need a cluster of H100s to teach a large language model to speak your business’s language. With the right techniques and a thoughtful workflow, you can fine-tune open-source LLMs for domain-specific tasks—customer support, legal drafting, knowledge retrieval, analytics—at a fraction of the cost.

In this guide, you’ll learn how to fine-tune open-source LLMs on a budget using parameter-efficient methods such as LoRA and QLoRA, with a cloud-native setup powered by Sealos. We’ll cover what fine-tuning is, why it matters, how to structure an efficient training pipeline, and how to deploy your model quickly. You’ll also see code snippets and Kubernetes examples you can copy, adapt, and run.

If you already use Kubernetes or want a configurable, developer-friendly platform to run AI workloads, Sealos (https://sealos.io) provides a streamlined way to provision GPU-backed environments, object storage, and apps like notebooks or web UIs so you can focus on your model—not on plumbing.

Fine-tuning adapts a base LLM to a specific domain or task by continuing training on your curated dataset. You can:

  • Supervised fine-tune (SFT): Teach the model to produce desired outputs from instructions or prompts.
  • Preference optimize (DPO/RLHF): Align with human preferences after SFT.
  • Task-specialize: Summarization, code generation, SQL generation, classification, etc.

Full fine-tuning updates all model weights and can be expensive. Parameter-efficient fine-tuning (PEFT) like LoRA and QLoRA updates only small “adapter” matrices, dramatically reducing compute, memory, and cost.

  • Lower TCO and faster iteration: Train in hours on a single mid-range GPU instead of days on a cluster.
  • Data-centric wins: High-quality, task-aligned data beats brute force compute.
  • Deployment simplicity: Small adapter files are easy to store, version, and swap.
  • Compliance and control: Keep models and data in your environment; open-source models let you control licensing and security posture.

Sealos is an open-source cloud platform that makes running Kubernetes-backed apps and workloads simpler. For LLM fine-tuning, you can use Sealos to:

  • Provision GPU-backed compute as containerized jobs or long-running services.
  • Set up object storage (S3-compatible) for datasets, checkpoints, and artifacts.
  • Launch developer tooling (e.g., JupyterLab, VS Code Web) to iterate quickly.
  • Host model inference endpoints with tools like vLLM or Open WebUI.

Whether you deploy Sealos on your own infrastructure or use a managed Sealos-based cloud, you get a consistent app-centric experience with Kubernetes robustness underneath.

Learn more at https://sealos.io.

  • Choose a compact, permissively licensed base model (e.g., 3B–8B parameters).
  • Use LoRA/QLoRA to update only 0.1–1% of the parameters.
  • Quantize weights to 4-bit during training (QLoRA) to fit into a single 16–24 GB GPU.
  • Keep sequence lengths and dataset size sane; prioritize quality over quantity.
  • Use gradient accumulation, mixed precision, and gradient checkpointing to stay within memory limits.
  • Track and resume training; autosave checkpoints to S3 storage.
  • Serve with an efficient runtime (e.g., vLLM) and scale only when needed.
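
To see where a figure like 0.1–1% comes from, the sketch below counts LoRA parameters for one assumed model shape. The dimensions (hidden size 4096, 32 layers, about 6.7B total parameters, adapters on four square attention projections) are illustrative assumptions for a 7B-class model, not measurements of a specific checkpoint.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds two matrices per adapted weight: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Assumed shape for a 7B-class decoder (illustrative values only).
HIDDEN = 4096
LAYERS = 32
ADAPTED_PER_LAYER = 4          # q_proj, k_proj, v_proj, o_proj
TOTAL_PARAMS = 6_700_000_000   # rough base-model size

RANK = 16
trainable = LAYERS * ADAPTED_PER_LAYER * lora_param_count(HIDDEN, HIDDEN, RANK)
fraction = trainable / TOTAL_PARAMS

print(f"trainable adapter parameters: {trainable:,}")
print(f"share of base model: {fraction:.2%}")
```

Under these assumptions the adapters come to roughly 17M parameters, or about a quarter of a percent of the base model. Models with grouped-query attention or adapters on the MLP layers will land on different numbers.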

Match model capacity to your task and budget:

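  • 3B–8B class: Good for classification, summarization, structured generation, and many assistive tasks. Examples: Mistral 7B and Llama 3 8B. Mixtral Instruct is a Mixture-of-Experts model with far more total parameters (roughly 47B for Mixtral 8x7B), so it needs much more memory to train and serve than this class suggests.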
  • 13B+ class: For higher fluency or complex reasoning; cost will increase significantly.

Important notes:

  • License: Some models require acceptance of a license (e.g., Llama 3). Ensure your use case complies.
  • Context length: If you need long context (8k–32k), pick a base model that supports it.
  • Tokenizer: Keep compatibility between tokenizer and base model.

Compute an order-of-magnitude estimate:

  • Total tokens processed ≈ num_examples × avg_tokens_per_example × epochs.
  • Training throughput on 7B QLoRA: roughly 100–300 tokens/sec on a single L4/A10G (varies widely).
  • Hours ≈ total_tokens / throughput / 3600.

Example:

  • 20,000 examples, 300 tokens each, 2 epochs → 12M tokens.
  • At 200 tokens/sec → 60,000 sec (~16.7 hours).
  • If GPU costs $0.60–$1.20/hour, expect $10–$20 in compute.

These are rough numbers; your actual throughput depends on sequence length, batch size, model, and hardware.
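
The estimate above can be wrapped in a small helper so you can plug in your own dataset size and GPU price. This is a sketch of the same arithmetic; the function name and argument names are made up for illustration.

```python
def estimate_training_cost(num_examples, avg_tokens, epochs, tokens_per_sec, price_per_hour):
    """Return (total_tokens, hours, cost) for a single-GPU run."""
    total_tokens = num_examples * avg_tokens * epochs
    hours = total_tokens / tokens_per_sec / 3600
    return total_tokens, hours, hours * price_per_hour


# The worked example from the text: 20,000 examples, 300 tokens each, 2 epochs.
tokens, hours, low_cost = estimate_training_cost(20_000, 300, 2, 200, 0.60)
_, _, high_cost = estimate_training_cost(20_000, 300, 2, 200, 1.20)

print(f"{tokens:,} tokens, {hours:.1f} hours, ${low_cost:.0f} to ${high_cost:.0f}")
```

Measure tokens_per_sec from the first few hundred steps of a real run and re-run the estimate before committing to a long job.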

A minimal, budget-conscious setup:

  • Compute: A single GPU-backed container in Sealos.
  • Storage: S3-compatible object storage in Sealos for datasets and checkpoints.
  • Image: A Docker image with PyTorch, Transformers, PEFT, BitsAndBytes, and Accelerate.
  • Orchestration: Kubernetes job or long-running pod; use Sealos UI or GitOps.
  • Serving: A separate deployment (e.g., vLLM) that loads base weights plus LoRA adapters, or merges adapters into a new checkpoint for faster inference.

Example Kubernetes Spec (Requesting a GPU)

If you’re using Sealos over Kubernetes, a simple pod spec might look like:
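
A minimal sketch of such a spec, built as a Python dictionary and printed as JSON (kubectl accepts JSON as well as YAML). The pod name, image reference, and environment values are hypothetical placeholders. The nvidia.com/gpu resource name assumes the cluster runs the NVIDIA device plugin.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a Pod manifest that requests GPUs through resource limits."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",  # a training run should not restart on exit
            "containers": [{
                "name": "trainer",
                "image": image,
                "command": ["python", "train.py"],
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
                "env": [
                    # Hypothetical values; point these at your own model and bucket.
                    {"name": "BASE_MODEL", "value": "mistralai/Mistral-7B-v0.1"},
                    {"name": "DATASET_PATH", "value": "s3://my-bucket/train.jsonl"},
                    {"name": "OUTPUT_DIR", "value": "/workspace/outputs"},
                ],
            }],
        },
    }


manifest = gpu_pod_manifest("qlora-train", "registry.example.com/llm/qlora-train:latest")
print(json.dumps(manifest, indent=2))
```

Piping the output into `kubectl apply -f -` creates the pod. For runs that should be retried or cleaned up automatically, a Job wrapping the same container spec is usually a better fit.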

In Sealos, you can create similar GPU-backed workloads through the console, and provision an S3-compatible bucket using the object storage app. Store your datasets and checkpoints in S3 and access them via s3:// URIs.

High-quality, representative data lets you train fewer steps and get better results.

  • Start small: 5k–50k examples often suffice for narrow tasks.
  • Format consistently: Use a clear instruction → output schema.
  • Limit sequence length: Truncate inputs; move long references to retrieval if possible.
  • De-duplicate and sanitize: Quality beats quantity.

Example JSONL and Prompt Template

Data example (jsonl):
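
As an illustration, the snippet below writes two records in JSONL form, one JSON object per line. The field names (instruction, input, output) and the sample content are assumptions chosen for this sketch; use whatever schema fits your task, as long as every record follows it.

```python
import json

# Hypothetical records following an instruction / input / output schema.
records = [
    {
        "instruction": "Summarize the ticket in one sentence.",
        "input": "Customer cannot reset their password; the reset email never arrives.",
        "output": "The customer is not receiving the password reset email.",
    },
    {
        "instruction": "Classify the sentiment as positive, negative, or neutral.",
        "input": "The new dashboard is much faster than before.",
        "output": "positive",
    },
]

# JSONL: serialize each record on its own line.
jsonl_text = "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
print(jsonl_text)

# Reading it back is a line-by-line json.loads.
parsed = [json.loads(line) for line in jsonl_text.splitlines()]
```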

Prompt template applied during training:
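
One possible template, sketched as a Python helper. The section markers ("### Instruction:" and so on) and the field names are assumptions for illustration, not a format required by any library. What matters is that training and inference share the same string.

```python
PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)


def format_example(record: dict, include_output: bool = True) -> str:
    """Render a record; omit the output at inference time so the model completes it."""
    prompt = PROMPT_TEMPLATE.format(
        instruction=record["instruction"],
        input=record.get("input", ""),
    )
    return prompt + record["output"] if include_output else prompt


example = {
    "instruction": "Classify the sentiment as positive, negative, or neutral.",
    "input": "The new dashboard is much faster than before.",
    "output": "positive",
}
print(format_example(example))
```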

Keeping a consistent prompt reduces prompt-shift between training and inference.

QLoRA pairs 4-bit weight quantization with LoRA adapters. The base model remains quantized; only small adapter weights are learned. This allows fitting 7B models on 16–24 GB GPUs with decent throughput.

Minimal Training Script (Transformers + PEFT)

Below is a streamlined script you can adapt. It uses:

  • BitsAndBytes 4-bit quantization.
  • LoRA adapters with PEFT.
  • Gradient accumulation and checkpointing for memory control.

File: train.py
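
The following is a sketch of what such a script could look like, not a tuned recipe. Default values (model ID, paths, LoRA rank, learning rate), the EPOCHS variable, and the prompt template are assumptions. The heavy imports sit inside train() so the configuration helper can be imported on a machine without a GPU stack.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Assumed template; it must match the one used at inference time.
PROMPT = "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n{output}"


@dataclass
class Config:
    base_model: str
    dataset_path: str
    output_dir: str
    max_len: int
    grad_accum: int
    epochs: float


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    """Read settings from environment variables, with illustrative defaults."""
    return Config(
        base_model=env.get("BASE_MODEL", "mistralai/Mistral-7B-v0.1"),
        dataset_path=env.get("DATASET_PATH", "data/train.jsonl"),
        output_dir=env.get("OUTPUT_DIR", "outputs/adapter"),
        max_len=int(env.get("MAX_LEN", "1024")),
        grad_accum=int(env.get("GRAD_ACCUM", "16")),
        epochs=float(env.get("EPOCHS", "2")),
    )


def train(cfg: Config) -> None:
    import torch
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
    from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained(cfg.base_model)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token

    # 4-bit NF4 quantization of the frozen base weights (the "Q" in QLoRA).
    bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                             bnb_4bit_use_double_quant=True,
                             bnb_4bit_compute_dtype=torch.bfloat16)
    model = AutoModelForCausalLM.from_pretrained(
        cfg.base_model, quantization_config=bnb, device_map="auto")
    model = prepare_model_for_kbit_training(model)

    # Trainable low-rank adapters on the attention projections.
    model = get_peft_model(model, LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"]))

    def tokenize(row):
        text = PROMPT.format(instruction=row["instruction"],
                             input=row.get("input") or "",
                             output=row["output"]) + tokenizer.eos_token
        return tokenizer(text, truncation=True, max_length=cfg.max_len)

    ds = load_dataset("json", data_files=cfg.dataset_path, split="train")
    ds = ds.map(tokenize, remove_columns=ds.column_names)

    args = TrainingArguments(
        output_dir=cfg.output_dir, per_device_train_batch_size=1,
        gradient_accumulation_steps=cfg.grad_accum, num_train_epochs=cfg.epochs,
        learning_rate=2e-4, logging_steps=10, save_steps=200, save_total_limit=2,
        bf16=True,  # switch to fp16=True on GPUs without bfloat16 support
        gradient_checkpointing=True, optim="paged_adamw_8bit", report_to="none")

    trainer = Trainer(model=model, args=args, train_dataset=ds,
                      data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
    trainer.train()
    model.save_pretrained(cfg.output_dir)       # saves adapter weights only
    tokenizer.save_pretrained(cfg.output_dir)
```

To start a run, call `train(load_config())`, for example with `python -c "from train import load_config, train; train(load_config())"`. This version computes loss over the whole sequence, prompt included; masking the prompt tokens is a common refinement left out here for brevity.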

Notes:

  • If you use s3:// paths with the datasets library, pip install s3fs and configure AWS credentials (or Sealos object storage credentials) via environment variables.
  • target_modules depend on the model architecture; Mistral/LLaMA-like models typically use q_proj, k_proj, v_proj, o_proj.
  • Set bf16=True only if GPU supports bfloat16 (Ampere+). Otherwise, switch to fp16=True.

Running the Training Job

On Sealos, you can:

  • Build your own image with the dependencies above, or
  • Use a ready-made image and pip install dependencies at runtime, then run train.py.

If you prefer a lightweight Dockerfile:

Push the image to your registry and reference it in your Sealos workload.

Evaluate before long runs to avoid wasting compute.

Smoke Test Inference with Adapters
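
A sketch of a smoke test that loads the base model, attaches the saved adapters with PEFT, and generates one completion. The prompt template is an assumption and must mirror whatever was used during training. The function and argument names are illustrative, and adapter_dir is assumed to also contain the saved tokenizer.

```python
def build_prompt(instruction: str, input_text: str = "") -> str:
    """Inference-time prompt: same layout as training, with the response left open."""
    return (f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n")


def generate(base_model: str, adapter_dir: str, instruction: str,
             input_text: str = "", max_new_tokens: int = 128) -> str:
    # Heavy imports are kept local so build_prompt can be used without a GPU stack.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(adapter_dir)
    model = AutoModelForCausalLM.from_pretrained(
        base_model, torch_dtype=torch.bfloat16, device_map="auto")
    model = PeftModel.from_pretrained(model, adapter_dir)  # attach LoRA weights
    model.eval()

    inputs = tokenizer(build_prompt(instruction, input_text),
                       return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Strip the prompt tokens so only the newly generated text is returned.
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Running generate() on a handful of held-out prompts, with and without the adapter, is usually enough to spot template mismatches or an adapter that did not train.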

Lightweight Perplexity Check

For generative tasks, compute perplexity on a held-out set:
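
Perplexity is the exponential of the average negative log-likelihood per predicted token. The sketch below separates that arithmetic from the model-dependent loop. The function names are illustrative, and eval_perplexity assumes a Hugging Face causal LM and its tokenizer are already loaded.

```python
import math


def perplexity(total_nll: float, num_tokens: int) -> float:
    """exp(mean negative log-likelihood per token)."""
    if num_tokens <= 0:
        raise ValueError("need at least one predicted token")
    return math.exp(total_nll / num_tokens)


def eval_perplexity(model, tokenizer, texts, max_len: int = 1024) -> float:
    import torch

    total_nll, total_tokens = 0.0, 0
    model.eval()
    for text in texts:
        enc = tokenizer(text, return_tensors="pt", truncation=True,
                        max_length=max_len).to(model.device)
        n = enc["input_ids"].shape[1] - 1  # labels are shifted, so n predictions
        if n < 1:
            continue
        with torch.no_grad():
            # With labels set, .loss is the mean NLL over the n predicted tokens.
            loss = model(**enc, labels=enc["input_ids"]).loss
        total_nll += loss.item() * n
        total_tokens += n
    return perplexity(total_nll, total_tokens)


# Sanity check of the arithmetic: a uniform guess over 4 options has perplexity 4.
print(perplexity(10 * math.log(4), 10))
```

Compare the base model and the fine-tuned model on the same held-out texts; the absolute value matters less than the difference between the two.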

Small, consistent improvements in perplexity on relevant text usually correlate with better outputs.

Two options:

  • Load base model + LoRA adapters at runtime. Pros: small artifact size, can swap adapters easily. Cons: the unmerged adapter adds a small amount of overhead at load time and on each forward pass.
  • Merge LoRA adapters into base weights offline, then serve the merged model for best inference throughput.

Merging Adapters
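
A sketch of the merge step using PEFT's merge_and_unload(). It assumes the adapter directory also holds the tokenizer saved after training, and it loads the base model in bfloat16 rather than 4-bit so the merged checkpoint is written unquantized. The default output path is a placeholder.

```python
import os

# Where the merged checkpoint is written; the default is a hypothetical path.
MERGED_DIR = os.environ.get("MERGED_DIR", "outputs/merged")


def merge_adapters(base_model: str, adapter_dir: str, merged_dir: str = MERGED_DIR) -> str:
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load full-precision (bf16) base weights, then attach the trained adapters.
    base = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base, adapter_dir)

    # Fold the low-rank updates into the base weights and drop the adapter wrappers.
    merged = model.merge_and_unload()
    merged.save_pretrained(merged_dir, safe_serialization=True)

    # Ship the tokenizer alongside the weights so the directory is self-contained.
    AutoTokenizer.from_pretrained(adapter_dir).save_pretrained(merged_dir)
    return merged_dir
```

Merging a 7B model this way needs enough CPU or GPU memory to hold the bf16 weights (on the order of 15 GB), so it may need a larger instance than the one used for 4-bit training.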

Upload MERGED_DIR to your model store (e.g., S3), then serve.

Serving with vLLM (Kubernetes)

vLLM is efficient and easy to deploy. Example Deployment:
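
A minimal sketch of such a Deployment, again built as a Python dictionary and printed as JSON for kubectl. The deployment name and model path are hypothetical, and the sketch assumes the merged model is already available inside the container at that path (for example through a mounted volume). The image tag and server flags follow the public vllm/vllm-openai image as commonly used; check them against the vLLM version you deploy.

```python
import json


def vllm_deployment(name: str, model_path: str,
                    image: str = "vllm/vllm-openai:latest", port: int = 8000) -> dict:
    """Single-replica Deployment running vLLM's OpenAI-compatible server on one GPU."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{
                    "name": "vllm",
                    "image": image,
                    # Arguments are passed to the server entrypoint of the image.
                    "args": ["--model", model_path,
                             "--served-model-name", name,
                             "--port", str(port)],
                    "ports": [{"containerPort": port}],
                    "resources": {"limits": {"nvidia.com/gpu": 1}},
                }]},
            },
        },
    }


deployment = vllm_deployment("my-finetuned-model", "/models/merged")
print(json.dumps(deployment, indent=2))
```

A Service selecting the same app label is still needed to route traffic to the pod.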

With Sealos, you can expose the service via an Ingress or built-in domain routing, and test with the OpenAI-compatible endpoint:
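
A sketch of such a test using only the Python standard library. The base URL and model name are placeholders for whatever your ingress and deployment expose; the /v1/completions path and payload fields follow the OpenAI-style completions API that vLLM serves.

```python
import json
import urllib.request


def build_completion_request(base_url: str, model: str, prompt: str,
                             max_tokens: int = 128) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the OpenAI-compatible completions route."""
    payload = {"model": model, "prompt": prompt,
               "max_tokens": max_tokens, "temperature": 0.0}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and model name.
req = build_completion_request(
    "https://llm.example.com",
    "my-finetuned-model",
    "### Instruction:\nSummarize the ticket in one sentence.\n\n### Response:\n",
)
print(req.get_method(), req.full_url)

# To actually send it once the endpoint is live:
# with urllib.request.urlopen(req, timeout=60) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```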

Alternatively, you can deploy Open WebUI on Sealos to chat with your model.

  • Customer support: Teach the model to resolve common tickets, follow your policy handbook, and produce templated responses.
  • Sales assistants: Personalize outreach, qualify leads, and fill CRM fields consistently.
  • Documentation Q&A: Summarize changelogs, answer questions about your API, and draft guides with your tone.
  • Legal/finance drafting: Create structured summaries, extract clauses, or generate compliance-ready checklists.
  • Data engineering helpers: Generate SQL against your schema, annotate ETL pipeline code, and document datasets.

Tip: Keep your training data narrowly focused on the task. For retrieval-heavy tasks, combine a smaller instruction-tuned model with a vector database and RAG, rather than trying to stuff all knowledge into weights.

  • Use QLoRA with r=8–16 and lora_alpha=16–32; this is often sufficient.
  • Keep sequence length tight (512–2048). Longer context is costly.
  • Start with 1–2 epochs; stop early if validation plateaus.
  • Attach adapters only to the attention projections (q, k, v, o) at first; add the MLP layers in later experiments if quality falls short.
  • Enable gradient checkpointing and use the paged_adamw_8bit optimizer.
  • Pre-tokenize your dataset so the GPU is not left idle waiting on CPU-side tokenization.
  • Use spot/preemptible GPUs if your platform supports checkpoint resume.
  • Validate data quality. A single corrupted shard or mixed schema can waste runs.
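
Common pitfalls and fixes:

  • Out-of-memory (OOM): Reduce MAX_LEN, lower the per-device batch size (raising GRAD_ACCUM to keep the effective batch size constant), lower the LoRA rank, or switch to a smaller base model.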
  • Catastrophic forgetting: Use a small proportion (5–20%) of general data mixed with your domain data, or apply regularization (e.g., KL during preference training).
  • Instruction drift: Keep a stable prompt template and mirror it in inference.
  • Overfitting: Monitor held-out loss; add early stopping; reduce epochs.
  • Tokenization mismatches: Always use the base model’s tokenizer; never mix tokenizers.

A step-by-step outline you can follow:

  1. Provision storage:

    • Create an S3-compatible bucket in Sealos to hold datasets and checkpoints.
    • Upload your JSONL dataset and optionally a held-out eval set.
  2. Prepare the image:

    • Build the Dockerfile above and push to your registry, or use a prebuilt image and provide train.py as a ConfigMap/volume in Kubernetes.
  3. Launch a GPU workload:

    • In Sealos, create a GPU-enabled job or pod using your image.
    • Set environment variables: BASE_MODEL, DATASET_PATH, OUTPUT_DIR, etc.
    • Mount credentials for S3.
  4. Monitor and iterate:

    • Stream logs to check throughput and loss.
    • If loss flattens early, cancel to save cost and adjust hyperparameters.
  5. Save and test:

    • After training, ensure adapter weights and tokenizer are saved to S3.
    • Run a quick smoke test with the inference code to validate outputs.
  6. Serve:

    • Merge adapters for faster inference (optional).
    • Deploy vLLM with the merged model or load adapters dynamically.
    • Expose an endpoint, and connect a UI (e.g., Open WebUI).
  7. Automate:

    • Use GitOps or CI/CD to trigger fine-tunes on new data drops.
    • Schedule cost-saving policies (TTL jobs, off-hours scaling) via Kubernetes.

Sealos simplifies each of these steps with an application-centric console, while still giving you full control over Kubernetes primitives when you need them.

  • Data residency: Keep datasets and checkpoints in your Sealos-hosted object storage within your region.
  • Secrets: Store access keys in Kubernetes Secrets; avoid hardcoding.
  • Licenses: Ensure your base model’s license permits your use case and redistribution if you plan to share the merged model.
  • Auditing: Log training runs and configurations; store training args alongside checkpoints for reproducibility.

Frequently asked questions:

  • Can I fine-tune without a GPU?

    • Technically yes for very small adapters and tiny models, but training speed will be impractical. A single mid-range GPU (e.g., L4/A10G) is a sweet spot.
  • Do I need DeepSpeed or FSDP?

    • Not for 7B QLoRA on a single GPU. If you scale beyond 13B or need full fine-tuning, consider them.
  • Should I use DPO/RLHF?

    • Start with SFT. If you need preference alignment (tone, safety, style), add DPO with a small pairwise preference dataset.
  • How big should my dataset be?

    • For narrow tasks, 5k–20k high-quality examples often suffice. For broader instruction tuning, 50k–200k is common—still manageable with QLoRA.

A sample budget plan:

  • Hardware: 1× L4 24GB (or A10G 24GB) on-demand for 6–12 hours.
  • Steps:
    • Data cleanup and tokenization: 1–2 hours CPU time.
    • Pilot run (1 epoch): 2–4 hours.
    • Main run (2 epochs): 4–8 hours.
    • Merge + deploy: 1 hour.
  • Estimated spend: $10–$30 depending on GPU price and total hours.

Optimize by reducing sequence length, trimming low-quality samples, and merging early if results are already good after 1 epoch.

Fine-tuning open-source LLMs doesn’t have to be expensive or complicated. With QLoRA/LoRA, careful data curation, and efficient tooling, you can ship models that feel bespoke to your domain—often in a day and well within a modest budget.

Sealos helps you operationalize this workflow: spin up GPU workloads, store datasets and artifacts, iterate in familiar developer tools, and deploy serving endpoints—without wrangling infrastructure details. Start small, measure often, and let data quality—not raw compute—drive your improvements.

When you’re ready to level up your AI stack with a practical, cloud-native approach, explore Sealos at https://sealos.io and put your fine-tuning plan into action.