Getting Started: Run Your First Local LLM in 5 Minutes
Lingdas1 · 2026-05-24 · via DEV Community

01 — Getting Started: Run Your First Local LLM (5 Minutes)

🟢 Beginner — No experience needed. Just a computer and 5 minutes.


What Is a Local LLM? (Plain English)

An LLM (Large Language Model) is the brain behind ChatGPT, Claude, and Gemini.

A local LLM runs that brain on your own computer — not on someone else's server.

Why does that matter?

| Cloud AI (ChatGPT, Claude) | Local AI (Ollama + models) |
|---|---|
| $20–$200/month subscription | $0, completely free |
| Your data is sent to their servers | Private: everything stays on your machine |
| Requires internet | Works offline |
| Censored, filtered, rate-limited | No limits: you control everything |
| One-size-fits-all model | Choose any model for any task |

💡 Think of it this way: Cloud AI is like renting a car. Local AI is like owning a bicycle. The bicycle is slower, but it's yours, it's free, and nobody can take it away from you.


What You Need

Minimum requirements:

  • A computer (Windows, macOS, or Linux)
  • At least 8 GB of RAM (16 GB recommended)
  • A few GB of free disk space

Nice to have (but not required):

  • A GPU with 4+ GB VRAM (models run faster, but CPU is fine to start)
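If you are not sure how much RAM your machine has, a quick check like the one below can tell you where you stand against the 8 GB minimum and 16 GB recommendation. This is only a sketch and assumes Linux, where total memory is listed in /proc/meminfo; on macOS, `sysctl -n hw.memsize` reports the same figure in bytes. The helper names `ram_gb` and `ram_verdict` are made up for this example.

```shell
# Sketch: report total RAM in whole GB from a Linux meminfo-style file.
# ram_gb and ram_verdict are hypothetical helper names, not Ollama commands.
ram_gb() {
  # MemTotal is reported in kB. Adding 0.5 rounds to the nearest GB, so a
  # nominal 8 GB machine (which reports slightly less) still counts as 8.
  awk '/^MemTotal:/ { printf "%d\n", ($2 / 1024 / 1024) + 0.5 }' "${1:-/proc/meminfo}"
}

ram_verdict() {
  # $1 = RAM in whole GB; thresholds come from the requirements list above.
  if   [ "$1" -ge 16 ]; then echo "recommended"
  elif [ "$1" -ge 8 ];  then echo "minimum"
  else echo "below minimum"
  fi
}

# Usage on Linux: ram_verdict "$(ram_gb)"
```

A "below minimum" result does not rule out local models; it just points you toward the smallest ones covered in Step 2.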

My setup: I'm running this on a [your hardware] with [your specs]. If it works for me, it'll work for you.


Step 1: Install Ollama

Ollama is the easiest way to run local LLMs. Think of it as the "App Store for AI models."

macOS

curl -fsSL https://ollama.com/install.sh | sh


Linux

curl -fsSL https://ollama.com/install.sh | sh


Windows

Download the installer from ollama.com/download and run it.

Verify Installation

Open a new terminal and type:

ollama --version


You should see something like:

ollama version is 0.6.0


🔥 Pro tip: If you get "command not found" on Linux/macOS, restart your terminal or run: export PATH=$PATH:/usr/local/bin
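The same check can be scripted, which is handy if you want a setup script to stop early when Ollama is missing. A minimal sketch, using only the standard `command -v` lookup; `check_cli` is a hypothetical helper name:

```shell
# Sketch: confirm a command is on PATH before going further.
# check_cli is a hypothetical helper name; it defaults to looking for ollama.
check_cli() {
  cmd=${1:-ollama}
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd found at $(command -v "$cmd")"
  else
    echo "$cmd not found on PATH; restart the terminal or extend PATH" >&2
    return 1
  fi
}

# Usage: check_cli && ollama --version
```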


Step 2: Pull Your First Model

Now for the fun part — downloading an actual AI brain to run on your computer.

ollama pull qwen2.5:7b


This downloads a 4.7 GB model. On a fast connection (roughly 125–300 Mbps) it takes 2–5 minutes; slower links take proportionally longer.
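To estimate the wait on your own connection, divide the model size by your link speed. The sketch below does that arithmetic; `download_minutes` is a hypothetical helper, and it assumes decimal gigabytes and a link that sustains its advertised speed, so real downloads will usually be a bit slower.

```shell
# Sketch: estimate download time in minutes.
# $1 = model size in GB (decimal), $2 = link speed in Mbps.
download_minutes() {
  # 1 GB = 8000 megabits; divide by Mbps for seconds, then by 60 for minutes.
  awk -v gb="$1" -v mbps="$2" 'BEGIN { printf "%.1f\n", gb * 8000 / mbps / 60 }'
}

download_minutes 4.7 300   # prints 2.1
download_minutes 4.7 100   # prints 6.3
download_minutes 4.7 25    # prints 25.1
```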

While it downloads, here's what's happening:

  • Ollama is downloading a GGUF file (the compressed model format)
  • It's auto-detecting your GPU
  • It's setting up the inference engine

What if the download is too big? Try a smaller model:

# For 8 GB RAM laptops — works on almost anything
ollama pull qwen2.5:1.5b

# For 4 GB RAM or very old computers
ollama pull qwen2.5:0.5b



Step 3: Chat With Your Model

ollama run qwen2.5:7b


You'll see a prompt like >>>. Type something:

>>> Write a haiku about a cat sitting on a computer


The model will think for a moment and then respond. Congratulations — you just ran an AI on your own hardware! 🎉

Try These First Commands

>>> Write a Python function to calculate fibonacci

>>> Explain quantum computing like I'm 10

>>> What's the meaning of life?

>>> /? -- show all available commands

>>> /exit -- quit the chat


⚠️ Expect it to be slower than ChatGPT. That's normal! Local models typically run at 15–40 tokens per second on a GPU, or 2–6 tokens per second on CPU. At GPU speeds that is still faster than most people read; on CPU it is closer to reading pace.
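To put tokens per second in human terms, you can convert them to words per minute. The sketch below assumes about 0.75 English words per token, which is a common rule of thumb and not an exact figure; `words_per_minute` is a hypothetical helper. For comparison, adult reading speed is often quoted at roughly 200 to 300 words per minute.

```shell
# Sketch: convert generation speed to words per minute.
# $1 = tokens per second. Assumes ~0.75 words per token (rule of thumb).
words_per_minute() {
  awk -v tps="$1" 'BEGIN { printf "%.0f\n", tps * 0.75 * 60 }'
}

words_per_minute 2    # prints 90   (slow CPU)
words_per_minute 6    # prints 270  (fast CPU)
words_per_minute 15   # prints 675  (modest GPU)
```

To see the real number for your machine, `ollama run qwen2.5:7b --verbose` prints timing statistics, including an eval rate in tokens per second, after each reply.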


Step 4: Choose the Right Model for Your Hardware

Not sure which model to pick? Use this decision tree:

Your GPU VRAM?
├── No GPU (CPU only)
│   ├── 32 GB RAM → qwen2.5:7b (slow but works)
│   ├── 16 GB RAM → qwen2.5:1.5b
│   └── 8 GB RAM  → qwen2.5:0.5b
├── 4–6 GB VRAM   → qwen2.5:7b
├── 8–12 GB VRAM  → deepseek-r1:14b (🟢 BEST for most people)
├── 16 GB VRAM    → deepseek-r1:32b
├── 24 GB VRAM    → qwen3.6:27b or deepseek-r1:32b (Q4)
└── 36+ GB VRAM   → deepseek-r1:70b or qwen2.5:72b
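If you prefer a script to a diagram, the tree can be written as a small function. This is a sketch of one reading of it, not an official sizing tool: `pick_model` is a hypothetical helper, the model tags are copied from this article as-is, and where the tree's ranges overlap or leave gaps it falls back to the smaller model, using the Min VRAM values from the comparison table as the cut-offs (8 GB for deepseek-r1:14b, 16 GB for deepseek-r1:32b). A GPU with less than 4 GB of VRAM is treated as CPU only.

```shell
# Sketch: suggest a model tag from whole-GB VRAM and RAM figures.
# $1 = GPU VRAM in GB (0 for no GPU), $2 = system RAM in GB.
pick_model() {
  vram=$1
  ram=$2
  if   [ "$vram" -ge 36 ]; then echo "deepseek-r1:70b"
  elif [ "$vram" -ge 24 ]; then echo "qwen3.6:27b"
  elif [ "$vram" -ge 16 ]; then echo "deepseek-r1:32b"
  elif [ "$vram" -ge 8 ];  then echo "deepseek-r1:14b"
  elif [ "$vram" -ge 4 ];  then echo "qwen2.5:7b"
  # Below this point the GPU is ignored and RAM decides.
  elif [ "$ram" -ge 32 ];  then echo "qwen2.5:7b"
  elif [ "$ram" -ge 16 ];  then echo "qwen2.5:1.5b"
  else echo "qwen2.5:0.5b"
  fi
}

pick_model 0 16    # prints qwen2.5:1.5b
pick_model 8 16    # prints deepseek-r1:14b

# Usage: ollama pull "$(pick_model 8 16)"
```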


Model Comparison Table

| Model | Ollama Command | Size (Disk) | Min RAM | Min VRAM | Quality |
|---|---|---|---|---|---|
| Qwen 2.5:0.5B | ollama pull qwen2.5:0.5b | 0.5 GB | 4 GB | None | Basic text |
| Qwen 2.5:1.5B | ollama pull qwen2.5:1.5b | 1.1 GB | 8 GB | None | Simple tasks |
| Qwen 2.5:7B | ollama pull qwen2.5:7b | 4.7 GB | 8 GB | 4 GB | 🟢 Good start |
| Qwen 2.5:14B | ollama pull qwen2.5:14b | 9.0 GB | 16 GB | 8 GB | Excellent |
| DeepSeek-R1:14B | ollama pull deepseek-r1:14b | 8.2 GB | 16 GB | 8 GB | 🏆 Best value |
| DeepSeek-R1:32B | ollama pull deepseek-r1:32b | 18.7 GB | 32 GB | 16 GB | Near o1 level |
| Qwen 3.6:27B | ollama pull qwen3.6:27b | 15 GB | 32 GB | 16 GB | Cutting-edge |
| Llama 3.1:8B | ollama pull llama3.1:8b | 4.9 GB | 8 GB | 4 GB | Good general |

My recommendation for first-timers: Start with qwen2.5:7b. It runs on most machines with at least 8 GB of RAM, and it's good enough to be genuinely useful.


What to Do After Your First Chat

You've run your first local LLM. Now what?

Next steps in order:

| # | Task | Why | Guide |
|---|---|---|---|
| 1 | Customize your model with a Modelfile | Control temperature, context length, and behavior | GGUF & Modelfile Guide |
| 2 | Install Open WebUI | Get a ChatGPT-like web interface instead of the terminal | Open WebUI Setup |
| 3 | Benchmark your hardware | See what speeds your setup can achieve | Script: ./scripts/ollama-benchmark.sh |
| 4 | Add document search (RAG) | Let your LLM answer questions about your own files | RAG Guide |
| 5 | Try a reasoning model | Switch to DeepSeek-R1 for harder problems | DeepSeek-R1 Guide |
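As a small preview of step 1, here is roughly what a Modelfile looks like and how it becomes a custom model. Treat it as a sketch: `write_modelfile` and the model name `qwen-english` are made up for this example, and the temperature and context values are illustrative, not tuned recommendations. `FROM`, `SYSTEM`, and `PARAMETER` are standard Modelfile instructions.

```shell
# Sketch: generate a Modelfile that pins the reply language and a few settings.
# $1 = output path, $2 = base model tag (must already be pulled).
write_modelfile() {
  {
    printf 'FROM %s\n' "$2"
    printf 'SYSTEM "Always respond in English."\n'
    printf 'PARAMETER temperature 0.7\n'   # illustrative value
    printf 'PARAMETER num_ctx 4096\n'      # context length, illustrative value
  } > "$1"
}

# Usage (requires Ollama; qwen-english is a hypothetical name):
#   write_modelfile Modelfile qwen2.5:7b
#   ollama create qwen-english -f Modelfile
#   ollama run qwen-english
```

The system prompt here is also a handy fix if a Qwen model starts answering in Chinese.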

Common First-Timer Problems (And Fixes)

| Problem | Why | Fix |
|---|---|---|
| "ollama: command not found" | Ollama not in PATH | Restart terminal, or run: export PATH=$PATH:/usr/local/bin |
| Download is very slow | Big file on slow internet | Try ollama pull qwen2.5:1.5b instead (much smaller) |
| Model responds very slowly | Running on CPU | This is normal! See the speed expectations at the end of Step 3 |
| Model responds in Chinese | Qwen models are trained on a lot of Chinese text | Add SYSTEM "Always respond in English." to a Modelfile |
| "CUDA out of memory" | Model too big for your GPU | Use a smaller model or lower quantization |
| "Connection refused" | Ollama server not running | Run ollama serve in a separate terminal first |

Quick Reference: Common Ollama Commands

# List all downloaded models
ollama list

# Show currently running models
ollama ps

# Delete a model to free space
ollama rm qwen2.5:7b

# Update a model to the latest version
ollama pull qwen2.5:7b

# Run a model with a one-shot prompt (non-interactive)
ollama run qwen2.5:7b "Write a Python script to download images from a URL"

# Use the API (OpenAI compatible)
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5:7b", "messages": [{"role": "user", "content": "Hello!"}]}'
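If you want to call the API from your own scripts, the curl command above can be wrapped in a few small functions. This is a sketch under a few assumptions: the helper names (`chat_payload`, `server_up`, `ask`) are invented here, the server is on the default port 11434, and the prompt contains no newlines, since the quoting below only escapes backslashes and double quotes.

```shell
# Sketch: call Ollama's OpenAI-compatible endpoint from a script.
# chat_payload, server_up and ask are hypothetical helper names.

chat_payload() {
  # $1 = model, $2 = prompt. Escape backslashes first, then double quotes,
  # so the prompt stays a valid JSON string.
  escaped=$(printf '%s' "$2" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}\n' "$1" "$escaped"
}

server_up() {
  # Succeeds if an HTTP server answers at the given base URL.
  curl -fsS --max-time 2 "${1:-http://localhost:11434}/" >/dev/null 2>&1
}

ask() {
  # $1 = model, $2 = prompt. Prints the raw JSON response.
  if ! server_up; then
    echo "Ollama is not reachable; try: ollama serve" >&2
    return 1
  fi
  curl -fsS http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "$(chat_payload "$1" "$2")"
}

# Usage: ask qwen2.5:7b 'Say "hi" in one word'
```

Checking the server first turns a confusing "Connection refused" from curl into a message that says what to do next.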



Your First Week Plan

| Day | Task | Time |
|---|---|---|
| Day 1 | Install Ollama + pull a model + chat with it | 5 minutes ✅ |
| Day 2 | Try different models (small vs large) | 15 minutes |
| Day 3 | Customize with a Modelfile | 30 minutes |
| Day 4 | Install Open WebUI | 30 minutes |
| Day 5 | Ask your LLM to write code or help with real work | 1 hour |
| Weekend | Try RAG: let your LLM read your documents | 1 hour |

🎯 You've taken the first step. Running a local LLM is like learning to ride a bike — wobbly at first, but once you get it, you'll wonder why you didn't start sooner.

Found this helpful? ⭐ Star the repo — it helps others find it too.

— Ling, a medical student who accidentally fell into AI and wants to help you do the same.