qwen2.5-coder is too slow for Claude Code on a Mac. Here's the fix.
Malik Chohra · 2026-05-23 · via DEV Community

Claude Code does not care where the model lives. Point it at a local model and it works with no network. I tested that at 35,000 feet, picked the wrong model first, and swapped mid-flight.

TL;DR

  • Claude Code reads two environment variables to decide where its model lives. Point them at Ollama and it runs fully offline.
  • I tested this on a real flight. Berlin, May 13, wifi off, cabin door closed.
  • I started on qwen2.5-coder:14b. It was too slow for anything agentic. One tool call sat for 25 seconds, another for 52.
  • I switched to gemma4:26b. That one carried the session.
  • Local is for offline work, privacy-sensitive code, and cheap drafting. Cloud is still better for heavy reasoning and large-context tasks.
  • The install takes 20 minutes once. After that, switching models is one command.

The setup, in one paragraph

Ollama runs an open-weights model on your laptop. Claude Code points at Ollama instead of Anthropic's servers. No network call leaves the machine. The cloud account is irrelevant for that session. The only real decision is which local model you run, and that decision is where I got it wrong the first time.
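
A minimal sketch of what "pointing" could look like by hand. The article does not name the two environment variables, so this assumes they are `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN`, and that Ollama listens on its default address `http://localhost:11434`. The function name `claude_local_manual` and the placeholder token value are hypothetical. Step 2 uses `ollama launch` instead, which should make this manual wiring unnecessary.

```shell
# Sketch only. Assumptions (not from the article): the two variable names,
# the placeholder token value, and Ollama's default address.
claude_local_manual() {
  local model="${1:-gemma4:26b}"
  # The assignments are scoped to this one claude invocation,
  # so a later plain `claude` still talks to the cloud.
  ANTHROPIC_BASE_URL="http://localhost:11434" \
  ANTHROPIC_AUTH_TOKEN="ollama" \
  claude --model "$model"
}
```

Called with no argument it would start a session on gemma4:26b; `claude_local_manual qwen2.5-coder:14b` would pick the other model.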

Why offline beats "just use a smaller cloud model"

Before the setup, the three objections I get every time:

  • "Just don't code on a plane." A flight is six uninterrupted hours. No social media, no notifications, nothing that pulls focus. That is rare now. Throwing it away because your LLM needs wifi, when the wifi problem is fixable, is a planning failure.
  • "Just use Copilot offline." Copilot's local mode does completions. Anything context-heavy still hits the network. The moment you ask for the work that justifies an AI assistant, you are back online.
  • "Just use a smaller cloud model." Haiku and GPT-4o-mini still live in the cloud. Smaller is not local. No network, no inference. Same failure, smaller bill.

Local is the only setup that runs at 35,000 feet. It also runs on a train through a tunnel, in a cafe with broken wifi, and on the morning the OpenAI status page goes red. The flight is just the stress test.

What you need

  • A Mac on Apple Silicon (M1 or newer). Linux and Windows via WSL2 work with minor changes.
  • Claude Code installed and already authenticated against your cloud account.
  • About 16 GB of unified memory. 32 GB if you want the larger models to run comfortably.
  • Homebrew, for the Ollama install.
  • 20 minutes the first time. Roughly 90 seconds every time after.
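
If you are not sure how much unified memory the machine has, a quick check along these lines may help. It is a sketch for macOS only: it assumes `sysctl -n hw.memsize` reports physical memory in bytes, and the helper names `mem_gb` and `enough_memory` are made up for this example.

```shell
# Sketch, macOS only: hw.memsize is assumed to report physical memory in bytes.
mem_gb() {
  echo $(( $(sysctl -n hw.memsize) / 1073741824 ))
}

# Succeeds when the machine has at least the given number of GB (default 16).
enough_memory() {
  [ "$(mem_gb)" -ge "${1:-16}" ]
}
```

`enough_memory 32 || echo "stick to the smaller models"` is one way to use it before deciding what to pull.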

Step 1 — Pull the model before you fly

Install Ollama and pull a model:

brew install ollama
ollama pull qwen2.5-coder:14b

Do this on home wifi the night before. The pull is around 9 GB. Airport wifi and hotspots will not cooperate, and finding that out at the gate is its own small tragedy.

Confirm it landed:

ollama list

This was my mistake, so I will be blunt about it: I prepped qwen2.5-coder:14b because it is the model every "local LLM for coding" post recommends. Pull more than one. You will see why in Step 4.
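
One way to make "pull more than one" hard to forget is a small pre-flight check. This is a sketch that assumes `ollama list` prints a header row followed by one row per model with the name in the first column; `missing_models` is a hypothetical helper, not part of Ollama.

```shell
# Print every required model that `ollama list` does not show.
missing_models() {
  local installed want
  # Skip the header row and keep only the NAME column.
  installed="$(ollama list | awk 'NR > 1 { print $1 }')"
  for want in "$@"; do
    # -F literal match, -x whole line, -q quiet.
    printf '%s\n' "$installed" | grep -Fxq -- "$want" || printf '%s\n' "$want"
  done
}
```

`missing_models qwen2.5-coder:14b gemma4:26b` prints nothing when both are on disk, and prints the name of whatever still needs pulling otherwise.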

Step 2 — Point Claude Code at Ollama

Start the Ollama server in one terminal:

ollama serve

Then in a new terminal, launch Claude Code against your local model:

ollama launch claude --model qwen2.5-coder:14b

Wrap that in two shell aliases so the rest of your workflow has named modes. Add these to ~/.zshrc:

alias claude-local='ollama launch claude --model gemma4:26b'
alias claude-cloud='claude'

Then source ~/.zshrc. That is the entire switching layer.

claude-local runs offline against Ollama. claude-cloud runs against the real Anthropic API. Two commands, one decision per session.
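
The alias hard-codes one model. If you expect to swap models, a shell function with an optional argument is a small variation on the same idea. This is a sketch: `claude_local` (with an underscore, so it does not collide with the alias) is a hypothetical name, and it reuses the `ollama launch claude --model` command from above unchanged.

```shell
# Same launch command as the alias, with the model as an optional argument.
claude_local() {
  local model="${1:-gemma4:26b}"   # default to the model that carried the flight
  ollama launch claude --model "$model"
}
```

`claude_local` starts on gemma4:26b; `claude_local qwen2.5-coder:14b` starts on the other model without touching ~/.zshrc.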

Step 3 — Verify on the ground

Prove the setup works in airplane mode before you board anything. This is non-negotiable. Discovering a missing step at altitude is bad theater with no exits.

  1. Make sure ollama serve is running.
  2. Turn wifi off. Actually off, not "disconnected from this network."
  3. Run claude-local and point it at a real file.
  4. Confirm a real answer comes back.

Terminal screenshot of `claude-local` successfully answering with wifi off

If it loads your project and answers with wifi off, it will work on the plane.
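
Part of that ground check can be scripted. The sketch below assumes Ollama's default address `http://localhost:11434` and its `/api/tags` endpoint, and uses `ollama run MODEL PROMPT` for a one-shot reply; `ground_check` is a hypothetical helper. It does not verify that wifi is off, so that step stays manual.

```shell
# Check that the local server answers and that the model returns any text at all.
ground_check() {
  local model="${1:-gemma4:26b}"
  local url="http://localhost:11434"   # assumed default Ollama address
  local reply
  if ! curl --silent --fail --max-time 2 "$url/api/tags" >/dev/null; then
    echo "server not reachable at $url (is 'ollama serve' running?)"
    return 1
  fi
  reply="$(ollama run "$model" "Reply with one word: ready")"
  if [ -z "$reply" ]; then
    echo "no reply from $model"
    return 1
  fi
  echo "ok: $model answered"
}
```

Run it once with wifi off. A passing check only shows that the server and model respond; it says nothing about tool-loop speed.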

Step 4 — The flight: qwen2.5-coder was too slow

The best move I made was running the model without wifi on the ground first and measuring real performance. Every forum I read pointed at qwen2.5-coder. I trusted them. They were wrong for this job.

File reads were fine. Short explanations were fine. Then the model tried anything agentic, and the wait times stopped being a rounding error.

Terminal screenshot showing slow tool-call wait times on qwen2.5-coder during the flight

One tool call crunched for 25 seconds. An earlier step had sat for 52. In a loop that needs five or six such steps, that is roughly two to five minutes per task. That is not a workflow. That is staring at a terminal while the person next to you finishes a movie.

qwen2.5-coder:14b is a fine model for single-shot edits. For the multi-step tool loop that Claude Code actually runs, on this hardware, it could not keep up. The model every post recommends was the wrong call for the job I had.
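
A rough way to compare models on your own hardware before you fly is to time the same one-shot prompt on each. This is a sketch: `time_model` is a hypothetical helper, and a single prompt is only a proxy, since it does not reproduce the multi-step tool loop that caused the waits described here.

```shell
# Time one non-interactive prompt against one model, in whole seconds.
time_model() {
  local model="$1" prompt="$2" start end
  start="$(date +%s)"
  ollama run "$model" "$prompt" >/dev/null
  end="$(date +%s)"
  printf '%s %ss\n' "$model" "$((end - start))"
}
```

Looping over candidates, for example `for m in qwen2.5-coder:14b gemma4:26b; do time_model "$m" "Explain what a mutex is in two sentences."; done`, gives one line per model to compare.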

Step 5 — The swap: gemma4:26b carried the session

I had pulled a second model before the flight, precisely because I did not fully trust the first one. So I switched to gemma4:26b.

Terminal screenshot of `ollama pull gemma4:26b`

Bigger model, 17 GB on disk, and on this MacBook it was the difference between a demo and a tool. The tool loop ran at a speed I would actually choose. The gap analysis completed. Multi-step reasoning held together instead of stalling halfway.

Terminal screenshot of gemma4:26b running a real Claude Code gap-analysis task

Honest scorecard for the flight: roughly 70 percent of my normal Claude Code workflow worked on gemma4:26b. The 30 percent that did not was the heavy "go reason across the whole repo" pattern, which is cloud territory anyway. For six hours of focus on a known task, it was a real working setup, not a downgrade.

Because I already had a tight context-engineering setup with optimised token consumption, it ran smoothly. The Mac started lagging briefly when I had Xcode and Antigravity open alongside, but closing those and cleaning up Chrome tabs sorted it. If you want the context-engineering side, the U-AMOS write-up is here: I spent 6 months losing fights with AI in React Native. Then I built U-AMOS.

Practical tip: install the OneTab Chrome extension. Collapse open tabs into a list when you start a focus session. RAM frees up immediately and so does your attention. OneTab on the Chrome Web Store.

Which local model should you actually run?

The lesson from the flight changed my default. Here is the short list I keep now:

Model picker comparison card — Devstral / Qwen3-Coder / Gemma 4 / Llama 3.3

  • Devstral Small (24B) — built for agentic coding, multi-file edits, tool use. Currently the strongest open-source option on SWE-bench.
  • Qwen3-Coder (30B) — RL-trained on SWE-bench, native tool calling, large context. The successor to the model that failed me, and it is a real upgrade.
  • Gemma 4 (4B to 31B) — the best size-to-capability ratio. The 26b variant is what saved my flight.
  • Llama 3.3 (70B) — solid general coding and stable tool calling if your machine can carry it.

Notice what is not on that list: qwen2.5-coder. That is not an accident. Pick a model that is RL-trained for tool use, not just code completion. Claude Code lives or dies on the tool loop.

When to use local vs cloud

After running both for weeks, the rule is simple.

Reach for claude-local when:

  • There is no network. Planes, trains, dead cafes, conference wifi.
  • The code is privacy-sensitive. Client work under NDA, anything you do not want crossing a vendor boundary.
  • You are drafting and iterating prompts before spending cloud tokens on the real run. Local cost stays at zero.

Reach for claude-cloud when:

  • The work is multi-tool and agentic. Subagents, MCP calls, parallel reads.
  • The task needs large context. Whole-repo refactors, "explain this project."
  • The output ships to production. The polish gap between a local model and cloud Claude is real.

You do not pick once and live there. The aliases exist so you can switch from one session to the next. Draft offline, land, run claude-cloud for the high-stakes execution.

Where this breaks

The honest section, because AI-generated tutorials never have one.

  • Tool use is the weak point. Even good local models are less reliable than cloud Claude at chaining many tool calls. Expect rough edges if your workflow leans hard on subagents and MCP servers.
  • Context windows are smaller. Sessions that try to load the entire repo will choke. Scope to the files in play, not the whole tree.
  • Battery drains faster. Running a 26B model while your editor and browser are open will eat the battery noticeably quicker than cloud Claude. Plan for it on a long offline session.
  • The endpoint shape is a soft contract. Ollama's responses are close to Anthropic's, not identical. Most coding requests work. If you hit a strange parsing error mid-stream, that mismatch is usually why, and claude-cloud is the fix in the moment.
  • Model versioning is your job now. Ollama makes pulling easy, but you decide when to upgrade and which variant. Keep a note of what you run and why.
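
For the versioning note, one low-effort option is to append a dated snapshot of `ollama list` to a text file whenever you pull or swap a model. This is a sketch: `note_models` and the default file path are hypothetical, and it assumes the `ollama list` output, including its ID column, is enough to identify what you ran.

```shell
# Append today's date, a one-line reason, and the installed models to a notes file.
note_models() {
  local reason="$1"
  local file="${2:-$HOME/.ollama-model-notes.txt}"   # hypothetical default path
  {
    printf '## %s: %s\n' "$(date +%Y-%m-%d)" "$reason"
    ollama list
    printf '\n'
  } >> "$file"
}
```

`note_models "switched to gemma4:26b after slow tool calls"` leaves a record of what was installed and why the change happened.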

Where to go next

This offline setup is one of three layers in a full AI-coding stack: cloud LLMs for heavy reasoning, local LLMs for offline and private work, and on-device LLMs for the mobile apps you ship to users. The on-device side for React Native is its own problem, covered in the Phi-3 Mini integration walkthrough. All three ship pre-wired in the AI Mobile Launcher AI Pro tier, so you are not assembling this from scratch.

I packaged the rest of this into the Local LLM with Claude Code bundle: the paste-ready zshrc aliases plus a claude-status helper, the Ollama config tuned for Apple Silicon, the model-picker matrix, and a pre-flight checklist so the setup is never a surprise at altitude. Reply to the Code Meet AI newsletter and I will send it.

FAQ

Can I run Claude itself locally?

No. Claude is closed-weight, so there is no local-runnable Claude. This setup uses Claude Code, the CLI, with an open-weights model like Gemma 4 or Devstral serving the inference. The CLI is the interface, the model is whatever endpoint you point it at.

What is the best local LLM for coding with Claude Code?

For the agentic tool loop Claude Code runs, pick a model RL-trained for tool use: Devstral Small, Qwen3-Coder, or Gemma 4. Avoid older completion-tuned models like qwen2.5-coder. They handle single edits fine but stall on multi-step work.

Does Claude Code airplane mode actually work with no signal?

Yes. With Claude Code pointed at local Ollama, no request leaves your laptop. I ran a full session at 35,000 feet with wifi off. The only requirement is pulling the model in advance.

Why Ollama and not LM Studio or llama.cpp?

Ollama wraps llama.cpp with a clean HTTP API on a known port. LM Studio works too but is GUI-first. Direct llama.cpp gives more control and more setup pain. Ollama is the path of least resistance for getting this running in under 30 minutes.

Will I get the same code quality as cloud Claude?

No. A good local model is excellent for syntax-level work: refactors, cleanup, rewriting a hook. For plan-heavy or reasoning-heavy tasks the gap is large. Use cloud for design, local for execution, or use local to draft and cloud to polish.


Malik Chohra — 9 yrs software, 7 in React Native. Building Wire RN, AI Mobile Launcher, and Code Meet AI.