Longhorn Volume Health: The Gap Between 'Healthy' and Actually Working
Guatu · 2026-05-25 · via DEV Community

I once spent four hours debugging a PostgreSQL pod that was stuck in a crash loop with Input/output error across every single log line. I opened the Longhorn UI, and there it was: a bright green "Healthy" badge next to the volume. The replicas were synchronized, the nodes were up, and the dashboard insisted everything was perfect.

The reality was a stale mount on the worker node that had survived a pod migration, leaving the filesystem in a read-only state that Longhorn's control plane didn't care about.

If you're running stateful workloads on bare metal, you've probably already realized that Longhorn is great until it isn't. It simplifies distributed storage, but it introduces a layer of abstraction that can lie to you. You need to know the difference between "Control Plane Healthy" and "Data Plane Functional."

The Illusion of Health

In Longhorn, "Healthy" usually just means the replicas are in sync and the volume is attached to a node. It does not mean the application can actually write to the disk. I've repeatedly hit cases where the volume is technically healthy, but the pod is screaming because of permission mismatches or stale mounts.

The most common culprit is the mount layer. When a pod moves from Node A to Node B, Kubernetes expects the volume to detach and re-attach. Sometimes, the detach fails or the mount stays active on the old node. Longhorn might show the volume as attached to the new node, but the OS on the worker is still holding onto a ghost mount.

If you see I/O error in your logs but the UI is green, stop looking at the UI. You need to check the actual mount point on the worker node.
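One way to check the mount point is to scan the kernel's mount table for Longhorn devices that are mounted read-only. This is a minimal sketch: it assumes Longhorn block devices appear under /dev/longhorn/ and that a filesystem remounted read-only shows the ro option, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# List Longhorn-backed mounts that are currently read-only.
# Input: a mount table in /proc/mounts format
#        (device mountpoint fstype options dump pass).
# Assumption: Longhorn volumes show up as /dev/longhorn/<volume-name>.
find_readonly_longhorn_mounts() {
  local mounts_file="${1:-/proc/mounts}"
  awk '$1 ~ /^\/dev\/longhorn\// {
         # Split the option list and look for an exact "ro" entry,
         # so options such as errors=remount-ro do not match.
         n = split($4, opts, ",")
         for (i = 1; i <= n; i++)
           if (opts[i] == "ro") { print $1, $2; break }
       }' "$mounts_file"
}

# Usage on the worker node (prints "device mountpoint" per hit):
#   find_readonly_longhorn_mounts
```

An empty result while the pod still logs I/O errors suggests the problem is elsewhere, for example a mount left behind on a different node.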

Solving the Stale Mount Trap

When a volume gets stuck, the "happy path" is to let Kubernetes handle the detachment. In reality, you often have to force the issue.

The first thing I try is scaling the deployment to zero. This forces Kubernetes to send the detach signal to the CSI driver.

# Scale to 0 to break the lock and force volume detach
# (excerpt: only the field that changes is shown)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres-db
spec:
  replicas: 0
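The same step can be done imperatively and followed by a check that the attachment is really gone. This is a rough sketch rather than a fixed procedure: the function name and the KUBECTL override are hypothetical, and the default namespace is an assumption.

```shell
#!/usr/bin/env bash
# Scale a Deployment to zero, then list the remaining VolumeAttachments.
# Set KUBECTL=echo for a dry run that only prints the commands.
scale_down_and_check() {
  local deployment="$1" namespace="${2:-default}"
  local kubectl="${KUBECTL:-kubectl}"
  # Removing the pods is what triggers the unpublish/detach calls to the CSI driver.
  "$kubectl" scale deployment "$deployment" -n "$namespace" --replicas=0
  # VolumeAttachment objects are cluster-scoped. An entry that lingers for the
  # PV after the pods are gone means the detach has not completed.
  "$kubectl" get volumeattachments
}

# Usage:
#   scale_down_and_check postgres-db my-namespace
```

Re-running the second command until the entry for the volume disappears gives a clearer signal than the badge in the UI.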


If that doesn't work, you have to go into the worker node via SSH. I've found that manually unmounting the path usually clears the deadlock. Be careful here: if you unmount a volume that is actually being written to, you're asking for filesystem corruption.

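# Check for stale mounts on the worker node
mount | grep longhorn

# If you find a mount that shouldn't be there, lazily unmount it.
# Use the exact path printed by the command above; the path below is a placeholder.
umount -l /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-xxxx-xxxx/mounted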


The -l (lazy) unmount is the secret here. It detaches the filesystem from the hierarchy immediately, even if the resource is busy, and cleans up the references once the resource is no longer in use.

The Capacity Lie: Snapshot Bloat

Capacity management in Longhorn is where most people run into their first "production" outage. You set up a 100GB PVC, and a few months later, your node disks are at 95% capacity even though your application is only using 20GB of data.

This is snapshot bloat. Longhorn snapshots are incremental, but if you have a high-churn database (like Postgres or MariaDB) and a reckless snapshot schedule, those increments add up.
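To see where the space actually went, it can help to measure the replica directories on the node itself. The sketch below assumes Longhorn's default data path, /var/lib/longhorn, with one directory per replica under replicas/; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print the on-disk size (in KiB) of every replica directory, largest first.
# du counts allocated blocks, so sparse image files are reported at their real
# footprint, including snapshot images, not at the PVC's nominal size.
replica_usage() {
  local data_path="${1:-/var/lib/longhorn}"
  du -sk "$data_path"/replicas/*/ 2>/dev/null | sort -rn
}

# Usage on a storage node (usually needs root):
#   replica_usage | head
```

A replica directory far larger than the data the application reports is the signature of accumulated snapshots.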

I learned this the hard way when I set up an hourly snapshot policy without a strict retention limit. The snapshots were accumulating on detached volumes that I had forgotten to delete. Longhorn doesn't automatically purge snapshots for volumes that aren't currently attached to a pod unless you explicitly tell it to.

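To fix this, I capped snapshot retention and made sure recurring jobs skip detached volumes. In Longhorn, scheduled snapshots are defined as a RecurringJob, and whether they run against detached volumes is controlled by the global allow-recurring-job-while-volume-detached setting, which should stay false. This prevents the system from wasting IO and space on volumes that aren't even active.

apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: daily-backup-critical
  namespace: longhorn-system
spec:
  cron: "0 2 * * *" # 2 AM daily
  task: "snapshot"
  groups:
    - default       # Volume group this job applies to
  retain: 7         # Keep only the 7 most recent snapshots
  concurrency: 2    # How many volumes to process at once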


If you're already in a capacity crisis, don't just delete PVCs. Check for orphaned volumes and replicas. Sometimes a PVC is deleted from K8s (for example, when the reclaim policy is Retain), but the Longhorn volume remains in the UI as "detached." These are ghosts eating your disk space. Purge them manually from the UI or via the API.
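Finding those ghosts can be reduced to a set difference between the volumes Longhorn knows about and the PersistentVolumes Kubernetes knows about. This is a sketch under one assumption: Longhorn volume names match PV names, which is typical for dynamically provisioned volumes but not guaranteed for volumes created by hand. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print Longhorn volumes that have no PersistentVolume of the same name.
# Arguments: two files with one name per line.
orphaned_volumes() {
  local longhorn_volumes="$1" k8s_pvs="$2"
  # -F fixed strings, -x whole-line match, -v invert:
  # keep only the names that are absent from the PV list.
  grep -vxF -f "$k8s_pvs" "$longhorn_volumes"
}

# Usage (needs cluster access):
#   kubectl -n longhorn-system get volumes.longhorn.io -o name | sed 's|.*/||' > lh.txt
#   kubectl get pv -o name | sed 's|.*/||' > pv.txt
#   orphaned_volumes lh.txt pv.txt
```

Treat the output as a list of candidates to review, not as a delete list: a volume restored from backup but not yet bound to a PV would show up here too.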

Permissions and the SecurityContext Gap

Another "health" issue that doesn't show up in monitoring is the Permission denied error. Longhorn mounts volumes as root by default. If you're running a container as a non-root user (which you should be), the application will fail to write to the volume immediately upon startup.

I ran into this with an n8n deployment. The pod was "Running," the volume was "Healthy," but the logs were a wall of permission errors.

The fix isn't to chmod 777 the volume (don't do that). The fix is using the fsGroup in the securityContext. This tells Kubernetes to change the group ownership of the volume's contents to a specific GID when it's mounted, and to add that GID to the container's supplementary groups.

spec:
  securityContext:
    fsGroup: 1000 # GID that should own the volume; match the application user's group
  containers:
    - name: n8n-app
      image: n8nio/n8n:latest
      # ... rest of config
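To confirm that the setting took effect, the mounted directory can be inspected for the expected group and the group-write bit. This is a minimal sketch with a hypothetical helper name; it assumes a stat that supports -c (GNU coreutils or BusyBox) wherever it runs.

```shell
#!/usr/bin/env bash
# Verify that a directory belongs to the expected GID and is group-writable.
# Returns 0 when both hold, 1 on a mismatch, 2 if stat fails.
check_volume_group() {
  local dir="$1" expected_gid="$2" gid mode
  gid=$(stat -c '%g' "$dir") || return 2
  mode=$(stat -c '%a' "$dir") || return 2
  if [ "$gid" != "$expected_gid" ]; then
    echo "group mismatch on $dir: have $gid, want $expected_gid"
    return 1
  fi
  # 020 is the group-write bit; the leading 0 makes the shell read $mode as octal.
  if [ $(( 0$mode & 020 )) -eq 0 ]; then
    echo "$dir is not group-writable (mode $mode)"
    return 1
  fi
  echo "ok: $dir gid=$gid mode=$mode"
}

# Usage, from a shell inside the container:
#   check_volume_group <mount-path> 1000
```

If the image has no shell, running the two stat calls by hand through kubectl exec gives the same information.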


For databases, I also recommend being explicit about the data directory. Some images default to a path that might conflict with how the volume is mounted. I always override the data path to a sub-directory to avoid issues with the lost+found folder that ext4 creates at the root of the volume, which makes the directory look non-empty to tools that expect a clean data directory.

env:
  - name: PGDATA
    value: "/var/lib/postgresql/data/pgdata"


Monitoring That Actually Matters

If you want to stop guessing, you need to move beyond the Longhorn UI. I use Prometheus and Grafana to track the actual replication state.

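The metric I watch most closely is longhorn_volume_robustness, which Longhorn's metrics endpoint reports as 0 for unknown, 1 for healthy, 2 for degraded, and 3 for faulted. If a volume moves from healthy to degraded or faulted, I want an alert before the application notices.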

One specific thing to watch for is the "Replica Count" vs "Healthy Replica Count." If you have 3 replicas but only 2 are healthy, you've already lost part of your redundancy, and once you're down to a single healthy replica you're one disk failure away from a total outage. This is a silent killer because the volume keeps serving I/O as long as one replica is available, so the application never complains while the safety margin erodes.
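For an ad-hoc check outside Grafana, the Prometheus HTTP API can be asked directly for volumes that are not fully healthy. This is a sketch with several assumptions: Longhorn's metrics are already being scraped, the volume-level metric is longhorn_volume_robustness (1 healthy, 2 degraded, 3 faulted), and PROM_URL points at your Prometheus server. The function name and the CURL override are hypothetical.

```shell
#!/usr/bin/env bash
# Query Prometheus for Longhorn volumes whose robustness is degraded or faulted.
# Set CURL=echo to print the request instead of sending it.
query_degraded_volumes() {
  local prom_url="${PROM_URL:-http://localhost:9090}"
  local curl_bin="${CURL:-curl}"
  # -G sends the URL-encoded data as query-string parameters on a GET request.
  "$curl_bin" -sG "$prom_url/api/v1/query" \
    --data-urlencode 'query=longhorn_volume_robustness > 1'
}

# Usage:
#   PROM_URL=http://prometheus.example:9090 query_degraded_volumes
```

The same expression, longhorn_volume_robustness > 1, is a reasonable starting point for an alert rule, with a "for" duration long enough to ride out routine replica rebuilds.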

I've integrated these alerts into my general infrastructure monitoring. If you're managing this at scale, I highly recommend looking into predictive maintenance consulting to set up these thresholds before you hit a "disk full" panic at 3 AM.

Gotchas and Tradeoffs

I've considered using Rook-Ceph for larger workloads, and while it's more powerful, it's a nightmare to manage in a small cluster. Longhorn is the right choice for most homelabs and small production setups, but you have to accept the tradeoffs:

  1. CPU Overhead: Longhorn runs an engine process and a set of replica processes for every volume, hosted in the instance-manager pods on each node. If you have 100 small volumes, your CPU usage will spike just from the management overhead.
  2. Disk Pressure: Longhorn volumes are thin-provisioned, but the accounting isn't as transparent as on some enterprise arrays, because snapshots and replicas consume node disk that the PVC size never shows. You need to monitor the actual node disk usage, not just the PVC usage.
  3. PDB Conflicts: If you have strict Pod Disruption Budgets (PDBs), you might find that kubectl drain hangs forever because Longhorn is struggling to move a volume. I've written about this in "Pod Disruption Budgets: Why kubectl drain Gets Stuck on Longhorn."

Lessons Learned

The biggest takeaway from managing Longhorn on bare metal is that the storage layer is not a "set and forget" component.

If you're building out your storage, start with a solid foundation. I've detailed the initial setup in "Kubernetes Storage on Bare Metal: Longhorn in Practice," but the operational side is where the real work is.

Always assume the UI is lying to you. When a pod fails, check the logs for I/O errors first, then check the worker node mounts, and only then trust the green checkmark in the Longhorn dashboard. Use fsGroup for every stateful app, set strict retention on your snapshots, and for the love of your sanity, exclude detached volumes from your backup schedules.