惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

L
LINUX DO - 热门话题
T
The Blog of Author Tim Ferriss
WordPress大学
WordPress大学
酷 壳 – CoolShell
酷 壳 – CoolShell
美团技术团队
博客园 - 叶小钗
李成银的技术随笔
V
Visual Studio Blog
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
Apple Machine Learning Research
Apple Machine Learning Research
Hugging Face - Blog
Hugging Face - Blog
V
V2EX
博客园 - 司徒正美
Blog — PlanetScale
Blog — PlanetScale
大猫的无限游戏
大猫的无限游戏
T
Tailwind CSS Blog
Cyber Security Advisories - MS-ISAC
Cyber Security Advisories - MS-ISAC
aimingoo的专栏
aimingoo的专栏
人人都是产品经理
人人都是产品经理
GbyAI
GbyAI
A
About on SuperTechFans
罗磊的独立博客
W
WeLiveSecurity
L
LINUX DO - 最新话题
M
MIT News - Artificial intelligence
Hacker News: Ask HN
Hacker News: Ask HN
Application and Cybersecurity Blog
Application and Cybersecurity Blog
cs.AI updates on arXiv.org
cs.AI updates on arXiv.org
P
Proofpoint News Feed
Microsoft Security Blog
Microsoft Security Blog
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
H
Help Net Security
Martin Fowler
Martin Fowler
cs.CL updates on arXiv.org
cs.CL updates on arXiv.org
www.infosecurity-magazine.com
www.infosecurity-magazine.com
The Register - Security
The Register - Security
M
Microsoft Research Blog - Microsoft Research
Hacker News - Newest:
Hacker News - Newest: "LLM"
博客园 - Franky
The Cloudflare Blog
C
Cisco Blogs
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
Google Online Security Blog
Google Online Security Blog
有赞技术团队
有赞技术团队
AWS News Blog
AWS News Blog
C
Cybersecurity and Infrastructure Security Agency CISA
小众软件
小众软件
I
Intezer
N
Netflix TechBlog - Medium
N
News and Events Feed by Topic

DEV Community

Track YC Demo Day Companies in Real Time (with code) Investigation Reports: When Monitors Get Smarter Semantic Layer Best Practices: 7 Mistakes to Avoid I Run MCP Servers. Here's What the Recent Vulnerabilities Actually Mean for Me Phive v1.1.1 — automatic port conflict handling for local VS Code environments Building a SQL-like Relational Database Engine in C++ From Scratch How a Self-Documenting Semantic Layer Reduces Data Team Toil The Adopter: Advocating for OSS You Use (But Don't Own) Optimizing Vite Build Output: A Practical Guide to Tree-Shaking I built a free audit tool that runs 12 checks in parallel against any domain. Here is the architecture. I made a free 7-video series to prep for the new GH-600 (GitHub Agentic AI Developer) cert Why One Model Is Never Enough: Routing Incident Analysis With cascadeflow Forecast Cone: A Grand Theorem for Computable Software Evolution Choosing the Right Treasure Map to Avoid Data Decay in Veltrix Migrating to Apache Iceberg: Strategies for Every Source System Stop Reviewing Every Line of AI Code - Build the Trust Stack Instead Implementation of AI in mobile applications: Comparative analysis of On-Device and On-Server approaches on Native Android and Flutter Should you use Gemma 4 for your Development? A Multiversal Analysis to Determine if Gemma 4 is Right for You! The Rising Trend of Creative Interview Questions in Tech I Spent Hours Fighting a Silent Subnet Conflict to Build an Isolated ICS Security Lab (And What It Taught Me About the Linux Kernel) It Worked When I Closed the Laptop. I Swear. We Built an Agent That Flags Fake Internships #kryx Your Personal AI Stack Is the New Dotfiles Your LLM Bill Is Exploding Because of Architecture, Not Pricing -- Here's the Fix How We Prevent Attendance Fraud Using GPS Verification AI Code Review in 2026: How the Tools Actually Differ (A Builder's Field Guide) From Problems to Patterns: Generative AI in .Net (C#) GemmaOps Edge: From 373 Alarms to 1 Root Cause Using Local AI (Gemma 4) Building an Amazon EKS Security Baseline Hands-On with Apache Iceberg Using Dremio Cloud 🤫 Firebase Is Quietly Preparing for an Offline-First AI Future Should Angular Apps Still Rely on RxJS in 2025? Gaslighting Gemma 4: Can Open-Weight Reasoning Models Withstand a Confident Liar? AI Workflow Automation Needs More Than Another Script Reviving Cineverse: From Local Storage to Firebase 🚀 Approaches to Streaming Data into Apache Iceberg Tables How to Add Rounded Corners to an Image Online The subtle impact of AI (&amp; IT) on jobs Made a Rust based AI agent Your AI is not bad, your instructions are What Clicked for Me After Building on Solana for a Few Days WhatsApp's Encryption Stack: What It Covers, What It Doesn't, and What a Federal Agent Spent 10 Months Investigating Building CogniPlan: A Local-First Task Planning System Using Apache Iceberg with Python and MPP Query Engines How I Built AegisDesk: A Zero-Token Semantic IT Agent with <5ms Latency I built CodeArchy: an open-source that turns any codebase into a visual, explainable architectural experience, powered by Gemma 4. The Day Our Bot Ran Out of Money How we're using Gemini Embeddings to build a smarter, community-driven feed on DEV The Speculative Decoding Pattern The PKCE "Gotcha" in Expo’s exchangeCodeAsync TharVA : Keeping India's Desert Heritage Alive with Offline AI (Gemma4) n8n for Healthcare: 5 Automations for Clinics, Practices, and Health Tech Teams (Free Workflow JSON) How I Built an OWASP Memory Guard for AI Agents (ASI06) Condition-Based vs Time-Based Maintenance: Making the Switch I Tested Spam Protection on Formspree vs Formgrid. The Results Were Surprising. May 27 - Video Understanding Workshop Beyond Keywords: How Google's 2026 Algorithms are Redefining SEO From Click to Cart: Ensuring an Accessible Customer Journey in WooCommerce Your company won't replace you with good AI. They'll replace you with bad AI. How to Use an SVG Icon Search Engine as a Claude Custom Connector O fim do “modelo que faz tudo”? Conheça o Conductor, a IA que orquestra outras IAs 10 First-Principles Strategies to Learn Any Programming Language Deeply 10 First-Principles Strategies to Learn Any Programming Language Deeply Understanding Embeddings easily. The Hidden Cost of “Move Fast and Break Things” Why Your Logs Are Useless Without Traces DressCode: Your AI Stylist for Tomorrow The Documented Shortcoming of Our Production Treasure Hunt Engine I'm 16, and I Built an AI Tool That Audits Your Technical Debt Without Ever Touching code Building Your Own Crypto Poker Bot: A Developer's Guide to Blockchain Gaming Logic Apache Iceberg Metadata Tables: Querying the Internals Hermes, The Self-Improving Agent You Can Actually Run Yourself Unity vs Unreal: 5 Things I Had to Relearn the Hard Way Building Agentic Commerce Infrastructure: Overcoming SQLite Concurrency for Autonomous Procurement Agents Solana Accounts vs Databases HTML Table Borders I built a skill that makes AI-generated AWS diagrams actually usable My first post! I'm kinda excited The Page Root Was the Wrong Unit How to audit what your IDE extension actually sends to the cloud I Migrated 23 Make.com Scenarios to n8n and Cut My Bill by 60% — Complete Migration Guide (2026) Solving a Logistics Problem Using Genetic Algorithms Claude Code Skills Explained: What They Are & When to Use Them (2026) Maintaining Apache Iceberg Tables: Compaction, Expiry, and Cleanup Zero-Idle Local LLMs: Running Llama 3 in AWS Lambda Containers We scanned 8 B2B SaaS companies across 5 categories. ChatGPT named the same 12 brands in every answer. How To "Market" Yourself As A Tech Pro We scanned 500 MCP servers on Smithery. Here is what we found. HTML Basics for Beginners – Markup Language, Elements and Types of CSS DiffWhisperer: How I Turned Cryptic Git Diffs into Architectural Stories with Gemma 4 I built a version manager for llama.cpp using nothing but vibe coding. Unit Testing vs System Testing: Key Differences, Use Cases, and Best Practices for 2026 A game design textbook explains why products with fewer features win How to Build a Raydium Launchpad Bonding Curve in 5 Minutes with forgekit How to turn an AI prototype into a production system How Data Lake Table Storage Degrades Over Time Partition and Sort Keys on DynamoDB: Modeling data for batch-and-stream convergence Auto-Generate Optimized GitHub Actions Workflows For Any Stack With This New CLI Tool Unchaining the African Creator Economy The Treasure Hunt Engine Gotcha - A Lesson in Constrained Performance
Running Flux Schnell (12B) + LLMs on a Legacy AMD RX 580 (8GB) via Native Vulkan — Full Architecture Guide [2026]
AIVisionsLab · 2026-05-23 · via DEV Community
Cover image for Running Flux Schnell (12B) + LLMs on a Legacy AMD RX 580 (8GB) via Native Vulkan — Full Architecture Guide [2026]

AIVisionsLab

Most people were told the RX 580 was dead for AI in 2026. CUDA-only ecosystems, ROCm dropping Polaris support at v5.x, DirectML abandoned before it matured. This is the full technical breakdown of how we proved that wrong.

Hardware Setup

  • GPU: AMD RX 580 2048SP — 8GB GDDR5 VRAM (Vulkan 1.x native)
  • CPU: Intel Xeon E5-2690 v3 — 12c/24t @ 3.5GHz boost
  • RAM: 32GB DDR4 REG ECC Quad Channel
  • Storage: NVMe 1TB — critical bottleneck fix
  • OS: Windows 10 Pro + WSL2 Ubuntu 22.04.5

Why everything else failed

Solution Status Reason
CUDA Nvidia-only
ROCm Dropped Polaris at v5.x
DirectML OpaqueTensorImpl crash on CLIPTextEncode
OpenVINO ldm/sgm modules missing on Forge

DirectML's fatal error:

NotImplementedError: Cannot access storage of OpaqueTensorImpl

Enter fullscreen mode Exit fullscreen mode

The driver wraps memory in opaque tensors that ComfyUI's attention backends can't read. It's a dead end.

The Solution — Dual Architecture

PATH 1 — GPU Vulkan (RX 580 acceleration)

Native build of stable-diffusion.cpp compiled with -DGGML_VULKAN=ON. The ggml engine maps directly to the GPU without ROCm or CUDA. SD 1.5 GGUF models render in ~72 seconds.

PATH 2 — CPU Xeon (SOTA heavy models)

FLUX.1 Schnell at 16GB exceeds physical VRAM. ComfyUI runs via CPU inside WSL2, using ECC RAM as stable virtual VRAM. Full 768x768 generation in ~24 minutes.

Hybrid Memory Segmentation for Flux (12B Q4_K)

Component File Allocation Size
Diffusion Model flux1-schnell-q4_k.gguf GPU VRAM ~6.5GB
VAE ae.safetensors CPU RAM ~160MB
CLIP L clip_l.safetensors GPU VRAM ~235MB
T5XXL t5xxl_fp16.safetensors CPU RAM ~9.3GB

Production command

sd-server.exe --listen-ip 0.0.0.0 --listen-port 7860 \
  --diffusion-model "E:\models\flux1-schnell-q4_k.gguf" \
  --vae "E:\models\ae.safetensors" \
  --clip_l "E:\models\clip_l.safetensors" \
  --t5xxl "E:\models\t5xxl_fp16.safetensors" \
  --cfg-scale 1.0 --steps 4 --clip-on-cpu --vae-on-cpu --vae-tiling

Enter fullscreen mode Exit fullscreen mode

--vae-on-cpu + --vae-tiling are non-negotiable. Without them: instant DeviceMemoryAllocation crash.

Real Benchmarks

Workload Backend Result
LLM text inference CPU only 3–5 tokens/s ❌
LLM text inference RX 580 Vulkan 15–16 tokens/s ✅
SD 1.5 20 steps DirectML ~450s + crash ❌
SD 1.5 20 steps Vulkan native ~72s ✅
Flux 1024x1024 Xeon CPU WSL2 ~24 min ✅

NVMe impact: Model load time dropped from 25 minutes (HDD) to 4 minutes (NVMe). For Flux 16GB: from 25 min to ~30 seconds. Storage is as critical as compute.

Service Map

OpenWebUI Docker :3000
  ├── llama-server.exe :8081  (Vulkan — RX 580)
  ├── sd-server.exe    :7860  (Vulkan — RX 580)
  └── ComfyUI          :8188  (CPU — Xeon WSL2)

Enter fullscreen mode Exit fullscreen mode

Resources

Full documentation, .bat orchestration scripts, compiled binaries and model configs:
👉 https://setup-ia-local-rx580-vulkan.firebaseapp.com/


Hardware doesn't die. It gets liberated by the right software. Are you running legacy AMD cards? Let's discuss your buffer allocation and command queue latency findings in the comments.