
























Someone on YouTube:
"Don't buy a GPU for AI. Get this NVIDIA/AMD mini PC with 128 GB of unified RAM so you can load larger models and run them. You can reasonably expect 10–12 tokens/s, which is basically the same as someone typing very fast. It’s only ~$7k USD."
Meanwhile, I’m sitting here running llama.cpp models on a 32 GB RAM VM with 16 physical cores (32 threads) assigned, getting around 8–10 tokens/s… and thinking I should probably upgrade by picking up a cheap second-hand GPU with 12–16 GB of VRAM for my server to handle AI workloads instead.
What do you guys think? Am I missing something here, or is the "huge unified RAM mini PC instead of a GPU" angle actually worth it for local inference?
Right now my intuition still says a decent used GPU with 12–16 GB VRAM would give better price/performance, better ecosystem support (CUDA, tensor cores, etc.), and more predictable scaling than going all-in on a pricey unified memory system. Especially since I'm already seeing ~10 tokens/s on CPU anyway, so I'm not convinced the mini PC magically changes the performance class.
At the same time, I keep seeing people argue the opposite, mainly that once a model no longer fits entirely in VRAM, GPU setups hit a hard wall and slow down sharply, while large unified memory systems degrade more gracefully.
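To put rough numbers on the "fits in VRAM or not" question, here is a back-of-the-envelope sketch. The 4.5 bits per weight (a stand-in for a typical 4-bit quant) and the 2 GB allowance for KV cache and runtime overhead are my own assumptions, not measurements, and the function names are made up for illustration.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """True if weights plus an assumed overhead fit in the given VRAM.

    overhead_gb is a guess covering KV cache and runtime buffers; the real
    figure depends on context length and the inference backend.
    """
    return weight_size_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    BITS = 4.5  # assumed average bits per weight for a 4-bit quant
    for size in (8, 13, 34):
        gb = weight_size_gb(size, BITS)
        print(f"{size}B: ~{gb:.1f} GB weights, "
              f"fits 12 GB: {fits_in_vram(size, BITS, 12)}, "
              f"fits 16 GB: {fits_in_vram(size, BITS, 16)}")
```

Under those assumptions, 8B and 13B would fit on a 12 GB card while 34B would not fit on either a 12 GB or a 16 GB card, which is where the partial-offload slowdown would start to matter.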
Also, is running larger models actually worth it in practice? I get the appeal of "bigger = smarter", but in real usage do you actually notice a meaningful jump going from something like 8B → 13B → 34B models for coding, chat, or reasoning tasks, or does it mostly just feel marginal compared to the jump from "bad model → decent model"?
Curious to hear from people who’ve actually tried both setups. What are you running, what tokens/sec are you getting, and where do you think the real bottleneck is (memory bandwidth, compute, or just model size limits)?
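On the bottleneck question, a common rule of thumb is that single-user token generation on a dense model is limited by memory bandwidth, because each new token has to read roughly all of the weights once. That gives a crude ceiling of bandwidth divided by model size. A minimal sketch, where the bandwidth figures are illustrative placeholders and not specs for any particular product:

```python
def max_tokens_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Rough upper bound on decode speed for a dense model.

    Assumes every generated token streams all weights from memory once,
    and ignores compute, KV cache reads, and any caching effects.
    """
    return bandwidth_gb_s / model_gb


if __name__ == "__main__":
    model_gb = 4.5  # e.g. an ~8B model at an assumed 4.5 bits per weight
    # Placeholder bandwidth classes, chosen only to show the scaling.
    for label, bw in (("low", 50.0), ("medium", 250.0), ("high", 400.0)):
        print(f"{label:>6} ({bw:.0f} GB/s): "
              f"<= {max_tokens_per_s(bw, model_gb):.0f} tokens/s")
```

Real numbers will land below this ceiling, but it shows why the same model gets slower in direct proportion to its size on a given machine, and why extra capacity alone does not buy speed.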
Disclaimer:
This post was messily written by me and was dressed up by AI
I speak fluent sarcasm and broken logic. | I would agree with you, but then we’d both be wrong.