






















I wanted to know how fast a 26B mixture-of-experts model could run on a desktop CPU with no GPU. I got ~40 tok/s single-stream (lossless) and ~124 tok/s batched. The surprising part was the byte budget: for this model, the thing to compress is the output head (32% of per-token bytes), not the experts (16%). The writeup has the bandwidth roofline and the dead ends; the repo has the reproducible recipe. Happy to answer questions.
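
For anyone who wants the byte-budget argument in numbers, here is a rough sketch of the arithmetic, not the code from the repo. It assumes decoding is purely memory-bandwidth bound, so tok/s is at most bandwidth divided by bytes read per token. The 32% and 16% shares are the ones quoted above; the 2x compression ratio and the function names are made up for illustration.

```python
def roofline_tok_per_s(bandwidth_gb_s: float, gb_per_token: float) -> float:
    """Upper bound on decode rate when each token must stream its weights from RAM."""
    return bandwidth_gb_s / gb_per_token


def speedup_from_compressing(share: float, ratio: float) -> float:
    """Bandwidth-bound speedup when a component holding `share` of the
    per-token bytes is shrunk by a factor of `ratio`."""
    remaining = 1.0 - share * (1.0 - 1.0 / ratio)
    return 1.0 / remaining


HEAD_SHARE = 0.32    # output head share of per-token bytes (from the post)
EXPERT_SHARE = 0.16  # expert share of per-token bytes (from the post)
RATIO = 2.0          # hypothetical compression ratio, for illustration only

head_gain = speedup_from_compressing(HEAD_SHARE, RATIO)
expert_gain = speedup_from_compressing(EXPERT_SHARE, RATIO)
print(f"compress head:    {head_gain:.3f}x")
print(f"compress experts: {expert_gain:.3f}x")
```

At the same ratio, shrinking the head buys roughly twice the relative gain of shrinking the experts, simply because it is twice the share of bytes. This is only a ceiling: it ignores compute, cache effects, and KV-cache traffic.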