Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention
Jing Huang · 2026-06-02 · via Goodfire Research
We take for granted that larger models are better than smaller ones, but why is this so? We trace this to a d…