Computer Science > Machine Learning
arXiv:2606.15268 (cs)
[Submitted on 13 Jun 2026]
Abstract: Schatten-$\infty$ based optimizers such as Muon have shown promising empirical performance, but there remain seemingly conflicting observations regarding whether they are beneficial. We resolve this conflict by showing that the conclusion is regime dependent. Even when the objective is smooth in the Schatten-$\infty$ geometry, Schatten-$p$ geometries with smaller $p$ can be optimal, specifically in the low-dimensional regime, which we show includes Chinchilla scaling. This conclusion follows from a new noise-robust acceleration result for the SODA framework for $p>2$. The same analysis explains why Muon-like methods do not require warmup and why they naturally favor large batches, and it yields a batch size scaling rule for arbitrary $p$.
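For readers unfamiliar with the geometries named in the abstract, the following worked identity may help. It is a sketch of the standard steepest-descent direction over a Schatten-$p$ unit ball, obtained from norm duality, and is not taken from the paper or its SODA framework. The symbols $G$ (a gradient matrix), $U$, $V$, $\sigma_i$ (its singular value decomposition), $D$ (the update direction), and $q$ (the dual exponent) are notation assumed here for illustration.

```latex
% Assumed notation: G = U \,\mathrm{diag}(\sigma)\, V^\top is the SVD of the gradient,
% \|X\|_{S_p} = \big(\sum_i \sigma_i(X)^p\big)^{1/p}, and 1/p + 1/q = 1.
\[
  D^\star
  \;=\; \operatorname*{arg\,min}_{\|D\|_{S_p} \le 1} \langle G, D \rangle
  \;=\; -\,U \,\mathrm{diag}\!\left( \frac{\sigma_i^{\,q-1}}{\|\sigma\|_q^{\,q-1}} \right) V^\top ,
  \qquad
  \langle G, D^\star \rangle = -\|G\|_{S_q}.
\]
% Check of feasibility: (q-1)p = q, so
% \|D^\star\|_{S_p} = \|\sigma\|_q^{q/p} / \|\sigma\|_q^{q-1} = 1.
%
% Limiting cases:
%   p = \infty \ (q = 1): \quad D^\star = -\,U V^\top   (orthogonalized gradient)
%   p = 2      \ (q = 2): \quad D^\star = -\,G / \|G\|_F (normalized gradient)
```

In the $p=\infty$ case the direction $-UV^\top$ is the orthogonalized gradient that Muon-style updates approximate, while $p=2$ recovers normalized gradient descent. Intermediate values of $p$ interpolate between the two by reweighting singular values with the exponent $q-1$.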
Submission history
From: Thomas Pethick
[v1] Sat, 13 Jun 2026 12:02:18 UTC (755 KB)