
























Abstract: We analytically solve the Mountain Car problem, a canonical benchmark in RL, and derive an optimal control solution, closing a gap that has been open for 36 years. This reveals two surprising insights: the optimal control is quite simple, yet modern RL agents show a large gap to optimality. Motivated by the analysis of the optimal control, we introduce Chebyshev policies, derived from first principles, as a universal (i.e., dense) class of RL policies. They can be trained as drop-in replacements for neural nets, reducing the regret by a factor of 4.18 while requiring 277 times fewer parameters, which fosters sample efficiency, explainability and real-time capability. We evaluate Chebyshev policies on further RL tasks, including a real-world nonlinear motion control testbed. They consistently improve performance over neural nets with PPO, ARS and REINFORCE. Our results demonstrate that Chebyshev policies offer a compelling and lightweight alternative or addition to neural nets for low-dimensional control tasks.
| Comments: | ICML 2026 Spotlight Paper |
| Subjects: | Machine Learning (cs.LG) |
| Cite as: | arXiv:2605.22305 [cs.LG] |
| (or arXiv:2605.22305v1 [cs.LG] for this version) | |
| https://doi.org/10.48550/arXiv.2605.22305 arXiv-issued DOI via DataCite (pending registration) |
From: Hannes Unger
[v1]
Thu, 21 May 2026 10:54:26 UTC (779 KB)