

















Abstract:For first-order optimization of non-convex functions with Lipschitz continuous gradient and Hessian, the best known complexity for reaching an $\varepsilon$-approximation of a stationary point is $\mathcal{O}(\varepsilon^{-7/4})$. Existing algorithms achieving this bound are based on momentum, but are always complemented with safeguard mechanisms that erase the accumulated momentum if a certain condition is violated. Whether such resetting-momentum mechanisms are fundamentally necessary has remained an open question. We show that randomizing the parameters enable to achieve this complexity with high probability when using momentum, without any of such mechanisms. From an analysis perspective, we do so leveraging the continuized method, that interprets the algorithm as a realization of a continuous-time stochastic differential equation involving a Poisson process.
| Subjects: | Optimization and Control (math.OC) |
| Cite as: | arXiv:2602.05504 [math.OC] |
| (or arXiv:2602.05504v3 [math.OC] for this version) | |
| https://doi.org/10.48550/arXiv.2602.05504 arXiv-issued DOI via DataCite |
From: Julien Hermant [view email]
[v1]
Thu, 5 Feb 2026 10:05:47 UTC (269 KB)
[v2]
Sat, 23 May 2026 14:15:59 UTC (63 KB)
[v3]
Tue, 26 May 2026 06:18:19 UTC (63 KB)
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。