





















Abstract: Existing decentralized stochastic bilevel optimization methods assume that the lower-level loss function is strongly convex and that the stochastic gradient noise has finite variance. These strong assumptions are often not satisfied by real-world machine learning models. For example, training on language data typically produces heavy-tailed gradient noise. To address these limitations, we develop a novel decentralized stochastic bilevel optimization algorithm for the nonconvex bilevel optimization problem under heavy-tailed noise. Specifically, we develop a normalized stochastic variance-reduced bilevel gradient descent algorithm, which does not rely on any clipping operation. Moreover, we establish its convergence rate by bounding the interdependent gradient sequences that arise under heavy-tailed noise in nonconvex decentralized bilevel optimization problems. To the best of our knowledge, this is the first decentralized bilevel optimization algorithm with rigorous theoretical guarantees under heavy-tailed noise. Extensive experimental results confirm the effectiveness of our algorithm in handling heavy-tailed noise.
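The idea of a normalized, clipping-free update can be illustrated with a minimal single-node sketch. This is not the paper's algorithm: the function names are hypothetical, a STORM-style recursive estimator is swapped in as one common variance-reduction choice, and the bilevel hypergradient and the decentralized communication step are omitted entirely.

```python
import math


def normalized_step(x, d, lr):
    """Move x a distance of exactly lr along -d.

    The step length does not depend on the magnitude of d, so a single
    heavy-tailed outlier cannot produce an arbitrarily large move, and
    no clipping threshold has to be tuned.
    """
    norm = math.sqrt(sum(v * v for v in d))
    if norm == 0.0:
        return list(x)  # no direction available, stay in place
    return [xi - lr * di / norm for xi, di in zip(x, d)]


def storm_update(d_prev, g_new, g_old, beta):
    """STORM-style recursive estimator (an assumed stand-in, not the paper's).

    g_new and g_old are stochastic gradients evaluated on the same sample
    at the current and previous iterate, respectively.
    """
    return [gn + (1.0 - beta) * (dp - go)
            for gn, dp, go in zip(g_new, d_prev, g_old)]


if __name__ == "__main__":
    # An outlier gradient of norm 5e6 still moves x by only lr = 0.1.
    x_next = normalized_step([0.0, 0.0], [3e6, 4e6], lr=0.1)
    print(x_next)
```

Normalization bounds every step by the learning rate, which is why it is a natural alternative to clipping when the noise variance may be infinite.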
| Subjects: | Machine Learning (cs.LG) |
| Cite as: | arXiv:2509.15543 [cs.LG] |
| | (or arXiv:2509.15543v2 [cs.LG] for this version) |
| | https://doi.org/10.48550/arXiv.2509.15543 (arXiv-issued DOI via DataCite) |
From: Hongchang Gao
[v1] Fri, 19 Sep 2025 02:51:19 UTC (407 KB)
[v2] Mon, 25 May 2026 16:32:52 UTC (2,551 KB)