





















Abstract:We study nonstationary generalized linear bandits (GLBs), where the expected reward is modeled through a nonlinear link function with an unknown time-varying parameter. This framework encompasses a broad class of reward models, including linear, Bernoulli, and binomial rewards. Existing approaches are predominantly based on maximum-likelihood estimation (MLE), using sliding-window, restart, or discounting mechanisms to handle nonstationarity. Although these methods achieve statistically efficient regret guarantees, they generally require revisiting past observations at every round, which leads to computation and memory costs that grow with time; moreover, several of them rely on a non-convex projection step. In this paper, we propose DOMD-GLB, a new algorithm for nonstationary GLBs that utilizes discounted online mirror descent (DOMD) for parameter estimation, thereby incurring only $O(1)$ computation and memory costs per round. We prove dynamic regret bounds of order $\tilde{O} \big(c_\mu^{-1/2} d^{3/4} P_T^{1/4} T^{3/4}\big)$ in drifting environments and $\tilde{O}\big(c_\mu^{-1/3} d^{2/3} \Gamma_T^{1/3} T^{2/3}\big) $in piecewise-stationary environments, where $d$ denotes the feature dimension, $T$ the time horizon, $P_T$ the path length, $\Gamma_T$ the number of change points, and $c_\mu$ a curvature parameter associated with the link function, while substantially improving computational efficiency over prior work. To the best of our knowledge, this is the first algorithm for nonstationary GLBs with per-round computation and memory costs independent of time.
| Subjects: | Machine Learning (stat.ML); Machine Learning (cs.LG) |
| Cite as: | arXiv:2605.25590 [stat.ML] |
| (or arXiv:2605.25590v1 [stat.ML] for this version) | |
| https://doi.org/10.48550/arXiv.2605.25590 arXiv-issued DOI via DataCite (pending registration) |
From: Joongkyu Lee [view email]
[v1]
Mon, 25 May 2026 08:40:32 UTC (3,621 KB)
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。