Stochastic Gradient Methods: Bias, Stability and Generalization
Shuang Zeng, Yunwen Lei · 2026-01-01 · via JMLR

Shuang Zeng, Yunwen Lei; 27(6):1−55, 2026.

Abstract

Recent developments in stochastic optimization often suggest biased gradient estimators to improve robustness, communication efficiency, or computational speed. Representative biased stochastic gradient methods (BSGMs) include Zeroth-order stochastic gradient descent (SGD), Clipped-SGD, and SGD with delayed gradients. The practical success of BSGMs has motivated many convergence analyses that explain their impressive training behaviour. By comparison, there is far less work on their generalization analysis, which is a central topic in modern machine learning. In this paper, we present the first framework to study the stability and generalization of BSGMs for convex and smooth problems. We introduce a generalized Lipschitz-type condition on gradient estimators and bias, under which we develop a rather general stability bound that shows how the bias and the gradient estimators affect stability. We apply our general result to develop the first stability bound for Zeroth-order SGD with reasonable step size sequences, and the first stability bound for Clipped-SGD. Although our stability analysis is developed for general BSGMs, the resulting stability bounds for both Zeroth-order SGD and Clipped-SGD match those of SGD under appropriate smoothing/clipping parameters. We combine the stability and convergence analyses and derive excess risk bounds of order $O(1/\sqrt{n})$ for both Zeroth-order SGD and Clipped-SGD, where $n$ is the sample size.
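As a rough illustration of two of the biased estimators named in the abstract, the sketch below implements norm clipping and a two-point Gaussian-smoothing zeroth-order estimate, then plugs them into a plain SGD loop on a squared loss. It is not the authors' code: the function names (`clip_gradient`, `zeroth_order_gradient`, `run_sgd`), the squared-loss example, the Gaussian smoothing direction, and all parameter values are assumptions chosen for illustration, and the paper's exact estimator definitions may differ.

```python
import math
import random


def clip_gradient(grad, threshold):
    """Rescale grad so its Euclidean norm is at most threshold.

    The result is a biased estimate of the gradient whenever clipping is active.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold or norm == 0.0:
        return list(grad)
    scale = threshold / norm
    return [g * scale for g in grad]


def zeroth_order_gradient(loss, w, mu, rng):
    """Two-point estimate built from function values only (assumed form):
    (loss(w + mu * u) - loss(w)) / mu * u, with u a standard Gaussian vector.
    """
    u = [rng.gauss(0.0, 1.0) for _ in w]
    shifted = [wi + mu * ui for wi, ui in zip(w, u)]
    coeff = (loss(shifted) - loss(w)) / mu
    return [coeff * ui for ui in u]


def squared_loss(w, x, y):
    """Hypothetical per-example loss: 0.5 * (<w, x> - y)^2."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    return 0.5 * (pred - y) ** 2


def squared_loss_grad(w, x, y):
    """Exact gradient of squared_loss with respect to w."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    return [(pred - y) * xi for xi in x]


def run_sgd(data, dim, steps, step_size, estimator, seed=0):
    """Plain SGD loop; estimator(w, x, y, rng) returns a (possibly biased) gradient."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(steps):
        x, y = data[rng.randrange(len(data))]
        g = estimator(w, x, y, rng)
        w = [wi - step_size * gi for wi, gi in zip(w, g)]
    return w


def clipped_estimator(w, x, y, rng):
    """Clipped-SGD style estimator with an illustrative threshold of 1.0."""
    return clip_gradient(squared_loss_grad(w, x, y), threshold=1.0)


def zeroth_order_estimator(w, x, y, rng):
    """Zeroth-order style estimator with an illustrative smoothing parameter."""
    return zeroth_order_gradient(lambda v: squared_loss(v, x, y), w, mu=1e-3, rng=rng)
```

Either estimator can be passed to the loop, for example `run_sgd(data, dim=1, steps=500, step_size=0.1, estimator=clipped_estimator)`. The threshold and `mu` play the role of the clipping and smoothing parameters that the abstract says must be chosen appropriately for the stability bounds to match those of SGD.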
