Optimization and Generalization of Gradient Descent for Shallow ReLU Networks with Minimal Width
Yunwen Lei et al. · 2026-01-01 · via JMLR

Yunwen Lei, Puyu Wang, Yiming Ying, Ding-Xuan Zhou; 27(34):1−35, 2026.

Abstract

Understanding the generalization and optimization of neural networks is a longstanding problem in modern learning theory. Prior analyses often lead to risk bounds of order $1/\sqrt{n}$ for ReLU networks, where $n$ is the sample size. In this paper, we present a general optimization and generalization analysis for gradient descent applied to shallow ReLU networks. We develop convergence rates of the order $1/T$ for gradient descent with $T$ iterations, and show that the gradient descent iterates fall inside local balls around either an initialization point or a reference point. Then we develop improved Rademacher complexity estimates by using the activation pattern of the ReLU function in these local balls. We apply our general result to NTK-separable data with a margin $\gamma$, and develop an almost optimal risk bound of the order $1/(n\gamma^2)$ for the ReLU network with a polylogarithmic width.
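
The abstract rests on two quantities: how far the gradient descent iterates move from their initialization, and how much the ReLU activation pattern changes inside that local ball. The following is a minimal sketch for measuring both on a toy problem. It is not the authors' setup: the fixed outer weights $a_k \in \{\pm 1\}$ with $1/\sqrt{m}$ scaling, the logistic loss, the Gaussian initialization, the 2-D toy data, and all function names (`make_data`, `gd_step`, `run`, and so on) are assumptions made for illustration.

```python
import math
import random


def make_data(n, seed=1, margin=0.2):
    """Toy unit-norm 2-D points labeled by the sign of the first coordinate (assumed data)."""
    rng = random.Random(seed)
    data = []
    while len(data) < n:
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x = (math.cos(theta), math.sin(theta))
        if abs(x[0]) >= margin:  # keep a gap around the decision boundary
            data.append((x, 1.0 if x[0] > 0 else -1.0))
    return data


def init_network(m, d, seed=0):
    """Gaussian hidden weights W (trained) and fixed outer signs a (assumed, not trained)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    a = [1.0 if k % 2 == 0 else -1.0 for k in range(m)]
    return W, a


def preactivations(W, x):
    return [sum(wj * xj for wj, xj in zip(w, x)) for w in W]


def forward(W, a, x):
    """f(x) = (1/sqrt(m)) * sum_k a_k * relu(<w_k, x>)."""
    pre = preactivations(W, x)
    return sum(ak * max(p, 0.0) for ak, p in zip(a, pre)) / math.sqrt(len(W))


def logistic_loss(z):
    """Numerically stable log(1 + exp(-z))."""
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def loss_derivative(z):
    """d/dz log(1 + exp(-z)) = -1 / (1 + exp(z)), computed stably."""
    if z >= 0:
        e = math.exp(-z)
        return -e / (1.0 + e)
    return -1.0 / (1.0 + math.exp(z))


def empirical_risk(W, a, data):
    return sum(logistic_loss(y * forward(W, a, x)) for x, y in data) / len(data)


def gd_step(W, a, data, lr):
    """One full-batch gradient descent step on the hidden weights only."""
    m, d, n = len(W), len(W[0]), len(data)
    grad = [[0.0] * d for _ in range(m)]
    for x, y in data:
        pre = preactivations(W, x)
        out = sum(ak * max(p, 0.0) for ak, p in zip(a, pre)) / math.sqrt(m)
        coef = loss_derivative(y * out) * y
        for k in range(m):
            if pre[k] > 0.0:  # ReLU passes gradient only through active neurons
                scale = coef * a[k] / math.sqrt(m)
                for j in range(d):
                    grad[k][j] += scale * x[j] / n
    return [[W[k][j] - lr * grad[k][j] for j in range(d)] for k in range(m)]


def activation_pattern(W, data):
    """Flat list of booleans: is neuron k active on sample i?"""
    return [p > 0.0 for x, _ in data for p in preactivations(W, x)]


def run(m=16, n=20, T=100, lr=0.5):
    data = make_data(n)
    W0, a = init_network(m, 2)
    W = [row[:] for row in W0]
    risk0 = empirical_risk(W0, a, data)
    for _ in range(T):
        W = gd_step(W, a, data, lr)
    riskT = empirical_risk(W, a, data)
    # Frobenius distance of the final iterate from the initialization
    dist = math.sqrt(sum((w - w0) ** 2 for r, r0 in zip(W, W0) for w, w0 in zip(r, r0)))
    p0, pT = activation_pattern(W0, data), activation_pattern(W, data)
    flips = sum(u != v for u, v in zip(p0, pT)) / len(p0)
    return risk0, riskT, dist, flips


if __name__ == "__main__":
    risk0, riskT, dist, flips = run()
    print(f"risk: {risk0:.4f} -> {riskT:.4f}, ||W_T - W_0||_F = {dist:.4f}, flipped = {flips:.2%}")
```

Running `run()` with different widths `m` or iteration counts `T` gives a rough empirical feel for the distance from initialization and the fraction of flipped activations. It illustrates the quantities only and does not verify the $1/T$ or $1/(n\gamma^2)$ rates stated in the abstract.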
