Local linear convergence of gradient methods for overparameterized Gaussian mixtures
Jingxing Wang, Vasileios Charisopoulos, Maryam Fazel · 2026-05-29 · via math updates on arXiv.org

We study the problem of learning Gaussian mixture models under overparameterization. Prior work has shown that while overparameterization is essential for avoiding spurious local optima and enables global recovery of the ground-truth model using the gradient-EM (expectation-maximization) algorithm, it can dramatically slow down the local rate of convergence. Under certain assumptions on the mixture weights, we show that a standard divergence measure minimized by statistical learning procedures possesses a manifold of slow growth on which the well-known Polyak stepsize reduces the loss geometrically, and design a gradient-based method that converges to minimizers at a locally linear rate. Additionally, we show that our method converges to nearly optimal solutions -- up to a natural misspecification threshold -- for mixtures with arbitrary weights. At a high level, the method alternates between several "short" gradient descent steps that approach the manifold and "long" Polyak steps that contract the distance to minimizers. Our results suggest that slow convergence is not an intrinsic challenge of overparameterization, but can be overcome by exploiting the favorable structure of the loss landscape.
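
As a rough illustration of why a Polyak stepsize can restore geometric convergence on a slowly growing loss, the sketch below compares it with constant-step gradient descent. It is not the authors' algorithm and does not use a Gaussian mixture loss: the quartic `loss(x) = x**4`, the starting point, the step size `eta`, and the assumption that the optimal value `f_star = 0` is known are all illustrative choices. For this toy function the Polyak update works out to x ← (3/4)·x, so the iterate contracts by a fixed factor, whereas constant-step gradient descent slows down as the gradient flattens near the minimizer.

```python
# Toy comparison (assumed example, not the paper's method or loss):
# constant-step gradient descent vs. the Polyak stepsize on f(x) = x^4,
# a function with a degenerate, slowly growing minimum at x = 0.


def loss(x: float) -> float:
    return x ** 4


def grad(x: float) -> float:
    return 4 * x ** 3


def gradient_descent(x0: float, eta: float, steps: int) -> float:
    """Plain gradient descent with a fixed stepsize eta."""
    x = x0
    for _ in range(steps):
        x -= eta * grad(x)
    return x


def polyak_descent(x0: float, steps: int, f_star: float = 0.0) -> float:
    """Gradient descent with the Polyak stepsize (f(x) - f_star) / |f'(x)|^2.

    Requires the optimal value f_star, assumed known here.
    """
    x = x0
    for _ in range(steps):
        g = grad(x)
        if g == 0.0:  # already at a stationary point
            break
        step = (loss(x) - f_star) / (g * g)
        x -= step * g
    return x


x_gd = gradient_descent(1.0, eta=0.1, steps=50)
x_polyak = polyak_descent(1.0, steps=50)

print(f"gradient descent after 50 steps: x = {x_gd:.3e}")
print(f"Polyak stepsize after 50 steps:  x = {x_polyak:.3e}")
```

On this example the Polyak iterate ends up several orders of magnitude closer to the minimizer than the constant-step iterate after the same number of steps. The method described in the abstract additionally interleaves "short" gradient steps to approach the slow-growth manifold before each "long" Polyak step, which this one-dimensional sketch does not attempt to reproduce.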