Free Heavy-Tailed Lunch for Muon: A Theoretical Justification of Empirical Success

[Submitted on 12 Jun 2026] · 2026-06-15 · via stat updates on arXiv.org
