Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction

Can Adaptive Gradient Methods Converge under Heavy-Tailed Noise? A Case Study of AdaGrad

Shallow ReLU$^s$ Networks in $L^p$-Type and Sobolev Spaces: Approximation and Path-Norm Controlled Generalization

On Stability and Decomposition of Sample Quantiles under Heavy-Tailed Distributions

Improved Baselines with Representation Autoencoders

Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers

Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent

A Two-Parameter Weibull Framework for Diagnosing Transformer Weight Distributions

Calibeating for general proper losses: A Bregman divergence approach

Dimension-Free Convergence of Discrete Diffusion Models: Adjoint Equations Induce the Right Space

Sample-efficient inductive matrix completion with noise and inexact side-information

Multi-task Linear Regression without Eigenvalue Lower Bounds: Adaptivity, Robustness, and Safety

XAI and Statistical Analysis for Reliable Intrusion Detection in the UAVIDS-2025 Dataset: From Tree to Hybrid and Tabular DNN Ensembles

Reasoning Models Don't Just Think Longer, They Move Differently

TabPFN-3: Technical Report

Reframing preprocessing selection as model-internal calibration in near-infrared spectroscopy: A large-scale benchmark of operator-adaptive PLS and Ridge models

Towards a holistic understanding of Selection Bias for Causal Effect Identification

Adaptive Kernel Density Estimation with Pre-training

Coreset-Induced Conditional Velocity Flow Matching

RISED: A Pre-Deployment Evaluation Framework for High-Stakes AI Decision-Support Systems, with Application to Healthcare

ISOMORPH: A Supply Chain Digital Twin for Simulation, Dataset Generation, and Forecasting Benchmarks

Yield Curves Dynamics Using Variational Autoencoders Under No-arbitrage

Model-based Bootstrap of Controlled Markov Chains

Online Learning-to-Defer with Varying Experts

Self-Supervised Laplace Approximation for Bayesian Uncertainty Quantification

Keeping Score: Efficiency Improvements in Neural Likelihood Surrogate Training via Score-Augmented Loss Functions

One-Step Generative Modeling via Wasserstein Gradient Flows

Exact Stiefel Optimization for Probabilistic PLS: Closed-Form Updates, Error Bounds, and Calibrated Uncertainty

A Composite Activation Function for Learning Stable Binary Representations

Adaptive Calibration in Non-Stationary Environments

Real vs. Semi-Simulated: Rethinking Evaluation for Treatment Effect Estimation

Federated Language Models Under Bandwidth Budgets: Distillation Rates and Conformal Coverage

On Variance Reduction in Learning Mean Flows

When Attention Beats Fourier: Multi-Scale Transformers for PDE Solving on Irregular Domains

A Refined Generalization Analysis for Extreme Multi-class Supervised Contrastive Representation Learning

Ensemble Distributionally Robust Bayesian Optimisation

The Proxy Presumption: From Semantic Embeddings to Valid Social Measures

Modulated learning for private and distributed regression with just a single sample per client device

Query-efficient model evaluation using cached responses

Order-Agnostic Autoregressive Modelling with Missing Data

Grokking or Glitching? How Low-Precision Drives Slingshot Loss Spikes

Tuning Derivatives for Causal Fairness in Machine Learning

Spherical Flows for Sampling Categorical Data

Bayesian Rain Field Reconstruction using Commercial Microwave Links and Diffusion Model Priors

Unified Framework of Distributional Regret in Multi-Armed Bandits and Reinforcement Learning

Jacobian-Velocity Bounds for Deployment Risk Under Covariate Drift

Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics

Graph Convolutional Support Vector Regression for Robust Spatiotemporal Forecasting of Urban Air Pollution

Stochastic Schrödinger Diffusion Models for Pure-State Ensemble Generation

Understanding Self-Supervised Learning via Latent Distribution Matching

Imbalanced Classification under Capacity Constraints

On the Spectral Structure and Objective Equivalence of Orthogonal Multilabel Fisher Discriminants

Partially Observed Structural Causal Models

Robust and Fast Training via Per-Sample Clipping

Efficient Preference Poisoning Attack on Offline RLHF

Joint Energy Management and Coordinated AIGC Workload Scheduling for Distributed Data Centers: A Diffusion-Aided Reward Shaping Approach

Distributional Causal Mediation via Conditional Generative Modeling

A Theory of Saddle Escape in Deep Nonlinear Networks

Adaptive Querying with AI Persona Priors

Optimal Spatio-Temporal Decoupling for Bayesian Conformal Prediction

Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback

Electricity price forecasting across Norway's five bidding zones in the post-crisis era

Adversarial Robustness of NTK Neural Networks

Identifiability and Stability of Generative Drifting with Companion-Elliptic Kernel Families

A Limit Theory of Foundation Models: A Mathematical Approach to Understanding Emergent Intelligence and Scaling Laws

Conditional Score-Based Modeling of Effective Langevin Dynamics

Inference of Online Newton Methods with Nesterov's Accelerated Sketching

ProEval: Proactive Failure Discovery and Efficient Performance Estimation for Generative AI Evaluation

Score-Repellent Monte Carlo: Toward Efficient Non-Markovian Sampler with Constant Memory in General State Spaces

Learning to Emulate Chaos: Adversarial Optimal Transport Regularization

Geometric Layer-wise Approximation Rates for Deep Networks

S2MAM: Semi-supervised Meta Additive Model for Robust Estimation and Variable Selection

Beyond Coefficients: Forecast-Necessity Testing for Interpretable Causal Discovery in Nonlinear Time-Series Models

Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training

Knowing When to Quit: A Principled Framework for Dynamic Abstention in LLM Reasoning

Enhancing AI and Dynamical Subseasonal Forecasts with Probabilistic Bias Correction

Zeroth-Order Optimization at the Edge of Stability

Generative Augmented Inference

Estimating Continuous Treatment Effects with Two-Stage Kernel Ridge Regression

Rare Event Analysis via Stochastic Optimal Control

Adaptive Learning via Off-Model Training and Importance Sampling for Fully Non-Markovian Optimal Stochastic Control. Complete version

Beyond Fixed False Discovery Rates: Post-Hoc Conformal Selection with E-Variables

Beyond Augmented-Action Surrogates for Multi-Expert Learning-to-Defer

Spatio-temporal probabilistic forecast using MMAF-guided learning

Conformal Policy Control

The Implicit Curriculum: Learning Dynamics in RL with Verifiable Rewards

Probabilistic NDVI Forecasting from Sparse Satellite Time Series and Weather Covariates

Constrained Policy Optimization with Cantelli-Bounded Value-at-Risk

Factorizable joint shift revisited

Feature Learning Dynamics in Infinite-Depth Neural Networks

Statistically-Guided Meta-Learning for Cross-Deployment Activity Recognition in Distributed Fiber-Optic Sensing

DAPS++: Rethinking Diffusion Inverse Problems with Decoupled Posterior Annealing

Branching Flows: Discrete, Continuous, and Manifold Flow Matching with Splits and Deletions

Neural ARFIMA model for forecasting BRIC exchange rates with long memory

Neural Stochastic Differential Equations on Compact State Spaces: Theory, Methods, and Application to Suicide Risk Modeling

BOOST: A Data-Driven Framework for the Automated Joint Selection of Kernel and Acquisition Functions in Bayesian Optimization

Random Walk Learning and the Pac-Man Attack

Random Matrix Theory for Deep Learning: Beyond Eigenvalues of Linear Models

Post-Training Augmentation Invariance

Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints

Dataset-Driven Channel Masks in Transformers for Multivariate Time Series

stat updates on arXiv.org