Data augmented bootstrap: Unifying confidence interval construction by approximate invariance
[Submitted on 8 Jun 2026] · 2026-06-09 · via stat updates on arXiv.org

Statistics > Methodology

arXiv:2606.09049 (stat)

Abstract: We propose the data augmented bootstrap (DAB), a framework for constructing confidence intervals from approximately invariant transformations of the data. As special cases, DAB recovers popular methods that rely on exact group symmetries, such as conformal prediction, the wild bootstrap for Maximum Mean Discrepancy U-statistics, and the recently proposed SymmPI. DAB also recovers the classical bootstrap, which exploits the dataset's approximate invariance under uniform sampling of data indices as the dataset size grows. For all DAB methods, we establish theoretical coverage results that interpolate between finite-sample and asymptotic guarantees according to the strength of the invariance, without assuming a group structure. The approximate invariance is measured in the Kolmogorov distance and, for statistics that satisfy Gaussian universality, reduces to conditional mean and variance matching. This allows us to incorporate data augmentation (DA), a widely used machine learning heuristic based on approximate invariances, into known statistical methods. We empirically test the performance of incorporating DA into the bootstrap, wild bootstrap and conformal prediction in simulated settings as well as on image, language and scientific data.
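
For orientation, the classical bootstrap that the abstract names as a special case can be sketched as follows. This is a minimal illustration of the standard percentile bootstrap, not the authors' DAB procedure: the function name `bootstrap_percentile_ci`, its parameters, and the Gaussian toy sample are assumptions made for this example only. The resampling step is the "uniform sampling of data indices" that the abstract refers to.

```python
import random
import statistics


def bootstrap_percentile_ci(data, statistic, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for statistic(data).

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_resamples):
        # Draw n indices uniformly with replacement and recompute the statistic.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    replicates.sort()
    # Read off the empirical alpha/2 and 1 - alpha/2 quantiles of the replicates.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(int((1 - alpha / 2) * n_resamples), n_resamples - 1)
    return replicates[lo_idx], replicates[hi_idx]


# Toy data (an assumption for this sketch): 200 draws from a standard normal.
data_rng = random.Random(1)
sample = [data_rng.gauss(0.0, 1.0) for _ in range(200)]

lower, upper = bootstrap_percentile_ci(sample, statistics.fmean)
print(f"95% bootstrap CI for the mean: [{lower:.3f}, {upper:.3f}]")
```

In this sketch the only transformation applied to the data is index resampling. As the abstract describes it, DAB would allow other approximately invariant transformations, such as data augmentations, in that role; how those are combined and calibrated is specified in the paper and is not reproduced here.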

Submission history

From: Kevin Han Huang
[v1] Mon, 8 Jun 2026 05:39:02 UTC (969 KB)
