Improving Linear Regression on Small Datasets via Gaussian Process and Extreme Value Theory-Based Data Augmentation
[Submitted on 16 Jun 2026] · 2026-06-17 · via stat updates on arXiv.org


Abstract: Small sample sizes pose significant challenges in regression analysis, often leading to violations of classical assumptions such as normality, homoscedasticity, and independence of residuals. These violations compromise parameter estimation accuracy, reduce statistical power, and limit the generalizability of findings. This study introduces the Gaussian Process-based Modified Extreme Value Theorem (GP-MEVT) method, a novel hybrid data augmentation approach that combines Gaussian Processes with Extreme Value Theory to address these limitations. The GP-MEVT method generates augmented observations that extend the predictor space beyond the observed range while preserving the underlying linear structure and introducing controlled variability based on residual variation. Through comprehensive simulation studies across three variance scenarios (sigma = 2, 5, 8) and three sample sizes (n = 10, 15, 20), we demonstrate that GP-MEVT achieves a higher rate of assumption satisfaction, substantially outperforming the standard bootstrap and bootstrap-with-noise methods. The proposed method also exhibits reasonable parameter estimation accuracy, with intercept and slope estimates consistently closer to the true parameter values, and maintains competitive or superior model fitting performance as measured by root mean square error. Application to a real-world dataset confirms these advantages, with GP-MEVT achieving a 67.1% assumption satisfaction rate compared to 17.3% and 21.2% for the bootstrap alternatives. These findings establish GP-MEVT as a robust and reliable framework for fitting linear regression models to small datasets, offering practitioners a principled approach to statistical inference when sample size limitations are unavoidable.
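The abstract does not spell out the GP-MEVT algorithm, so the following is only a minimal sketch of the comparison setting it describes: the two bootstrap baselines evaluated over the stated simulation grid (sigma = 2, 5, 8 and n = 10, 15, 20). The true intercept and slope, the predictor range, the augmented sample size of 100, and the noise scale used for the bootstrap-with-noise variant are all assumptions made for illustration, not values taken from the paper.

```python
import math
import random


def fit_ols(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(xs, ys, intercept, slope):
    """Root mean square error of the fitted line on (xs, ys)."""
    squared = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(squared))


def simulate(n, sigma, rng, beta0=1.0, beta1=2.0):
    """Draw n points from y = beta0 + beta1 * x + N(0, sigma^2).

    beta0, beta1 and the x-range [0, 10] are assumed for illustration.
    """
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


def bootstrap(xs, ys, size, rng, noise_sd=0.0):
    """Resample (x, y) pairs with replacement; optionally jitter y."""
    idx = [rng.randrange(len(xs)) for _ in range(size)]
    new_x = [xs[i] for i in idx]
    new_y = [ys[i] + (rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0)
             for i in idx]
    return new_x, new_y


rng = random.Random(0)  # fixed seed so the sketch is reproducible
results = {}
for sigma in (2, 5, 8):
    for n in (10, 15, 20):
        xs, ys = simulate(n, sigma, rng)
        b0, b1 = fit_ols(xs, ys)
        resid_sd = rmse(xs, ys, b0, b1)
        # Noise scale of half the residual RMSE is an assumed choice.
        for name, sd in (("bootstrap", 0.0),
                         ("bootstrap+noise", 0.5 * resid_sd)):
            ax, ay = bootstrap(xs, ys, 100, rng, noise_sd=sd)
            a0, a1 = fit_ols(ax, ay)
            results[(sigma, n, name)] = (a0, a1, rmse(ax, ay, a0, a1))
```

Both baselines only resample the observed pairs, so the augmented predictor values never leave the observed range. That is the limitation the abstract says GP-MEVT addresses by extending the predictor space.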

Submission history

From: Jagath Senarathne
[v1] Tue, 16 Jun 2026 03:57:11 UTC (389 KB)