


On the Complexity of Offline Reinforcement Learning with $Q^\star$-Approximation and Partial Coverage
Haolin Liu, Braham Snyder, Chen-Yu Wei · 2026-02-12 · via stat updates on arXiv.org

We study offline reinforcement learning under $Q^\star$-approximation and partial coverage, a setting that motivates practical algorithms such as Conservative $Q$-Learning (CQL; Kumar et al., 2020) but has received limited theoretical attention. Our work is inspired by the following open question: "Are $Q^\star$-realizability and Bellman completeness sufficient for sample-efficient offline RL under partial coverage?" We answer in the negative via an information-theoretic lower bound. To identify additional structure that enables sample-efficient offline RL under partial coverage, we introduce a general decision-estimation framework, inspired by model-free decision-estimation coefficients (DEC) for online RL (Foster et al., 2023b; Liu et al., 2025b). Our framework decomposes offline RL complexity into decision complexity and value estimation error, which allows the two sub-problems to be studied modularly. Our results not only unify existing results (Chen and Jiang, 2022; Uehara et al., 2023) but also improve and generalize them. On the decision complexity side, our improvements include the first $\epsilon^{-2}$ sample complexity bound for soft $Q$-learning under partial coverage, improving on the $\epsilon^{-4}$ bound of Uehara et al. (2023); the removal of the need for additional online interaction in the value-gap setting of Chen and Jiang (2022); and new learnable settings beyond these two cases. On the value estimation side, we provide a new characterization of the role of Bellman completeness under partial coverage, and the first characterization of offline learnability for general low-Bellman-rank MDPs (Jiang et al., 2017; Du et al., 2021; Jin et al., 2021). The latter is a canonical online RL setting that has remained unexplored in offline RL except for special cases. As a side contribution, our techniques give the first analysis of CQL in the function approximation setting.
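
For reference, the two conditions named in the open question can be written as follows. This is only a sketch in standard discounted-MDP notation, which is an assumption here: the abstract does not fix the function class $\mathcal{F}$, the discount factor $\gamma$, or the horizon convention, and the paper may state the conditions differently (for example, per step in a finite-horizon setting).

```latex
% Assumed notation (not taken from the abstract):
%   \mathcal{F} \subseteq \{ f : \mathcal{S} \times \mathcal{A} \to \mathbb{R} \}  function class
%   r  reward function,  P  transition kernel,  \gamma \in [0, 1)  discount factor

% Bellman optimality operator
(\mathcal{T} f)(s, a) = r(s, a)
  + \gamma \, \mathbb{E}_{s' \sim P(\cdot \mid s, a)} \Big[ \max_{a'} f(s', a') \Big]

% Q^\star-realizability: the optimal value function lies in the class
Q^\star \in \mathcal{F}, \qquad \text{where } Q^\star = \mathcal{T} Q^\star

% Bellman completeness: the class is closed under the Bellman operator
\mathcal{T} f \in \mathcal{F} \quad \text{for all } f \in \mathcal{F}
```

Realizability constrains a single function, whereas completeness constrains the whole class. The lower bound described in the abstract indicates that even the two together are not sufficient for sample-efficient offline RL under partial coverage.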