Q-Learning with Fine-Grained Gap-Dependent Regret

[Submitted on 8 Oct 2025 (v1), last revised 15 Jun 2026 (this version)] · 2026-06-16 · via stat updates on arXiv.org

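This content was automatically aggregated and compiled by 惯性聚合 (an RSS reader) and is provided for reading reference only. Original source: (link not preserved). Copyright belongs to the original authors.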