Recursive Entropic Risk Optimization in Discounted MDPs: Sample Complexity Bounds with a Generative Model
Oliver Morte · 2026-05-20 · via cs.AI updates on arXiv.org


Abstract: We study risk-sensitive reinforcement learning in finite discounted MDPs with recursive entropic risk measures (ERM), where the risk parameter $\beta \neq 0$ controls the agent's risk attitude: $\beta>0$ for risk-averse and $\beta<0$ for risk-seeking behavior. A generative model of the MDP is assumed to be available. Our focus is on the sample complexities of learning the optimal state-action value function (value learning) and an optimal policy (policy learning) under recursive ERM. We introduce a model-based algorithm, called Model-Based ERM $Q$-Value Iteration (MB-RS-QVI), and derive PAC-type bounds on its sample complexity for both value and policy learning. Both PAC bounds scale exponentially with $|\beta|/(1-\gamma)$, where $\gamma$ is the discount factor. We also establish corresponding lower bounds for both value and policy learning, showing that exponential dependence on $|\beta|/(1-\gamma)$ is unavoidable in the worst case. The bounds are tight in the number of states and actions ($S$ and $A$), providing the first rigorous sample complexity guarantees for recursive ERM across both risk-averse and risk-seeking regimes.
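To make the objects in the abstract concrete, here is a minimal Python sketch of model-based value iteration under a recursive entropic risk measure with a generative model. It is not the paper's MB-RS-QVI and carries none of its guarantees. The sign convention $\mathrm{ERM}_\beta(X) = -\frac{1}{\beta}\log \mathbb{E}[e^{-\beta X}]$ for rewards (chosen so that $\beta>0$ is risk-averse and $\beta<0$ risk-seeking, matching the abstract), the Bellman form $Q(s,a) = r(s,a) + \gamma\,\mathrm{ERM}_\beta[V(s')]$, the number of samples per state-action pair, and the function names `erm`, `empirical_model` and `erm_q_value_iteration` are all assumptions made for illustration. The generative model is represented by a hypothetical function `sample_next(s, a)` that returns one next state, and the three-state MDP at the end is a made-up toy example.

```python
import math
import random
from collections import Counter


def erm(probs, values, beta):
    """Entropic risk of a discrete reward distribution (assumed sign convention).

    ERM_beta(X) = -(1/beta) * log E[exp(-beta * X)]: beta > 0 gives a value
    <= E[X] (risk-averse), beta < 0 a value >= E[X] (risk-seeking).
    A log-sum-exp shift keeps exp() from overflowing when |beta| * X is large.
    """
    if beta == 0:
        raise ValueError("beta must be nonzero")
    support = [(p, v) for p, v in zip(probs, values) if p > 0]
    exponents = [-beta * v for _, v in support]
    shift = max(exponents)
    total = sum(p * math.exp(e - shift) for (p, _), e in zip(support, exponents))
    return -(shift + math.log(total)) / beta


def empirical_model(sample_next, n_states, n_actions, n_samples):
    """Estimate P(.|s, a) by querying the generative model n_samples times per pair."""
    p_hat = {}
    for s in range(n_states):
        for a in range(n_actions):
            counts = Counter(sample_next(s, a) for _ in range(n_samples))
            p_hat[(s, a)] = [counts[t] / n_samples for t in range(n_states)]
    return p_hat


def erm_q_value_iteration(p, r, gamma, beta, n_states, n_actions, iters=500):
    """Iterate the assumed recursive ERM Bellman operator on a (possibly empirical) model:

        Q(s, a) = r[s][a] + gamma * ERM_beta[V(s')],  s' ~ p(.|s, a),
        V(s') = max_a' Q(s', a').

    ERM is monotone and shift-invariant, so this operator is a gamma-contraction
    in the sup norm and the iterates converge to its unique fixed point.
    """
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        v = [max(row) for row in q]
        q = [[r[s][a] + gamma * erm(p[(s, a)], v, beta) for a in range(n_actions)]
             for s in range(n_states)]
    # Greedy policy with respect to the final Q (ties go to the lowest action index).
    policy = [max(range(n_actions), key=q[s].__getitem__) for s in range(n_states)]
    return q, policy


# Hypothetical toy MDP: in state 0, action 0 pays a sure 4 and moves to absorbing
# state 2 (reward 0); action 1 pays 0 and flips a fair coin between state 1
# (reward 1 forever, value 10 at gamma = 0.9) and state 2.
R = [[4.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
RNG = random.Random(0)


def sample_next(s, a):
    """Generative model: draw one next state for (s, a)."""
    if s == 0:
        return 2 if a == 0 else RNG.choice([1, 2])
    return s  # states 1 and 2 are absorbing


P_hat = empirical_model(sample_next, n_states=3, n_actions=2, n_samples=1000)
for beta in (1.0, -1.0):
    Q, pi = erm_q_value_iteration(P_hat, R, gamma=0.9, beta=beta, n_states=3, n_actions=2)
    print(f"beta={beta:+}: greedy action in state 0 = {pi[0]}, Q[0] = {[round(x, 2) for x in Q[0]]}")
```

Under these assumptions, $\beta = 1$ should make the sure reward (action 0) greedy in state 0, while $\beta = -1$ should favor the coin flip (action 1); a risk-neutral agent would value the coin flip at about $0.9 \times 5 = 4.5$. The log-sum-exp shift inside `erm` matters because, with rewards in $[0,1]$, values lie in $[0, 1/(1-\gamma)]$, so the terms $e^{-\beta V}$ can differ by a factor as large as $e^{|\beta|/(1-\gamma)}$, the same quantity that appears in the exponent of the bounds above.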
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as: arXiv:2506.00286 [cs.LG]
(or arXiv:2506.00286v3 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2506.00286 (arXiv-issued DOI via DataCite)

Submission history

From: Mohammad Sadegh Talebi
[v1] Fri, 30 May 2025 22:27:57 UTC (42 KB)
[v2] Wed, 1 Oct 2025 09:50:45 UTC (40 KB)
[v3] Mon, 18 May 2026 21:58:29 UTC (488 KB)