Scaling World-Model Reinforcement Learning Through Diffusion Policy Optimization
Xiaoyuan Cheng · 2026-05-27 · via cs.LG updates on arXiv.org


Abstract: Model-based reinforcement learning (RL) can be effectively supported at scale through the use of world models. However, in practice, scaling such approaches remains fundamentally limited. A commonly recognized challenge is model bias and error compounding, which degrade long-horizon predictions. Beyond these issues, we identify a more critical yet underexplored bottleneck: a structural misalignment between search and value learning in existing world model approaches. In particular, policy improvement often relies on value functions induced by a separate, non-search policy, resulting in training inconsistency and ultimately suboptimal learning. To address this limitation, we propose Model-Based Diffusion Policy Optimization (MBDPO) in world models, a framework that unifies search and policy optimization through diffusion policy representations, thereby unlocking the potential of world models for scalable policy learning. Instead of constructing an explicit planner over a learned world model, we reformulate policy optimization as a diffusion process over searched trajectories in latent world models. In this view, we extract an implicit energy function from the collected dataset that anchors the policy, enabling MBDPO to refine the score field for policy optimization while mitigating misalignment. We evaluate MBDPO across a wide range of settings, including multi-task offline pretraining, online learning, and offline-to-online fine-tuning. In the offline regime, we further investigate its scaling behavior by pretraining on large-scale datasets, observing consistent and monotonic performance gains with increasing model capacity.
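The abstract gives no equations, so the following is only a generic, hypothetical illustration of what "refining a score field that is anchored to the dataset" can mean. It is not MBDPO. Assumptions (none from the paper): a 1-D action a, a Gaussian behavior distribution pi_b(a) = N(mu_b, sigma_b^2) standing in for the collected dataset, a quadratic value Q(a) = -(a - a_star)^2 playing the role of the guidance (negative energy) term, a temperature beta, and plain unadjusted Langevin dynamics in place of a trained diffusion sampler. The target is pi(a) ∝ pi_b(a) exp(Q(a) / beta), so its score is the behavior score plus the value gradient divided by beta. Because both factors are Gaussian, the target has a closed form that can be used to check the sampler. All function names are made up for this sketch.

```python
import math
import random

# Hypothetical toy setup (not taken from the paper):
#   behavior policy pi_b(a) = N(mu_b, sigma_b^2)  -> stands in for the dataset
#   value Q(a) = -(a - a_star)^2                  -> stands in for the guidance term
#   target pi(a) ∝ pi_b(a) * exp(Q(a) / beta)


def behavior_score(a, mu_b, sigma_b):
    """d/da log N(a; mu_b, sigma_b^2): the dataset-anchored part of the score."""
    return -(a - mu_b) / sigma_b ** 2


def value_grad(a, a_star):
    """dQ/da for Q(a) = -(a - a_star)^2."""
    return -2.0 * (a - a_star)


def guided_score(a, mu_b, sigma_b, a_star, beta):
    """Score of the tilted target: behavior score plus value gradient / beta."""
    return behavior_score(a, mu_b, sigma_b) + value_grad(a, a_star) / beta


def langevin_sample(n, steps, step_size, mu_b, sigma_b, a_star, beta, seed=0):
    """Draw n samples with unadjusted Langevin dynamics on the guided score."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        a = rng.gauss(mu_b, sigma_b)  # start from the behavior (dataset) distribution
        for _ in range(steps):
            a = (a
                 + step_size * guided_score(a, mu_b, sigma_b, a_star, beta)
                 + math.sqrt(2.0 * step_size) * rng.gauss(0.0, 1.0))
        samples.append(a)
    return samples


def closed_form_target(mu_b, sigma_b, a_star, beta):
    """Exact mean and std of the tilted target (a product of two Gaussians)."""
    prec_b = 1.0 / sigma_b ** 2
    prec_q = 2.0 / beta  # exp(-(a - a_star)^2 / beta) has variance beta / 2
    prec = prec_b + prec_q
    mean = (prec_b * mu_b + prec_q * a_star) / prec
    return mean, math.sqrt(1.0 / prec)


if __name__ == "__main__":
    samples = langevin_sample(1000, 200, 0.03, mu_b=0.0, sigma_b=1.0, a_star=2.0, beta=1.0)
    print("empirical mean:", sum(samples) / len(samples))
    # closed form here: mean 4/3, std sqrt(1/3)
    print("closed-form mean/std:", closed_form_target(0.0, 1.0, 2.0, 1.0))
```

In this toy, beta controls the trade-off the abstract alludes to: as beta grows the target falls back to the behavior distribution, and as beta shrinks it concentrates around a_star, moving away from the data that anchors it.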
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2605.26282 [cs.LG]
  (or arXiv:2605.26282v1 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.2605.26282

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Xiaoyuan Cheng
[v1] Mon, 25 May 2026 19:06:51 UTC (28,210 KB)