

Position: The Hidden Costs and Measurement Gaps of Reinforcement Learning with Verifiable Rewards
2026-04-14 · via cs.AI updates on arXiv.org


Abstract: Reinforcement learning with verifiable rewards (RLVR) is a practical, scalable way to improve large language models on math, code, and other structured tasks. However, we argue that many headline RLVR gains are not yet well validated because reports often conflate policy improvement with three confounds: (i) budget mismatch between RLVR and baseline evaluations, (ii) attempt inflation and calibration drift that convert abstentions into confident answers, and (iii) benchmark data contamination. Using budget-matched reproductions and partial-prompt contamination probes, we find that several widely cited gaps shrink substantially or disappear once budgets, prompts, and dataset versions are matched and contaminated sets are treated as memorization probes rather than evidence of reasoning. This does not mean that RLVR is ineffective, but it implies that current measurements often overstate capability gains and obscure reliability costs. We therefore propose a compact, tax-aware minimum standard for RLVR training and evaluation: budget-matched saturation curves with variance; calibration and abstention tracking; a judge-robustness stress test when LLM judges are used; and an explicit contamination screen. With these controls, RLVR remains effective and deployable in verifiable domains, but reasoning gains should be treated as provisional without them.
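To make two items of the proposed standard concrete, the following is a minimal Python sketch of a budget-matched saturation curve with variance and of calibration and abstention tracking. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name a metric, so the sketch assumes pass@k (with the standard unbiased estimator) as the accuracy measure, seed-to-seed standard deviation as the variance, and expected calibration error (ECE) as the calibration measure. The function names `pass_at_k`, `saturation_curve`, and `abstention_and_ece` are hypothetical.

```python
from math import comb
from statistics import mean, stdev


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k from n samples of which c are correct (assumed metric)."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def saturation_curve(correct_counts, n, budgets):
    """Mean and seed-to-seed std of pass@k at each sampling budget k.

    correct_counts: one list per seed, holding the number of correct samples
    (out of n) for each problem. Evaluating the baseline and the RLVR model
    with the same n and the same budgets is what makes the curves comparable.
    """
    curve = {}
    for k in budgets:
        per_seed = [mean(pass_at_k(n, c, k) for c in counts)
                    for counts in correct_counts]
        spread = stdev(per_seed) if len(per_seed) > 1 else 0.0
        curve[k] = (mean(per_seed), spread)
    return curve


def abstention_and_ece(records, n_bins=10):
    """Abstention rate and expected calibration error over answered items.

    records: non-empty list of (confidence, is_correct) pairs, where
    confidence is None when the model abstained (hypothetical encoding).
    """
    answered = [(p, ok) for p, ok in records if p is not None]
    abstention_rate = 1.0 - len(answered) / len(records)
    bins = [[] for _ in range(n_bins)]
    for p, ok in answered:
        bins[min(int(p * n_bins), n_bins - 1)].append((p, ok))
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = mean(p for p, _ in b)
            accuracy = mean(1.0 if ok else 0.0 for _, ok in b)
            ece += len(b) / len(answered) * abs(avg_conf - accuracy)
    return abstention_rate, ece
```

Under these assumptions, a gain that appears only at small k and vanishes as the baseline's budget grows would be read as a budget effect, and a drop in abstention rate accompanied by a rise in ECE would indicate the attempt inflation and calibration drift described above.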
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2509.21882 [cs.LG]
  (or arXiv:2509.21882v3 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.2509.21882

arXiv-issued DOI via DataCite

Submission history

From: Fang Wu
[v1] Fri, 26 Sep 2025 05:06:25 UTC (1,756 KB)
[v2] Sat, 11 Apr 2026 00:48:10 UTC (1,744 KB)
[v3] Mon, 25 May 2026 20:11:55 UTC (1,734 KB)