Harmony in Diversity: Multi-domain Contrastive Policy Optimization for Large Reasoning Models
Zongji Yu, W · 2026-05-26 · via cs.CL updates on arXiv.org


Abstract: Post-training has significantly enhanced the reasoning capabilities of Large Reasoning Models (LRMs), especially through Reinforcement Learning (RL) methods such as Group Relative Policy Optimization (GRPO). However, GRPO-style RL methods in multi-domain settings often fail to achieve consistent improvements across all domains because of inherent cross-domain interference during policy optimization. Prior studies on multi-domain RL primarily focus on alleviating this interference while often neglecting the pivotal role of knowledge sharing, which we argue is the key to turning cross-domain interactions from harmful competition into beneficial transfer. To address this limitation, we propose Multi-domain Contrastive Policy Optimization (MCPO), which analyzes the structural relationships among rollouts and, in a contrastive manner, promotes cross-domain knowledge sharing and in-domain knowledge consolidation. Specifically, for a given prompt, MCPO identifies transferable reasoning trajectories from other domains as positive examples and treats incorrect rollouts as negative ones. It then encourages consistent representations for positive pairs and pushes negative pairs apart, thereby facilitating knowledge transfer and reducing interference. Moreover, MCPO aligns correct rollouts within the same domain to build a consolidated representation space. In this way, MCPO contrastively learns a harmonious representation space that can accommodate diverse multi-domain knowledge. Empirical results show that MCPO improves the reasoning capabilities of LRMs across multiple domains and even outperforms single-domain training in some cases. Code is available at this https URL.
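The abstract names two ingredients without giving their formulas: GRPO's group-relative credit assignment and a contrastive objective over rollout representations. The sketch below illustrates both under loudly stated assumptions. The advantage function uses the common GRPO formulation (each reward standardized by the mean and standard deviation of its prompt's rollout group). The contrastive part swaps in a standard InfoNCE loss with cosine similarity, because the abstract does not specify MCPO's actual loss. The names `Rollout`, `contrastive_terms`, the temperature of 0.1, the rule that only correct rollouts act as anchors, and the treatment of "any correct rollout from another domain" as transferable are all hypothetical, not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Rollout:
    """Hypothetical container for one sampled reasoning trajectory."""
    domain: str              # e.g. "math" or "code"
    correct: bool            # outcome reported by a verifier
    embedding: List[float]   # assumed pooled representation of the trajectory


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: standardize each reward within its prompt's rollout group."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity with a small constant to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(anchor, positives, negatives, temperature: float = 0.1) -> float:
    """Mean InfoNCE loss: pull each positive toward the anchor, push all negatives away."""
    if not positives:
        return 0.0
    neg_logits = [cosine(anchor, n) / temperature for n in negatives]
    total = 0.0
    for p in positives:
        pos_logit = cosine(anchor, p) / temperature
        logits = [pos_logit] + neg_logits
        m = max(logits)  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - pos_logit
    return total / len(positives)


def contrastive_terms(anchor: Rollout, prompt_group: List[Rollout],
                      batch: List[Rollout], temperature: float = 0.1) -> Tuple[float, float]:
    """Return (cross-domain sharing loss, in-domain consolidation loss) for one anchor.

    Hypothetical selection rules: only correct rollouts act as anchors, and any
    correct rollout from another domain is treated as a "transferable" positive.
    """
    if not anchor.correct:
        return 0.0, 0.0
    # Incorrect rollouts for the same prompt serve as negatives.
    negatives = [r.embedding for r in prompt_group if not r.correct]
    # Cross-domain knowledge sharing: correct rollouts from other domains.
    cross_pos = [r.embedding for r in batch
                 if r.correct and r.domain != anchor.domain]
    # In-domain consolidation: other correct rollouts from the same domain.
    intra_pos = [r.embedding for r in batch
                 if r.correct and r.domain == anchor.domain and r is not anchor]
    return (info_nce(anchor.embedding, cross_pos, negatives, temperature),
            info_nce(anchor.embedding, intra_pos, negatives, temperature))


# Toy usage: three rollouts for one math prompt plus one correct code rollout.
batch = [
    Rollout("math", True, [1.0, 0.0]),
    Rollout("math", False, [0.0, 1.0]),
    Rollout("math", True, [0.9, 0.1]),
    Rollout("code", True, [0.8, 0.2]),
]
math_prompt = batch[:3]
advantages = group_relative_advantages([1.0 if r.correct else 0.0 for r in math_prompt])
cross, intra = contrastive_terms(batch[0], math_prompt, batch)
print(f"advantages={advantages}, cross={cross:.4f}, intra={intra:.4f}")
```

In an actual training loop these terms would presumably be computed on model hidden states inside a differentiable framework and combined with the policy-gradient loss. How MCPO extracts representations and weights the terms is not stated in the abstract, so the plain-Python version above only shows the pairing logic.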
Comments: 25 pages, 5 figures
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2605.25443 [cs.CL]
  (or arXiv:2605.25443v1 [cs.CL] for this version)
  https://doi.org/10.48550/arXiv.2605.25443

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Zongji Yu
[v1] Mon, 25 May 2026 05:42:57 UTC (3,636 KB)