AAPO: Enhancing the Reasoning Capabilities of LLMs with Advantage Margin
2026-06-22 · via Paper Index on ACL Anthology

Abstract

Reinforcement learning (RL) has emerged as an effective approach for enhancing the reasoning capabilities of large language models (LLMs), especially in scenarios where supervised fine-tuning (SFT) falls short due to limited chain-of-thought (CoT) data. Among RL-based post-training methods, group relative advantage estimation, as exemplified by Group Relative Policy Optimization (GRPO), has attracted considerable attention for eliminating the dependency on the value model, thereby simplifying training compared to traditional approaches like Proximal Policy Optimization (PPO). However, existing group relative advantage estimation methods still suffer from training inefficiencies, particularly when the estimated advantage approaches zero. To address this limitation, we propose Advantage-Augmented Policy Optimization (AAPO), a novel RL algorithm that optimizes the cross-entropy (CE) loss using advantages enhanced through a margin-based estimation scheme. This approach effectively mitigates the inefficiencies associated with group relative advantage estimation. Experimental results on multiple mathematical reasoning benchmarks and model series demonstrate the superior performance of AAPO. Code is available at https://github.com/JianxXiong/AAPO.
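The inefficiency mentioned in the abstract can be illustrated with a minimal sketch of group relative advantage estimation as commonly described for GRPO: each sampled response's reward is normalized by the mean and standard deviation of its group. This is not the AAPO margin-based scheme, whose details are not given in the abstract. The function name `group_relative_advantages`, the use of the population standard deviation, and the `eps` stabilizer are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    Illustrative sketch only: population std and `eps` are assumptions,
    not taken from the paper.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A group with mixed outcomes yields non-zero advantages (a usable signal).
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# A group where every response earns the same reward yields all-zero
# advantages, so the policy-gradient term contributes nothing for this group.
uniform = group_relative_advantages([1.0, 1.0, 1.0, 1.0])

print(mixed)
print(uniform)
```

When every response in a group is equally right or equally wrong, the normalized advantages collapse to zero and the group provides no learning signal, which is the regime the abstract singles out.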

Anthology ID: 2026.acl-long.1131
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 24663–24680
URL: https://aclanthology.org/2026.acl-long.1131/
Cite (ACL): Jian Xiong, Jingbo Zhou, Jingyong Ye, Qiang Huang, and Dejing Dou. 2026. AAPO: Enhancing the Reasoning Capabilities of LLMs with Advantage Margin. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 24663–24680, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): AAPO: Enhancing the Reasoning Capabilities of LLMs with Advantage Margin (Xiong et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.1131.pdf
Checklist: 2026.acl-long.1131.checklist.pdf