Abstract
Recent advancements in Generative Reward Models (GRMs) have demonstrated that scaling the length of Chain-of-Thought (CoT) reasoning considerably enhances the reliability of evaluation. However, existing work relies predominantly on unstructured length scaling, ignoring the divergent efficacy of different reasoning mechanisms: Breadth-CoT (multi-dimensional principle coverage) and Depth-CoT (substantive judgment soundness). To address this, we introduce Mix-GRM, a framework that reconfigures raw rationales into structured Breadth-CoT and Depth-CoT through a modular synthesis pipeline, subsequently employing Supervised Fine-Tuning (SFT) and Reinforcement Learning with Verifiable Rewards (RLVR) to internalize and optimize these mechanisms. Comprehensive experiments demonstrate that Mix-GRM establishes a new state-of-the-art across five benchmarks, surpassing leading open-source RMs by an average of 8.2%. Our results reveal a clear divergence in reasoning: Breadth-CoT benefits subjective preference tasks, whereas Depth-CoT excels in objective correctness tasks. Consequently, misaligning the reasoning mechanism with the task directly degrades performance. Furthermore, we demonstrate that RLVR acts as a switching amplifier, inducing an emergent polarization where the model spontaneously allocates its reasoning style to match task demands.
- Anthology ID: 2026.findings-acl.709
- Volume: Findings of the Association for Computational Linguistics: ACL 2026
- Month: July
- Year: 2026
- Address: San Diego, California, United States
- Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 14449–14469
- URL: https://aclanthology.org/2026.findings-acl.709/
- Cite (ACL): Qiyuan Zhang, Yufei Wang, Tianhe Wu, Can Xu, Qingfeng Sun, Kai Zheng, Xue Liu, and Chen Ma. 2026. Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 14449–14469, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal): Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models (Zhang et al., Findings 2026)
- PDF: https://aclanthology.org/2026.findings-acl.709.pdf
- Checklist: 2026.findings-acl.709.checklist.pdf