






















Abstract: Chain-of-Thought (CoT) reasoning enhances the reasoning capabilities of Large Language Models (LLMs), yet it incurs substantial computational overhead at inference. Existing CoT compression methods often lose logical fidelity at high compression ratios, resulting in significant performance degradation. To achieve high-fidelity, fast reasoning, we propose a novel EXTreme-RAtio Chain-of-Thought Compression framework, termed Extra-CoT, which aggressively reduces the token budget while preserving answer accuracy. To generate reliable, high-fidelity supervision, we first train a dedicated semantically-preserved compressor on mathematical CoT data with fine-grained annotations. An LLM is then fine-tuned on these compressed pairs via mixed-ratio supervised fine-tuning (SFT), which teaches it to follow a spectrum of compression budgets and provides a stable initialization for reinforcement learning (RL). We further propose Constrained and Hierarchical Ratio Policy Optimization (CHRPO), which uses a hierarchical reward to explicitly incentivize question-solving ability under lower budgets. Experiments on three mathematical reasoning benchmarks show the superiority of Extra-CoT. For example, on MATH-500 using Qwen3-1.7B, Extra-CoT achieves over 73% token reduction with an accuracy improvement of 0.6%, significantly outperforming state-of-the-art (SOTA) methods. Our source code has been released at this https URL.
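As a reading aid, the token reduction figure quoted in the abstract can be understood as the fraction of reasoning tokens removed relative to the uncompressed CoT. The following is a minimal sketch of that arithmetic only; the function names and the token counts are hypothetical illustrations, not taken from the paper or its released code.

```python
def compression_ratio(original_tokens: int, compressed_tokens: int) -> float:
    """Fraction of the original CoT tokens that remain after compression."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    if compressed_tokens < 0:
        raise ValueError("compressed_tokens must be non-negative")
    return compressed_tokens / original_tokens


def token_reduction(original_tokens: int, compressed_tokens: int) -> float:
    """Fraction of the original CoT tokens that were removed."""
    return 1.0 - compression_ratio(original_tokens, compressed_tokens)


if __name__ == "__main__":
    # Hypothetical counts chosen for illustration: 1000 tokens cut to 270.
    reduction = token_reduction(1000, 270)
    print(f"token reduction: {reduction:.0%}")
```

Under these assumed counts, keeping 270 of 1000 tokens corresponds to a compression ratio of 0.27 and a token reduction of 73%.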
| Comments: | Accepted to ICML 2026. 15 pages, 7 figures |
| Subjects: | Machine Learning (cs.LG) |
| Cite as: | arXiv:2602.08324 [cs.LG] (or arXiv:2602.08324v3 [cs.LG] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2602.08324 (arXiv-issued DOI via DataCite) |
From: Tang Yuntian
[v1] Mon, 9 Feb 2026 06:57:15 UTC (481 KB)
[v2] Mon, 2 Mar 2026 08:47:20 UTC (481 KB)
[v3] Fri, 15 May 2026 02:50:21 UTC (482 KB)