Towards Efficient Large Language Reasoning Models via Extreme-Ratio Chain-of-Thought Compression
Yuntian Tang · 2026-05-18 · via cs.LG updates on arXiv.org


Abstract: Chain-of-Thought (CoT) reasoning enhances the reasoning capabilities of Large Language Models (LLMs), yet it incurs substantial computational overhead at inference time. Existing CoT compression methods often lose logical fidelity at high compression ratios, resulting in significant performance degradation. To achieve high-fidelity, fast reasoning, we propose a novel EXTreme-RAtio Chain-of-Thought Compression framework, termed Extra-CoT, which aggressively reduces the token budget while preserving answer accuracy. To generate reliable, high-fidelity supervision, we first train a dedicated semantics-preserving compressor on mathematical CoT data with fine-grained annotations. An LLM is then fine-tuned on these compressed pairs via mixed-ratio supervised fine-tuning (SFT), which teaches it to follow a spectrum of compression budgets and provides a stable initialization for reinforcement learning (RL). We further propose Constrained and Hierarchical Ratio Policy Optimization (CHRPO), which uses a hierarchical reward to explicitly incentivize question-solving ability under lower budgets. Experiments on three mathematical reasoning benchmarks show the superiority of Extra-CoT. For example, on MATH-500 using Qwen3-1.7B, Extra-CoT achieves over 73% token reduction with an accuracy improvement of 0.6%, significantly outperforming state-of-the-art (SOTA) methods. Our source code has been released at this https URL.
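
The abstract does not spell out how the hierarchical reward in CHRPO is computed, so the following is only a minimal sketch of one way a "correctness first, budget second" reward could look, alongside the token-reduction metric the abstract reports. The names `Rollout`, `token_reduction`, `hierarchical_reward`, and the `bonus` parameter are hypothetical and are not taken from the paper or its released code.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One sampled reasoning trace (hypothetical structure, for illustration)."""
    correct: bool        # whether the final answer matched the reference
    num_tokens: int      # length of the generated (compressed) reasoning
    full_tokens: int     # length of the uncompressed reference CoT
    target_ratio: float  # requested budget as a fraction of full_tokens


def token_reduction(r: Rollout) -> float:
    """Fraction of tokens saved relative to the uncompressed CoT."""
    return 1.0 - r.num_tokens / r.full_tokens


def hierarchical_reward(r: Rollout, bonus: float = 0.5) -> float:
    """Assumed two-level reward: correctness gates any budget bonus.

    Level 1: a wrong answer earns 0, regardless of how short it is.
    Level 2: a correct answer earns 1, plus a bonus that is paid in full
             when the trace fits the budget and shrinks linearly with the
             relative overshoot otherwise (never below 0).
    """
    if not r.correct:
        return 0.0
    budget = r.target_ratio * r.full_tokens
    if r.num_tokens <= budget:
        return 1.0 + bonus
    overshoot = (r.num_tokens - budget) / budget
    return 1.0 + bonus * max(0.0, 1.0 - overshoot)


if __name__ == "__main__":
    within = Rollout(correct=True, num_tokens=200, full_tokens=1000, target_ratio=0.25)
    over = Rollout(correct=True, num_tokens=375, full_tokens=1000, target_ratio=0.25)
    wrong = Rollout(correct=False, num_tokens=100, full_tokens=1000, target_ratio=0.25)
    for name, r in [("within", within), ("over", over), ("wrong", wrong)]:
        print(name, token_reduction(r), hierarchical_reward(r))
```

Gating the budget bonus on correctness is the design choice that matters in this sketch: a short but wrong trace can never outscore a correct one, which matches the abstract's stated goal of reducing tokens while preserving answer accuracy. The exact shaping used by CHRPO may differ.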
Comments: Accepted to ICML 2026. 15 pages, 7 figures
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2602.08324 [cs.LG]
  (or arXiv:2602.08324v3 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.2602.08324

arXiv-issued DOI via DataCite

Submission history

From: Tang Yuntian
[v1] Mon, 9 Feb 2026 06:57:15 UTC (481 KB)
[v2] Mon, 2 Mar 2026 08:47:20 UTC (481 KB)
[v3] Fri, 15 May 2026 02:50:21 UTC (482 KB)