ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning
By: Ziqing Qiao, Yongheng Deng, Jiali Zeng, and more
Potential Business Impact:
Makes smart computer answers shorter, saving power.
Large Reasoning Models (LRMs) perform strongly on complex reasoning tasks via Chain-of-Thought (CoT) prompting, but often produce verbose outputs, increasing computational overhead. Existing fine-tuning-based compression methods either perform post-hoc pruning, risking disruption to reasoning coherence, or rely on sampling-based selection, which fails to remove redundant content thoroughly. To address these limitations, this work takes a confidence-guided perspective and frames two key patterns of redundant reflection in LRMs: Confidence Deficit, wherein the model reflects on intermediate steps that are already correct, and Termination Delay, where reflection continues even after a verified, confident answer has been reached. Building on this, we introduce ConCISE (Confidence-guided Compression In Step-by-step Efficient Reasoning), a framework for generating concise reasoning chains that integrates Confidence Injection, which boosts the model's confidence during reasoning, and Early Stopping, which terminates reasoning once confidence is sufficient. Extensive experiments demonstrate that, compared to baseline methods, fine-tuning LRMs on ConCISE-generated data yields a better balance between compression and task performance, reducing reasoning length by up to approximately 50% under SimPO while maintaining high task accuracy.
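To make the two mechanisms in the abstract concrete, below is a minimal, self-contained sketch of a confidence-guided decoding loop: a cue phrase is injected after each step (Confidence Injection, targeting Confidence Deficit) and generation halts once a confidence estimate crosses a threshold (Early Stopping, targeting Termination Delay). This is an illustrative assumption-laden sketch, not the paper's actual implementation; names such as `generate_step`, `step_confidence`, the cue phrase, and the threshold value are hypothetical stand-ins.

```python
# Hypothetical sketch of confidence-guided compression of a reasoning chain.
# generate_step / step_confidence are toy stand-ins for a reasoning model's
# step-wise decoder and a per-step confidence estimator (assumptions).

from dataclasses import dataclass, field

CONFIDENCE_PHRASE = "I am confident this step is correct."  # injected cue (assumption)
STOP_THRESHOLD = 0.9  # stop once estimated confidence exceeds this (assumption)


@dataclass
class Trace:
    steps: list = field(default_factory=list)


def generate_step(trace: Trace) -> str:
    """Stand-in for one reasoning step produced by the model."""
    return f"step {len(trace.steps) + 1}: partial derivation"


def step_confidence(trace: Trace) -> float:
    """Stand-in for a confidence estimate over the current chain
    (e.g., the probability the model assigns to its verified answer)."""
    return min(1.0, 0.3 + 0.15 * len(trace.steps))  # toy monotone score


def confidence_guided_decode(max_steps: int = 10) -> Trace:
    trace = Trace()
    for _ in range(max_steps):
        trace.steps.append(generate_step(trace))
        # Confidence Injection: append a confidence cue so the model is less
        # inclined to re-reflect on a step that is already correct.
        trace.steps.append(CONFIDENCE_PHRASE)
        # Early Stopping: terminate once the answer is verified with high
        # confidence, avoiding redundant reflection after the answer.
        if step_confidence(trace) >= STOP_THRESHOLD:
            break
    return trace


if __name__ == "__main__":
    print("\n".join(confidence_guided_decode().steps))
```

In the paper's setting, chains produced this way would then serve as fine-tuning data (e.g., under SimPO) so the model learns to emit shorter reasoning directly; the loop above only illustrates how the two confidence signals shape chain length.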
Similar Papers
ConMax: Confidence-Maximizing Compression for Efficient Chain-of-Thought Reasoning
Artificial Intelligence
Makes smart computers think less, faster.
Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains
Computation and Language
Makes AI think faster and smarter.
Answer Convergence as a Signal for Early Stopping in Reasoning
Computation and Language
Makes smart computers think less, saving time and money.