Think Before You Prune: Selective Self-Generated Calibration for Pruning Large Reasoning Models
By: Yang Xiang, Yixin Ji, Juntao Li, and more
Potential Business Impact:
Makes smart computer thinking faster and cheaper.
Large Reasoning Models (LRMs) have demonstrated remarkable performance on complex reasoning benchmarks. However, their long chain-of-thought reasoning processes incur significant inference overhead. Pruning has emerged as a promising approach to reducing computational costs, yet existing efforts have focused primarily on large language models (LLMs), and pruning LRMs remains unexplored. In this work, we conduct the first empirical study on pruning LRMs and show that directly applying existing pruning techniques fails to yield satisfactory results. Our findings indicate that using self-generated reasoning data for calibration can substantially improve pruning performance. We further investigate how the difficulty and length of reasoning data affect pruning outcomes. Our analysis reveals that challenging and moderately long self-generated reasoning data serve as ideal calibration data. Based on these insights, we propose a Selective Self-Generated Reasoning (SSGR) data construction strategy to provide effective calibration data for pruning LRMs. Experimental results on the DeepSeek-R1-Distill model series validate that our strategy improves the reasoning ability of pruned LRMs by 10%-13% compared to general pruning methods.
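To make the SSGR idea concrete, below is a minimal Python sketch of how a calibration set of "challenging, moderately long" self-generated reasoning traces might be built before running a standard pruning method. The abstract does not give the exact recipe, so the difficulty proxy (pass rate over sampled traces), the length window, the threshold values, and the `check_answer` helper are all illustrative assumptions, not the authors' implementation.

```python
# Sketch of an SSGR-style calibration-data builder (assumptions noted in comments).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example model from the DeepSeek-R1-Distill series mentioned in the abstract.
MODEL_NAME = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.bfloat16, device_map="auto"
)

def generate_reasoning(question: str, n_samples: int = 4, max_new_tokens: int = 2048):
    """Sample chain-of-thought traces from the model itself (self-generated data)."""
    inputs = tokenizer(question, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.7,
        num_return_sequences=n_samples,
        max_new_tokens=max_new_tokens,
    )
    prompt_len = inputs["input_ids"].shape[1]
    return [tokenizer.decode(o[prompt_len:], skip_special_tokens=True) for o in outputs]

def build_ssgr_calibration(problems, check_answer,
                           min_tokens=512, max_tokens=3072, max_pass_rate=0.5):
    """Keep traces for challenging problems (low sampled pass rate) whose length
    falls in a moderate window -- both criteria follow the abstract's findings.
    Keeping only correct traces and the specific thresholds are assumptions."""
    calibration = []
    for problem in problems:  # each problem: {"question": ..., "answer": ...}
        traces = generate_reasoning(problem["question"])
        correct = [check_answer(t, problem["answer"]) for t in traces]
        pass_rate = sum(correct) / len(correct)
        if pass_rate > max_pass_rate:
            continue  # problem is too easy for this model: difficulty filter
        for trace, ok in zip(traces, correct):
            n_tok = len(tokenizer(trace)["input_ids"])
            if ok and min_tokens <= n_tok <= max_tokens:  # moderate-length filter
                calibration.append(problem["question"] + trace)
    return calibration
```

The resulting texts would then be passed as the calibration set to an existing pruning method (e.g., a Wanda- or SparseGPT-style procedure) in place of generic pretraining samples; the point of the paper is that this choice of calibration data, not the pruning algorithm itself, drives the reported 10%-13% gain.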
Similar Papers
Think Before You Prune: Self-Reflective Structured Pruning for Reasoning Language Models
Computation and Language
Makes smart AI smaller without losing its thinking.
Think, Prune, Train, Improve: Scaling Reasoning without Scaling Models
Machine Learning (CS)
Computers learn to solve harder math problems.
From Long to Short: LLMs Excel at Trimming Own Reasoning Chains
Artificial Intelligence
Makes smart computers solve problems faster, simpler.