Slim-SC: Thought Pruning for Efficient Scaling with Self-Consistency
By: Colin Hong, Xu Guo, Anand Chaanan Singh, and more
Potential Business Impact:
Makes AI answers faster and cheaper to produce.
Recently, Test-Time Scaling (TTS) has gained increasing attention for improving LLM reasoning performance at test time without retraining the model. A notable TTS technique is Self-Consistency (SC), which generates multiple reasoning chains in parallel and selects the final answer via majority voting. While effective, its order-of-magnitude computational overhead limits broad deployment. Prior attempts to accelerate SC rely mainly on model-based confidence scores or heuristics with limited empirical support. For the first time, we theoretically and empirically analyze the inefficiencies of SC and reveal actionable opportunities for improvement. Building on these insights, we propose Slim-SC, a step-wise pruning strategy that identifies and removes redundant chains using inter-chain similarity at the thought level. Experiments on three STEM reasoning datasets and two recent LLM architectures show that Slim-SC reduces inference latency and KV cache (KVC) usage by up to 45% and 26%, respectively, with R1-Distill, while maintaining or improving accuracy, thus offering a simple yet efficient TTS alternative to SC.
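The core loop is straightforward to picture: sample several reasoning chains in parallel, after each step drop chains whose latest thought nearly duplicates one already kept, and majority-vote over the surviving answers. The sketch below illustrates that idea under stated assumptions only; the function names, the SequenceMatcher-based text similarity, and the 0.9 threshold are illustrative stand-ins, not the paper's actual implementation.

```python
# Minimal sketch of self-consistency with thought-level pruning.
# All names, the similarity measure, and the threshold are assumptions
# for illustration; Slim-SC's exact method may differ.
from collections import Counter
from difflib import SequenceMatcher


def thought_similarity(a: str, b: str) -> float:
    """Crude textual similarity between two intermediate thoughts
    (a stand-in for whatever inter-chain similarity measure is used)."""
    return SequenceMatcher(None, a, b).ratio()


def prune_redundant_chains(chains, step, threshold=0.9):
    """Keep one representative among chains whose thought at `step`
    is a near-duplicate of an already-kept chain's thought."""
    kept = []
    for chain in chains:
        if len(chain) <= step:
            kept.append(chain)  # chain has no thought at this step yet
            continue
        redundant = any(
            len(other) > step
            and thought_similarity(chain[step], other[step]) >= threshold
            for other in kept
        )
        if not redundant:
            kept.append(chain)
    return kept


def majority_vote(answers):
    """Standard self-consistency: pick the most common final answer."""
    return Counter(answers).most_common(1)[0][0]


# Usage: chains is a list of reasoning chains, each a list of thought strings.
# Prune after each generation step, then vote over the final answers of the
# chains that survive.
chains = [
    ["compute 2+2", "the answer is 4"],
    ["compute 2 + 2", "the answer is 4"],   # near-duplicate, gets pruned
    ["add two and two", "the answer is 5"],
]
for step in range(2):
    chains = prune_redundant_chains(chains, step)
print(majority_vote([c[-1] for c in chains]))
```

Because pruned chains stop generating tokens early, both latency and KV cache usage drop, which is the source of the savings reported in the abstract.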
Similar Papers
Optimal Self-Consistency for Efficient Reasoning with Large Language Models
Machine Learning (CS)
Makes AI smarter with fewer guesses.
Enhancing Mathematical Reasoning in Large Language Models with Self-Consistency-Based Hallucination Detection
Artificial Intelligence
Makes AI better at math by checking its work.
Latent Self-Consistency for Reliable Majority-Set Selection in Short- and Long-Answer Reasoning
Computation and Language
Makes AI answers more reliable and trustworthy.