SyncThink: A Training-Free Strategy to Align Inference Termination with Reasoning Saturation
By: Gengyang Li, Wang Cai, Yifeng Gao, and more
Potential Business Impact:
Makes AI think faster and smarter.
Chain-of-Thought (CoT) prompting improves reasoning but often produces long, redundant traces that substantially increase inference cost. We present SyncThink, a training-free, plug-and-play decoding method that reduces CoT overhead without modifying model weights. We find that answer tokens attend weakly to early reasoning and instead concentrate on the special end-of-thinking token "</think>", indicating an information bottleneck. Building on this observation, SyncThink monitors the model's own reasoning-transition signal and terminates reasoning once that signal indicates saturation. Experiments on GSM8K, MMLU, GPQA, and BBH across three DeepSeek-R1 distilled models show that SyncThink achieves 62.00 percent average Top-1 accuracy with 656 generated tokens and 28.68 s latency, compared to 61.22 percent, 2141 tokens, and 92.01 s for full CoT decoding. On long-horizon tasks such as GPQA, SyncThink further yields up to +8.1 points of absolute accuracy by preventing over-thinking.
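The abstract does not spell out the termination mechanics, so the sketch below is only meant to make the decoding idea concrete. It assumes (beyond anything stated above) that the reasoning-transition signal can be approximated by the model's own probability of emitting the end-of-thinking marker "</think>", and that reasoning is cut off by force-inserting that marker once a hypothetical probability threshold or a token budget is reached. The model name, threshold, and budget are illustrative placeholders, not the authors' settings.

```python
# Minimal sketch of early reasoning termination in the spirit of SyncThink.
# Assumption: the transition signal is approximated by the probability the model
# assigns to the end-of-thinking marker "</think>"; THRESHOLD and the budget are
# hypothetical values chosen for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # placeholder model choice
THRESHOLD = 0.05             # hypothetical transition-probability threshold
MAX_REASONING_TOKENS = 656   # rough budget, matching the reported average token count

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

end_think_ids = tokenizer.encode("</think>", add_special_tokens=False)
assert len(end_think_ids) == 1, "sketch assumes the marker is a single token"
END_THINK = end_think_ids[0]

def generate_with_early_stop(prompt: str, max_new_tokens: int = 2048) -> str:
    """Greedy decoding that cuts the reasoning phase short once the model's
    probability of emitting </think> crosses THRESHOLD or the budget runs out.
    The prompt is assumed to already open the reasoning phase (e.g. via the
    model's chat template)."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    in_reasoning = True
    reasoning_steps = 0
    with torch.no_grad():
        for _ in range(max_new_tokens):
            # Full forward pass each step for clarity (no KV cache reuse).
            logits = model(ids).logits[:, -1, :]
            probs = torch.softmax(logits, dim=-1)
            next_id = int(torch.argmax(probs, dim=-1))  # greedy decoding

            if in_reasoning:
                reasoning_steps += 1
                # Terminate reasoning when the model itself signals the
                # transition, or when the reasoning budget is exhausted.
                if probs[0, END_THINK] > THRESHOLD or reasoning_steps >= MAX_REASONING_TOKENS:
                    next_id = END_THINK
            if next_id == END_THINK:
                in_reasoning = False

            ids = torch.cat([ids, torch.tensor([[next_id]])], dim=-1)
            if not in_reasoning and next_id == tokenizer.eos_token_id:
                break
    return tokenizer.decode(ids[0], skip_special_tokens=False)
```

Because the intervention touches only the sampling loop, it stays training-free and plug-and-play in the sense described above; a production version would reuse the KV cache instead of re-running the full forward pass at every step.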
Similar Papers
CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models
Computation and Language
Makes smart computers think less, saving energy.
Skip-Thinking: Chunk-wise Chain-of-Thought Distillation Enable Smaller Language Models to Reason Better and Faster
Computation and Language
Teaches small computers to think faster.
Correct, Concise and Complete: Multi-stage Training For Adaptive Reasoning
Computation and Language
Makes AI think less to solve problems faster.