Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilities
By: Weixiang Zhao, Xingyu Sui, Jiahe Guo, and more
Potential Business Impact:
Makes smart computers think less, respond faster, and act more safely.
Recent advancements in Large Reasoning Models (LRMs), such as OpenAI's o1/o3 and DeepSeek-R1, have demonstrated remarkable performance in specialized reasoning tasks through human-like deliberative thinking and long chain-of-thought reasoning. However, our systematic evaluation across various model families (DeepSeek, Qwen, and LLaMA) and scales (7B to 671B) reveals that acquiring these deliberative reasoning capabilities significantly reduces the foundational capabilities of LRMs, including notable declines in helpfulness and harmlessness, alongside substantially increased inference costs. Importantly, we demonstrate that adaptive reasoning -- employing modes like Zero-Thinking, Less-Thinking, and Summary-Thinking -- can effectively alleviate these drawbacks. Our empirical insights underline the critical need for developing more versatile LRMs capable of dynamically allocating inference-time compute according to specific task characteristics.
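To make the adaptive-reasoning idea concrete, here is a minimal Python sketch of routing a request to one of the thinking modes named in the abstract (Zero-Thinking, Less-Thinking, Summary-Thinking, or full deliberative reasoning). The `route` function, the difficulty score, the thresholds, and the token budgets are illustrative assumptions, not the paper's implementation; the paper's claim is only that compute should be allocated according to task characteristics.

```python
# Illustrative sketch of adaptive reasoning-mode routing (not from the paper's code).
# Mode names follow the abstract; the difficulty heuristic, thresholds, and token
# budgets below are assumptions made for this example.

from dataclasses import dataclass
from enum import Enum


class ReasoningMode(Enum):
    ZERO_THINKING = "zero"        # answer directly, no chain-of-thought
    LESS_THINKING = "less"        # short, budget-capped chain-of-thought
    SUMMARY_THINKING = "summary"  # reason internally, emit only a compressed summary
    FULL_THINKING = "full"        # unrestricted deliberative reasoning


@dataclass
class RoutedRequest:
    prompt: str
    mode: ReasoningMode
    max_reasoning_tokens: int


def route(prompt: str, estimated_difficulty: float) -> RoutedRequest:
    """Pick a reasoning mode from a difficulty score in [0, 1]."""
    if estimated_difficulty < 0.2:
        return RoutedRequest(prompt, ReasoningMode.ZERO_THINKING, 0)
    if estimated_difficulty < 0.5:
        return RoutedRequest(prompt, ReasoningMode.LESS_THINKING, 256)
    if estimated_difficulty < 0.8:
        return RoutedRequest(prompt, ReasoningMode.SUMMARY_THINKING, 512)
    return RoutedRequest(prompt, ReasoningMode.FULL_THINKING, 4096)


if __name__ == "__main__":
    for d in (0.1, 0.4, 0.9):
        r = route("example task", d)
        print(f"difficulty={d}: mode={r.mode.value}, budget={r.max_reasoning_tokens}")
```

In such a scheme, easy queries skip chain-of-thought entirely (recovering helpfulness and latency), while only genuinely hard tasks pay the full deliberative inference cost.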
Similar Papers
Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models
Artificial Intelligence
Makes AI think faster without losing accuracy.
Exploring the Necessity of Reasoning in LLM-based Agent Scenarios
Artificial Intelligence
New AI thinks better, but sometimes too much.
From Efficiency to Adaptivity: A Deeper Look at Adaptive Reasoning in Large Language Models
Artificial Intelligence
Computers change how they think based on how hard a problem is.