Wait, Wait, Wait... Why Do Reasoning Models Loop?
By: Charilaos Pipis , Shivam Garg , Vasilis Kontonis and more
Reasoning models (e.g., DeepSeek-R1) generate long chains of thought to solve harder problems, but they often loop, repeating the same text at low temperatures or with greedy decoding. We study why this happens and what role temperature plays. With open reasoning models, we find that looping is common at low temperature. Larger models tend to loop less, and distilled students loop significantly even when their teachers rarely do. This points to mismatches between the training distribution and the learned model, which we refer to as errors in learning, as a key cause. To understand how such errors cause loops, we introduce a synthetic graph reasoning task and demonstrate two mechanisms. First, risk aversion caused by hardness of learning: when the correct progress-making action is hard to learn but an easy cyclic action is available, the model puts relatively more probability on the cyclic action and gets stuck. Second, even when there is no hardness, Transformers show an inductive bias toward temporally correlated errors, so the same few actions keep being chosen and loops appear. Higher temperature reduces looping by promoting exploration, but it does not fix the errors in learning, so generations remain much longer than necessary at high temperature; in this sense, temperature is a stopgap rather than a holistic solution. We end with a discussion of training-time interventions aimed at directly reducing errors in learning.
Similar Papers
Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement
Machine Learning (CS)
Makes smart computer programs think faster, shorter.
Reasoning Models Reason Well, Until They Don't
Artificial Intelligence
Makes smart computers better at solving hard problems.
Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models
Artificial Intelligence
Makes AI think faster without losing accuracy.