Rethinking Reflection in Pre-Training
By: Essential AI (Darsh J. Shah and more)
Potential Business Impact:
Computers learn to fix their own mistakes.
A language model's ability to reflect on its own reasoning is a key advantage for solving complex problems. While most recent research has focused on how this ability develops during reinforcement learning, we show that it begins to emerge much earlier, during pre-training. To study this, we introduce deliberate errors into chains-of-thought and test whether the model can still reach the correct answer by recognizing and correcting those mistakes. Tracking performance across pre-training checkpoints, we find that this self-correcting ability appears early and improves steadily over time. For instance, an OLMo-2-7B model pre-trained on 4 trillion tokens displays self-correction on our six self-reflection tasks.
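To make the probe concrete, here is a minimal sketch of the adversarial chain-of-thought setup the abstract describes: corrupt one step of a correct reasoning trace, feed it to the model as a prefix, and check whether the model still recovers the gold answer. The `query_model` callable, the off-by-one corruption, and the substring answer check are illustrative assumptions, not the authors' actual pipeline.

```python
import re
from typing import Callable

def corrupt_chain_of_thought(correct_cot: str) -> str:
    """Introduce a deliberate error into one step of the chain.

    Toy stand-in for the paper's controlled error injection:
    bump the first number found in the trace by one.
    """
    def flip(match: re.Match) -> str:
        return str(int(match.group()) + 1)  # off-by-one corruption (assumed)
    return re.sub(r"\d+", flip, correct_cot, count=1)

def probe_self_reflection(
    query_model: Callable[[str], str],  # hypothetical model interface
    question: str,
    correct_cot: str,
    gold_answer: str,
) -> bool:
    """Return True if the model recovers the gold answer despite the
    corrupted reasoning prefix placed in its context."""
    adversarial_prompt = (
        f"Question: {question}\n"
        f"Reasoning so far: {corrupt_chain_of_thought(correct_cot)}\n"
        "Continue the reasoning, correcting any mistakes, "
        "and state the final answer."
    )
    completion = query_model(adversarial_prompt)
    # Crude answer extraction for illustration only.
    return gold_answer in completion

if __name__ == "__main__":
    # Stub model for demonstration; swap in a real checkpoint
    # (e.g., an OLMo-2-7B endpoint) to run the probe for real.
    def stub_model(prompt: str) -> str:
        return "Wait, 3 + 4 is 7, not 8. The final answer is 7."

    ok = probe_self_reflection(
        stub_model,
        question="What is 3 + 4?",
        correct_cot="3 + 4 = 7",
        gold_answer="7",
    )
    print("model self-corrected:", ok)
```

Running the probe across successive pre-training checkpoints, as the abstract describes, would then chart how often the model recovers the gold answer as token count grows.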
Similar Papers
From Emergence to Control: Probing and Modulating Self-Reflection in Language Models
Machine Learning (CS)
Makes AI think again to solve problems better.
Illusions of reflection: open-ended task reveals systematic failures in Large Language Models' reflective reasoning
Artificial Intelligence
Computers don't learn from their own mistakes.
ReflCtrl: Controlling LLM Reflection via Representation Engineering
Artificial Intelligence
Control AI's thinking to save energy.