Step-Tagging: Toward controlling the generation of Language Reasoning Models through step monitoring
By: Yannis Belkhiter, Seshu Tirupathi, Giulio Zizzo, and more
Potential Business Impact:
Makes AI reasoning faster by cutting wasted steps.
The field of Language Reasoning Models (LRMs) has been very active over the past few years, with advances in training and inference techniques enabling LRMs to reason longer and more accurately. However, a growing body of studies shows that LRMs are still inefficient, over-generating verification and reflection steps. To address this challenge, we introduce the Step-Tagging framework, a lightweight sentence classifier enabling real-time annotation of the type of reasoning step that an LRM is generating. To monitor reasoning behaviors, we introduce ReasonType, a novel taxonomy of reasoning steps. Building on this framework, we demonstrate that online monitoring of the counts of specific step types yields effective, interpretable early-stopping criteria for LRM inference. We evaluate the Step-Tagging framework on three open-source reasoning models across standard benchmark datasets: MATH500, GSM8K, and AIME, as well as non-mathematical tasks (GPQA and MMLU-Pro). We achieve 20 to 50% token reduction while maintaining accuracy comparable to standard generation, with the largest gains observed on more computation-heavy tasks. This work offers a novel way to increase control over the generation of LRMs and a new tool to study LRM behaviors.
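To make the monitoring idea concrete, below is a minimal Python sketch of step-count-based early stopping. It is not the paper's implementation: the `tag_step` keyword heuristic stands in for the trained sentence classifier, the label subset is only a slice of the ReasonType taxonomy named in the abstract (verification, reflection), and the `max_verifications` threshold is an illustrative assumption.

```python
# Minimal sketch of early stopping driven by counts of tagged reasoning steps.
# The keyword-based tag_step() is a stand-in for the paper's trained sentence
# classifier; labels and threshold below are illustrative assumptions.

from typing import Iterable, Iterator

# Hypothetical subset of ReasonType labels ("other" is a catch-all added here).
VERIFICATION, REFLECTION, OTHER = "verification", "reflection", "other"


def tag_step(step: str) -> str:
    """Stand-in classifier: label a reasoning step via simple keywords."""
    lowered = step.lower()
    if any(k in lowered for k in ("let me verify", "double-check", "check that")):
        return VERIFICATION
    if any(k in lowered for k in ("wait", "on second thought", "reconsider")):
        return REFLECTION
    return OTHER


def monitor_steps(steps: Iterable[str], max_verifications: int = 3) -> Iterator[str]:
    """Yield reasoning steps, stopping once the count of verification-type
    steps reaches a threshold (an interpretable early-stopping criterion)."""
    counts = {VERIFICATION: 0, REFLECTION: 0, OTHER: 0}
    for step in steps:
        counts[tag_step(step)] += 1
        yield step
        if counts[VERIFICATION] >= max_verifications:
            break  # cut generation instead of letting the model keep re-checking


if __name__ == "__main__":
    generated = [
        "First, expand the product and collect terms.",
        "Let me verify the expansion by substituting x = 1.",
        "Wait, I should reconsider the sign of the middle term.",
        "Let me verify again with x = 2.",
        "Let me verify once more before concluding.",
        "Therefore the answer is 42.",  # not reached: stopped after 3 verifications
    ]
    for s in monitor_steps(generated, max_verifications=3):
        print(s)
```

In an actual deployment the loop would wrap a streaming decoder, with each completed sentence classified online and generation terminated when the chosen step counts cross their thresholds.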
Similar Papers
Probing the "Psyche" of Large Reasoning Models: Understanding Through a Human Lens
Artificial Intelligence
Helps computers think and learn like people.
Exploring the Necessity of Reasoning in LLM-based Agent Scenarios
Artificial Intelligence
New AI thinks better, but sometimes too much.
Reasoning Models Reason Well, Until They Don't
Artificial Intelligence
Makes smart computers better at solving hard problems.