ReasonIF: Large Reasoning Models Fail to Follow Instructions During Reasoning
By: Yongchan Kwon, Shang Zhu, Federico Bianchi, and more
Potential Business Impact:
Makes AI follow instructions while thinking.
The ability of large language models (LLMs) to follow user instructions is central to their reliability, safety, and usefulness. While prior studies assess instruction adherence in the model's main responses, we argue that it is also critical for large reasoning models (LRMs) to follow user instructions throughout their reasoning process. Reasoning instruction following makes LRMs more controllable and transparent, while reducing the risk of undesirable shortcuts, hallucinations, or reward hacking within reasoning traces. To evaluate this dimension, we introduce ReasonIF, a systematic benchmark for assessing reasoning instruction following. ReasonIF includes six categories of instruction prompts, spanning multilingual reasoning, formatting, and length control. Across many open-source LRMs, including GPT-OSS, Qwen3, and DeepSeek-R1, we find substantial failures in reasoning instruction adherence: the highest instruction following score (IFS) remains below 0.25, meaning that fewer than 25% of reasoning traces comply with the given instructions. Notably, as task difficulty increases, reasoning instruction following degrades further. We also explore two strategies to enhance reasoning instruction fidelity: (1) multi-turn reasoning and (2) Reasoning Instruction Finetuning (RIF) using synthetic data. RIF improves the IFS of GPT-OSS-20B from 0.11 to 0.27, indicating measurable progress but leaving ample room for improvement.
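To make the IFS metric concrete, here is a minimal sketch of how such a score could be computed, assuming IFS is simply the fraction of reasoning traces that satisfy their paired instruction. The checker functions below (uppercase formatting, word-count limits) are hypothetical illustrations in the spirit of the benchmark's formatting and length-control categories, not ReasonIF's actual validators.

```python
from typing import Callable

# Hypothetical compliance checkers; ReasonIF's real validators are not
# specified in the abstract, so these are purely illustrative.
def all_uppercase(trace: str) -> bool:
    # Formatting example: the reasoning trace must be entirely uppercase.
    letters = [c for c in trace if c.isalpha()]
    return bool(letters) and all(c.isupper() for c in letters)

def under_n_words(n: int) -> Callable[[str], bool]:
    # Length-control example: the trace must stay under n words.
    return lambda trace: len(trace.split()) < n

def instruction_following_score(
    traces: list[str],
    checkers: list[Callable[[str], bool]],
) -> float:
    # Assumed definition: per-trace compliance (one checker per trace),
    # averaged over the benchmark.
    assert len(traces) == len(checkers)
    compliant = sum(check(trace) for trace, check in zip(traces, checkers))
    return compliant / len(traces)

# Toy usage: one formatting instruction and one length instruction.
traces = [
    "FIRST I SIMPLIFY THE EXPRESSION, THEN I SOLVE FOR X.",
    "Let me think about this step by step before answering.",
]
checkers = [all_uppercase, under_n_words(5)]
print(instruction_following_score(traces, checkers))  # 0.5
```

Under this reading, an IFS below 0.25 means that even the best model's reasoning traces pass their paired instruction check less than a quarter of the time.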
Similar Papers
When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs
Computation and Language
Makes AI follow instructions better by fixing reasoning.
Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models
CV and Pattern Recognition
Teaches computers to follow tricky directions better.
Light-IF: Endowing LLMs with Generalizable Reasoning via Preview and Self-Checking for Complex Instruction Following
Computation and Language
Teaches computers to follow tricky instructions better.