Reasoning Pattern Matters: Learning to Reason without Human Rationales
By: Chaoxu Pang, Yixuan Cao, Ping Luo
Potential Business Impact:
Lets computers learn to reason without human-written reasoning examples.
Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities under the widely adopted SFT+RLVR paradigm, which first performs Supervised Fine-Tuning (SFT) on human-annotated reasoning trajectories (rationales) to establish initial reasoning behaviors, and then applies Reinforcement Learning with Verifiable Rewards (RLVR) to optimize the model using verifiable signals that require no gold rationales. However, annotating high-quality rationales for the SFT stage remains prohibitively expensive. This paper investigates when and how rationale annotation costs can be substantially reduced without compromising reasoning performance. We identify a broad class of problems, termed patterned reasoning tasks, in which reasoning follows a fixed, procedural strategy that is consistent across instances. Although instances vary in content such as domain knowledge, factual information, or numeric values, the solution derives from applying a shared reasoning pattern. We argue that the success of SFT+RLVR on such tasks stems primarily from its ability to let models internalize these reasoning patterns. Using numerical semantic matching as a representative task, we provide both causal and behavioral evidence that reasoning patterns, rather than the quantity or quality of rationales, are the key determinant of performance. Building on these insights, we propose Pattern-Aware LLMs as Rationale AnnOtators (PARO), a simple yet effective framework that enables LLMs to generate rationales aligned with task-specific reasoning patterns without requiring human rationale annotations. Experiments show that PARO-generated rationales achieve SFT+RLVR performance comparable to a set of human-annotated rationales ten times larger. These results suggest that large-scale human rationale annotation can be replaced with LLM-based automatic annotation requiring only limited human supervision over reasoning patterns.
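To make the PARO-style pipeline the abstract describes more concrete, here is a minimal Python sketch of one plausible annotation loop: a human-specified reasoning pattern is placed in the prompt, an LLM drafts a rationale, and candidates are kept only if their final answer passes a verifiable check. The pattern text, `llm_generate`, and `verify_answer` are hypothetical placeholders under stated assumptions, not the authors' actual implementation or prompts.

```python
# Hypothetical sketch of a PARO-style rationale annotation loop.
# `llm_generate` and `verify_answer` are assumed stand-ins supplied by the
# caller, not a real library API.

from typing import Callable, Optional

# A task-specific reasoning pattern for numerical semantic matching,
# written once by a human instead of annotating rationales per instance.
REASONING_PATTERN = """\
Solve the task by following this fixed procedure:
1. Identify the quantities mentioned in the query and the candidate text.
2. Normalize units and scales so the quantities are comparable.
3. Compare the normalized values and state whether they match.
Write out each step, then give the final answer as 'Answer: <label>'."""

def annotate_rationale(
    instance: dict,
    llm_generate: Callable[[str], str],
    verify_answer: Callable[[str, dict], bool],
    max_tries: int = 4,
) -> Optional[str]:
    """Generate a pattern-aligned rationale for one instance, or None."""
    prompt = f"{REASONING_PATTERN}\n\nTask: {instance['input']}"
    for _ in range(max_tries):
        rationale = llm_generate(prompt)
        # Keep the rationale only if its final answer is verifiably correct,
        # mirroring RLVR-style verifiable signals (no gold rationale needed).
        if verify_answer(rationale, instance):
            return rationale
    return None  # skip the instance rather than keep a bad rationale

def build_sft_dataset(instances, llm_generate, verify_answer):
    """Collect (input, rationale) pairs for the SFT stage."""
    dataset = []
    for inst in instances:
        rationale = annotate_rationale(inst, llm_generate, verify_answer)
        if rationale is not None:
            dataset.append({"input": inst["input"], "target": rationale})
    return dataset
```

The design choice this sketch highlights, consistent with the abstract's claim, is that human effort shifts from writing rationales per instance to specifying the shared reasoning pattern once; verification of the final answer then filters the generated rationales automatically.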
Similar Papers
On the Mechanism of Reasoning Pattern Selection in Reinforcement Learning for Language Models
Machine Learning (CS)
Teaches computers to pick the best thinking steps.
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
Computation and Language
Teaches computers to think better, not just copy.
RB-FT: Rationale-Bootstrapped Fine-Tuning for Video Classification
CV and Pattern Recognition
Teaches computers to understand videos better.