How and Why LLMs Generalize: A Fine-Grained Analysis of LLM Reasoning from Cognitive Behaviors to Low-Level Patterns
By: Haoyue Bai, Yiyou Sun, Wenjie Hu, and more
Large Language Models (LLMs) display strikingly different generalization behaviors: supervised fine-tuning (SFT) often narrows capability, whereas reinforcement learning (RL) tuning tends to preserve it. The reasons behind this divergence remain unclear, as prior studies have largely relied on coarse accuracy metrics. We address this gap by introducing a novel benchmark that decomposes reasoning into atomic core skills such as calculation, fact retrieval, simulation, enumeration, and diagnosis, providing a concrete framework for addressing the fundamental question of what constitutes reasoning in LLMs. By isolating and measuring these core skills, the benchmark offers a more granular view of how specific cognitive abilities emerge, transfer, and sometimes collapse during post-training. Combined with analyses of low-level statistical patterns, such as distributional divergence and parameter statistics, it enables a fine-grained study of how generalization evolves under SFT and RL across mathematical reasoning, scientific reasoning, and non-reasoning tasks. Our meta-probing framework tracks model behavior at different training stages and reveals that RL-tuned models maintain more stable behavioral profiles and resist collapse in reasoning skills, whereas SFT models exhibit sharper drift and overfit to surface patterns. This work offers new insight into the nature of reasoning in LLMs and points toward principles for designing training strategies that foster broad, robust generalization.
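The abstract names two layers of measurement: per-skill accuracy over a skill-tagged benchmark, and low-level statistics (distributional divergence, parameter statistics) comparing a post-trained model to its base model. The sketch below is our own illustration of what such probes might look like, not the authors' released code; it assumes HuggingFace-style causal LMs whose forward pass returns .logits, and the skill tag set, function names, and data layout are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from collections import defaultdict

CORE_SKILLS = ["calculation", "fact_retrieval", "simulation",
               "enumeration", "diagnosis"]  # illustrative tag set, not the paper's

def per_skill_accuracy(results):
    """results: iterable of (skill_tag, is_correct) pairs -> accuracy per skill."""
    hits, totals = defaultdict(int), defaultdict(int)
    for skill, correct in results:
        totals[skill] += 1
        hits[skill] += int(correct)
    return {skill: hits[skill] / totals[skill] for skill in totals}

@torch.no_grad()
def next_token_kl(base_model, tuned_model, input_ids):
    """Mean per-position KL(base || tuned) over next-token distributions."""
    logp_base = F.log_softmax(base_model(input_ids).logits, dim=-1).flatten(0, 1)
    logp_tuned = F.log_softmax(tuned_model(input_ids).logits, dim=-1).flatten(0, 1)
    # F.kl_div computes KL(target || input); both arguments are log-probs here.
    return F.kl_div(logp_tuned, logp_base, log_target=True,
                    reduction="batchmean").item()

@torch.no_grad()
def parameter_delta_stats(base_model, tuned_model):
    """Relative L2 norm of the weight change for each parameter tensor."""
    tuned = dict(tuned_model.named_parameters())
    return {name: ((tuned[name] - p).norm() / (p.norm() + 1e-12)).item()
            for name, p in base_model.named_parameters()}
```

Flattening the batch and sequence dimensions before the batchmean reduction makes the divergence a per-token average, so tracking it across SFT and RL checkpoints yields drift curves that are comparable regardless of batch shape.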