Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning
By: Xiaojun Wu, Xiaoguang Jiang, Huiyang Li, and more
Potential Business Impact:
Teaches computers to think better with less data.
Large language models (LLMs) demonstrate remarkable reasoning capabilities in tasks such as algorithmic coding and mathematical problem-solving. Recent methods have improved reasoning through expanded corpora and multi-stage training that combines reinforcement learning and supervised fine-tuning. Although some methods suggest that a small but targeted dataset can incentivize reasoning via distillation alone, reasoning scaling laws are still taking shape, driving up computational costs. To address this, we propose a data-efficient distillation framework (DED) that optimizes the Pareto frontier of reasoning distillation. Inspired by on-policy learning and the diverse roll-out strategies of reinforcement learning, the key idea of our approach is threefold: (1) We identify that benchmark scores alone do not determine an effective teacher model. Through comprehensive comparisons of leading reasoning LLMs, we develop a method to select an optimal teacher model. (2) While scaling distillation can enhance reasoning, it often degrades out-of-domain performance. A carefully curated, smaller corpus achieves a balanced trade-off between in-domain and out-of-domain capabilities. (3) Diverse reasoning trajectories encourage the student model to develop robust reasoning skills. We validate our method on mathematical reasoning (AIME 2024/2025, MATH-500) and code generation (LiveCodeBench) benchmarks, achieving state-of-the-art results with only 0.8k carefully curated examples and bypassing the need for extensive scaling. Our systematic analysis demonstrates that DED outperforms existing methods by considering factors beyond superficial hardness, token length, or teacher model capability. This work offers a practical and efficient pathway to advanced reasoning while preserving general capabilities.
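To make points (2) and (3) of the abstract concrete, the sketch below shows one plausible shape of the distillation step: a small curated prompt set, several sampled ("diverse roll-out") reasoning trajectories per prompt drawn from a chosen teacher, and standard supervised fine-tuning of the student on those trajectories. This is a minimal illustration assuming a Hugging Face-style setup; the model names, hyperparameters, and helper functions are hypothetical placeholders, not the paper's released code, and the teacher and student are assumed to share a tokenizer for brevity.

```python
# Minimal sketch of small-corpus reasoning distillation with diverse teacher roll-outs.
# Checkpoint ids, corpus size, and hyperparameters are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

TEACHER_NAME = "teacher-reasoning-llm"   # hypothetical teacher checkpoint
STUDENT_NAME = "student-base-llm"        # hypothetical student checkpoint


def sample_diverse_trajectories(prompt, teacher, tok, n=4, max_new_tokens=512):
    """Draw several sampled reasoning traces per prompt (diverse roll-outs)."""
    inputs = tok(prompt, return_tensors="pt").to(teacher.device)
    out = teacher.generate(
        **inputs,
        do_sample=True,          # sampling rather than greedy decoding diversifies traces
        temperature=0.8,
        top_p=0.95,
        num_return_sequences=n,
        max_new_tokens=max_new_tokens,
    )
    return [tok.decode(seq, skip_special_tokens=True) for seq in out]


def distill(curated_prompts, epochs=2, lr=1e-5):
    tok = AutoTokenizer.from_pretrained(STUDENT_NAME)
    teacher = AutoModelForCausalLM.from_pretrained(TEACHER_NAME).eval()
    student = AutoModelForCausalLM.from_pretrained(STUDENT_NAME)
    opt = torch.optim.AdamW(student.parameters(), lr=lr)

    # Build the small distillation corpus: each curated prompt is paired with
    # several teacher trajectories (the ~0.8k-example regime from the abstract).
    corpus = []
    with torch.no_grad():
        for prompt in curated_prompts:
            corpus.extend(sample_diverse_trajectories(prompt, teacher, tok))

    # Standard causal-LM supervised fine-tuning on the teacher trajectories.
    student.train()
    for _ in range(epochs):
        for text in corpus:
            batch = tok(text, return_tensors="pt", truncation=True, max_length=2048)
            batch = {k: v.to(student.device) for k, v in batch.items()}
            loss = student(**batch, labels=batch["input_ids"]).loss
            loss.backward()
            opt.step()
            opt.zero_grad()
    return student
```

In this reading, teacher selection (point 1) happens before the code runs, when choosing which checkpoint to load as the teacher; the code itself only captures the curated-corpus and diverse-trajectory aspects of the framework.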
Similar Papers
From Reasoning LLMs to BERT: A Two-Stage Distillation Framework for Search Relevance
Information Retrieval
Makes online shopping search faster and smarter.
The Valley of Code Reasoning: Scaling Knowledge Distillation of Large Language Models
Computation and Language
Makes small AI learn coding skills faster.
OpenCodeReasoning: Advancing Data Distillation for Competitive Coding
Computation and Language
Teaches computers to write better code.