Enhancing Reasoning Capabilities in SLMs with Reward Guided Dataset Distillation
By: Shreyansh Padarha
Potential Business Impact:
Teaches small AI to solve hard math problems.
The push to compress and impart the proficiency of Large Language Models (LLMs) into more deployable and efficient Small Language Models (SLMs) has benefited from improvements in knowledge distillation (KD) techniques. These techniques allow a smaller student model to learn from the responses of a larger, more capable teacher model. However, distillation often revolves around the student model merely copying the teacher's in-distribution responses, which limits its generalisability; this limitation is amplified on reasoning tasks, where the process can also be computationally expensive. In this study, we propose AdvDistill, a reward-guided dataset distillation framework. We utilise multiple generations (responses) from a teacher for each prompt and assign rewards based on rule-based verifiers. These varying and normally distributed rewards serve as weights when training student models. Our methods and their subsequent behavioural analysis demonstrate a significant improvement in student model performance for mathematical and complex reasoning tasks, showcasing the efficacy and benefits of incorporating a reward mechanism in dataset distillation processes.
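To make the training recipe concrete, the sketch below illustrates one plausible reading of the reward-weighted setup the abstract describes: the teacher produces K generations per prompt, a rule-based verifier scores each one, the scores are normalised across the group, and the resulting weights scale the student's per-generation loss. The function names (`verifier_reward`, `normalise_rewards`, `weighted_student_loss`) and the exact weighting scheme are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def verifier_reward(generation: str, gold_answer: str) -> float:
    """Hypothetical rule-based verifier: full credit if the generation's
    final answer matches the gold answer, zero otherwise."""
    return 1.0 if generation.strip().endswith(gold_answer) else 0.0

def normalise_rewards(rewards: np.ndarray) -> np.ndarray:
    """Standardise rewards across the K generations for a single prompt,
    giving zero-mean, unit-variance weights (one reading of the
    'normally distributed rewards' used as training weights)."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def weighted_student_loss(nll_per_generation: np.ndarray,
                          weights: np.ndarray) -> float:
    """Reward-weighted objective: the student's negative log-likelihood of
    each teacher generation is scaled by that generation's weight, so
    high-reward generations are imitated more strongly."""
    return float((weights * nll_per_generation).mean())

# Toy usage with K = 4 teacher generations for one prompt.
generations = [
    "... so the answer is 42",
    "... so the answer is 41",
    "... therefore the answer is 42",
    "... I am not sure",
]
gold = "42"

rewards = np.array([verifier_reward(g, gold) for g in generations])
weights = normalise_rewards(rewards)

# In practice these would come from the student model's forward pass;
# stubbed here with placeholder per-sequence NLL values.
nll = np.array([1.2, 1.5, 0.9, 2.0])
print(weighted_student_loss(nll, weights))
```

Under this standardisation, generations scoring below the group mean receive negative weights, which would actively push the student away from them; whether AdvDistill clips, rescales, or otherwise bounds the weights is not stated in the abstract, so that detail is an assumption of this sketch.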
Similar Papers
Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones?
Computation and Language
Teaches smaller AI to be smarter than big AI.
SDRT: Enhance Vision-Language Models by Self-Distillation with Diverse Reasoning Traces
CV and Pattern Recognition
Teaches computers to "think" better with pictures.
Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning
Machine Learning (CS)
Teaches computers to think better with less data.