
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't

Published: March 20, 2025 | arXiv ID: 2503.16219v1

By: Quy-Anh Dang, Chris Ngo

Potential Business Impact:

Makes small AI smarter with less money.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Enhancing the reasoning capabilities of large language models (LLMs) typically relies on massive computational resources and extensive datasets, which limits accessibility in resource-constrained settings. Our study investigates the potential of reinforcement learning (RL) to improve reasoning in small LLMs, focusing on a 1.5-billion-parameter model, DeepSeek-R1-Distill-Qwen-1.5B, under strict constraints: training on 4 NVIDIA A40 GPUs (48 GB VRAM each) within 24 hours. Adapting the Group Relative Policy Optimization (GRPO) algorithm and curating a compact, high-quality mathematical reasoning dataset, we conducted three experiments to explore model behavior and performance. Our results demonstrate rapid reasoning gains (e.g., AMC23 accuracy rising from 63% to 80% and AIME24 reaching 46.7%, surpassing o1-preview) using only 7,000 samples and a $42 training cost, compared to thousands of dollars for baseline models. However, challenges such as optimization instability and length constraints emerged with prolonged training. These findings highlight the efficacy of RL-based fine-tuning for small LLMs, offering a cost-effective alternative to large-scale approaches. We release our code and datasets as open-source resources at https://github.com/knoveleng/open-rs, providing insights into trade-offs and laying a foundation for scalable, reasoning-capable LLMs in resource-limited environments.
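
The abstract names GRPO but does not describe it. As a rough illustration only, the sketch below shows the group-relative advantage step that gives GRPO its name: several completions are sampled for the same prompt, and each completion's reward is normalized against the mean and standard deviation of its own group. This is not the authors' adapted implementation; the function name `group_relative_advantages`, the `eps` guard, and the choice of population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    `rewards` holds one scalar reward per completion sampled for a single
    prompt. `eps` is an assumed guard against division by zero when every
    completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four completions scored 1.0 (correct) or 0.0 (wrong).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because the baseline comes from the group itself, no separate value network is needed, which is one plausible reason the method suits a small compute budget like the one described above.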

Enhancing Math Reasoning in Small-sized LLMs via Preview Difficulty-Aware Intervention

Machine Learning (CS)

Teaches computers to solve hard math problems.

3 Aug 2025

92%

Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models

Machine Learning (CS)

Helps small AI learn to think better.

3 Apr 2025

92%

Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning

Artificial Intelligence

Teaches computers to think better and use knowledge.

5 Jun 2025


Repos / Data Links

github.com huggingface.co

Page Count

17 pages
