
Enhancing Math Reasoning in Small-sized LLMs via Preview Difficulty-Aware Intervention

Published: August 3, 2025 | arXiv ID: 2508.01604v1

By: Xinhan Di, JoyJiaoW

Potential Business Impact:

Teaches computers to solve hard math problems.

Scaling reinforcement learning (RL) enhances the reasoning capabilities of large language models, and RL is the key technique for drawing out complex reasoning. However, key technical details of state-of-the-art reasoning LLMs, such as the OpenAI O series, Claude 3 series, DeepMind's Gemini 2.5 series, and Grok 3 series, remain undisclosed, which makes it difficult for the research community to replicate their RL training results. We therefore start our study from an Early Preview Reinforcement Learning (EPRLI) algorithm built on the open-source GRPO framework, which incorporates difficulty-aware intervention for math problems. Applied to a 1.5B-parameter LLM, our method achieves 50.0% on AIME24, 89.2% on Math500, 77.1% on AMC, 35.3% on Minerva, and 51.9% on OBench, surpassing O1-Preview and performing comparably to O1-mini within standard school-lab settings.
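The abstract names GRPO as the base framework but does not say how problem difficulty is measured or what the intervention does. The sketch below is a hedged illustration, not the authors' EPRLI method. It shows the standard GRPO group-relative advantage, A_i = (r_i - mean(r)) / std(r), computed over G sampled answers to one problem, plus a hypothetical difficulty label derived from the group's pass rate. The helper names `difficulty_bucket` and `needs_intervention`, the 0.2/0.8 thresholds, and the binary 0/1 reward are illustrative assumptions.

```python
"""Hypothetical sketch: GRPO group-relative advantages plus a simple
difficulty flag. This is NOT the authors' EPRLI implementation."""
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style outcome advantages: normalize each reward by the group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this choice
    return [(r - mu) / (sigma + eps) for r in rewards]


def difficulty_bucket(rewards: list[float], hard_below: float = 0.2,
                      easy_above: float = 0.8) -> str:
    """Hypothetical difficulty label from the group's pass rate.

    Assumes binary rewards (1 = correct, 0 = wrong). The thresholds are
    illustrative assumptions, not values taken from the paper.
    """
    pass_rate = mean(rewards)
    if pass_rate < hard_below:
        return "hard"
    if pass_rate > easy_above:
        return "easy"
    return "medium"


def needs_intervention(rewards: list[float]) -> bool:
    """Hypothetical trigger: flag 'hard' prompts for some form of intervention.

    When every sampled answer is wrong, all group-relative advantages are zero,
    so the prompt contributes no policy-gradient signal on its own.
    """
    return difficulty_bucket(rewards) == "hard"


if __name__ == "__main__":
    # Eight sampled solutions to one math problem, scored 1 (correct) or 0 (wrong).
    mixed = [1, 0, 0, 1, 0, 0, 0, 1]
    all_wrong = [0] * 8
    print([round(a, 3) for a in group_relative_advantages(mixed)])
    print(difficulty_bucket(mixed), needs_intervention(mixed))
    print(group_relative_advantages(all_wrong), needs_intervention(all_wrong))
```

For a group in which every answer is wrong, every advantage is zero, so plain GRPO learns nothing from that problem. This is one plausible reason a difficulty-aware intervention could help on hard problems, though the paper's own rationale and mechanism may differ.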

Related Papers:

Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
Machine Learning (CS), 20 Mar 2025
Makes small AI smarter with less money.

How Difficulty-Aware Staged Reinforcement Learning Enhances LLMs' Reasoning Capabilities: A Preliminary Experimental Study
Computation and Language, 1 Apr 2025
Teaches AI to solve harder math and code problems.

Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning
Artificial Intelligence, 5 Jun 2025
Teaches computers to think better and use knowledge.


Page Count: 7 pages
