h1: Bootstrapping LLMs to Reason over Longer Horizons via Reinforcement Learning

Published: October 8, 2025 | arXiv ID: 2510.07312v1

By: Sumeet Ramesh Motwani, Alesia Ivanova, Ziyang Cai and more

Potential Business Impact:

Teaches computers to solve harder math problems.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Large language models excel at short-horizon reasoning tasks, but performance drops as reasoning horizon lengths increase. Existing approaches to combat this rely on inference-time scaffolding or costly step-level supervision, neither of which scales easily. In this work, we introduce a scalable method to bootstrap long-horizon reasoning capabilities using only existing, abundant short-horizon data. Our approach synthetically composes simple problems into complex, multi-step dependency chains of arbitrary length. We train models on this data using outcome-only rewards under a curriculum that automatically increases in complexity, allowing RL training to be scaled much further without saturating. Empirically, our method generalizes remarkably well: curriculum training on composed 6th-grade level math problems (GSM8K) boosts accuracy on longer, competition-level benchmarks (GSM-Symbolic, MATH-500, AIME) by up to 2.06x. Importantly, our long-horizon improvements are significantly higher than baselines even at high pass@k, showing that models can learn new reasoning paths under RL. Theoretically, we show that curriculum RL with outcome rewards achieves an exponential improvement in sample complexity over full-horizon training, providing training signal comparable to dense supervision. h1 therefore introduces an efficient path towards scaling RL for long-horizon problems using only existing data.
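The composition and curriculum ideas described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the toy problems in `ATOMS` stand in for GSM8K-style items, and `compose_chain`, `outcome_reward`, and `next_length` are hypothetical helper names. The sketch chains short problems so that each step's input is the previous step's answer, scores only the final answer, and lengthens the chain once accuracy passes an assumed threshold.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class AtomicProblem:
    """A short-horizon problem whose answer is a function of one integer."""
    template: str                # problem text with an {x} placeholder
    solve: Callable[[int], int]  # ground-truth answer as a function of x


# Hypothetical toy problems standing in for short-horizon (GSM8K-style) data.
ATOMS: List[AtomicProblem] = [
    AtomicProblem("A box holds {x} apples and 3 more are added. "
                  "How many apples are in the box?", lambda x: x + 3),
    AtomicProblem("Each of {x} shelves holds 4 books. "
                  "How many books are there in total?", lambda x: 4 * x),
    AtomicProblem("Each of {x} students brings 2 pencils. "
                  "How many pencils are there in total?", lambda x: 2 * x),
]


def compose_chain(atoms: List[AtomicProblem], seed_value: int,
                  length: int, rng: random.Random) -> Tuple[str, int]:
    """Build a prompt of `length` dependent steps and its final answer.

    Only the first step shows a concrete number; every later step refers
    to the previous step's answer, so the chain must be solved in order.
    """
    steps = []
    value = seed_value
    for i in range(length):
        atom = rng.choice(atoms)
        shown = str(value) if i == 0 else f"[answer to step {i}]"
        steps.append(f"Step {i + 1}: " + atom.template.format(x=shown))
        value = atom.solve(value)  # ground truth carried forward
    prompt = "\n".join(steps) + "\nGive only the answer to the last step."
    return prompt, value


def outcome_reward(model_answer: int, final_answer: int) -> float:
    """Outcome-only reward: 1 for a correct final answer, else 0.

    Intermediate steps are never scored.
    """
    return 1.0 if model_answer == final_answer else 0.0


def next_length(current: int, recent_accuracy: float,
                threshold: float = 0.8, max_length: int = 8) -> int:
    """Assumed curriculum rule: lengthen chains once accuracy is high."""
    if recent_accuracy >= threshold and current < max_length:
        return current + 1
    return current
```

In an RL loop, `compose_chain` would generate prompts at the current chain length, the policy's final answers would be scored with `outcome_reward`, and `next_length` would decide when to move to longer chains. The threshold of 0.8 and the cap of 8 steps are placeholder values, not figures from the paper.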

Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning

Machine Learning (CS)

Teaches computers to solve hard problems step-by-step.

7 Jun 2025 0

90%

Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't

Machine Learning (CS)

Makes small AI smarter with less money.

20 Mar 2025 2

90%

Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models

Machine Learning (CS)

Helps small AI learn to think better.

3 Apr 2025 0

Country of Origin

🇬🇧 United Kingdom

Page Count

31 pages
