Outcome-based Reinforcement Learning to Predict the Future
By: Benjamin Turtel, Danny Franklin, Kris Skotheim, and more
Potential Business Impact:
Helps computers predict future events accurately.
Reinforcement Learning with Verifiable Rewards (RLVR) has been an effective approach for improving Large Language Models' reasoning in domains such as coding and mathematics. Here, we apply RLVR methods to forecasting future real-world events - a challenging task for RL due to the very noisy (and delayed) outcomes involved. Using a novel dataset of recent questions from a prediction market, paired with relevant news headlines, we show that a compact (14B) reasoning model can be trained to match or surpass the predictive accuracy of frontier models like o1, while greatly improving probabilistic calibration. The model's performance is also practically meaningful: in a Polymarket trading simulation, we estimate that its bets would have yielded a return on investment of over 10% across all questions in the test set. We detail and compare the approaches used in training our model, including augmenting the training data with synthetic prediction questions, guardrails for learning stability, and median prediction sampling at inference time.
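The abstract does not spell out how median prediction sampling is implemented, but the idea is to sample several forecasts for the same question and report their median, which is robust to occasional outlier predictions. The sketch below is a minimal, hypothetical illustration: the output format ("Final probability: 0.62"), the `sample_fn` interface, and the default of 8 samples are assumptions, not details from the paper.

```python
import re
import statistics

def extract_probability(completion: str):
    """Parse a probability from a model completion.

    Assumes the model is prompted to finish with a line like
    'Final probability: 0.62'; the paper's actual output format
    is not specified, so this regex is illustrative only.
    """
    match = re.search(r"Final probability:\s*([01](?:\.\d+)?)", completion)
    return float(match.group(1)) if match else None

def median_prediction(sample_fn, question: str, n_samples: int = 8) -> float:
    """Aggregate several sampled forecasts by taking their median.

    `sample_fn(question)` stands in for a call to the trained forecasting
    model that returns one sampled completion (a hypothetical interface).
    """
    probs = []
    for _ in range(n_samples):
        p = extract_probability(sample_fn(question))
        if p is not None:
            probs.append(p)
    if not probs:
        raise ValueError("no parsable predictions were sampled")
    return statistics.median(probs)
```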
Similar Papers
The Reasoning Boundary Paradox: How Reinforcement Learning Constrains Language Models
Artificial Intelligence
Fixes AI reasoning errors by focusing on hard problems.
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Artificial Intelligence
Makes computers learn new tricks, but not really.
RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
Computation and Language
Teaches computers to understand and show feelings.