Future-as-Label: Scalable Supervision from Real-World Outcomes
By: Benjamin Turtel, Paul Wilczewski, Danny Franklin and more
Many real-world prediction problems lack labels observable at prediction time, creating a temporal gap between prediction and outcome that yields supervision only after events resolve. To address this setting, we extend reinforcement learning with verifiable rewards to temporally resolved real-world prediction, and use it to train language models to make probabilistic forecasts under causally masked information with retrospective evaluation using proper scoring rules. Supervision is derived solely from post-resolution outcomes, preserving delayed-reward semantics. On real-world forecasting benchmarks, Qwen3-32B trained using Foresight Learning improves Brier score by 27% and halves calibration error relative to its pretrained baseline, and outperforms Qwen3-235B on both constructed future-event prediction tasks and the Metaculus benchmark despite a 7x parameter disadvantage.
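The two metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes binary outcomes (1 if the event occurred, 0 otherwise), the standard mean-squared-error form of the Brier score, and an equal-width-bin expected calibration error. The function names, the bin count, and the binning scheme are illustrative assumptions; the abstract does not specify the exact definitions the authors use.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and equally long")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted mean gap between average forecast and observed frequency per bin.

    Assumption: equal-width bins over [0, 1]; other binning schemes exist.
    """
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and equally long")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # A forecast of exactly 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_forecast = sum(p for p, _ in members) / len(members)
        observed_freq = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_forecast - observed_freq)
    return ece
```

Both quantities can only be computed after the forecast events have resolved. Lower is better for each: a forecaster who always answers 0.5 scores a Brier score of 0.25 regardless of outcomes.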
Similar Papers
Scaling Open-Ended Reasoning to Predict the Future
Machine Learning (CS)
Helps computers predict future events accurately.
Outcome-based Reinforcement Learning to Predict the Future
Machine Learning (CS)
Helps computers predict future events accurately.