Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation

Published: October 29, 2025 | arXiv ID: 2510.26026v1

By: Feichen Gan, Youcun Lu, Yingying Zhang, and others

Potential Business Impact:

Makes AI safer by quantifying when its predictions are uncertain.

Business Areas:
Predictive Analytics, Artificial Intelligence, Data and Analytics, Software

Reliable uncertainty quantification is crucial for reinforcement learning (RL) in high-stakes settings. We propose a unified conformal prediction framework for infinite-horizon policy evaluation that constructs distribution-free prediction intervals for returns in both on-policy and off-policy settings. Our method integrates distributional RL with conformal calibration, addressing challenges such as unobserved returns, temporal dependencies, and distributional shifts. We propose a modular pseudo-return construction based on truncated rollouts and a time-aware calibration strategy using experience replay and weighted subsampling. These innovations mitigate model bias and restore approximate exchangeability, enabling uncertainty quantification even under policy shifts. Our theoretical analysis provides coverage guarantees that account for model misspecification and importance weight estimation. Empirical results, including experiments in synthetic and benchmark environments like Mountain Car, show that our method significantly improves coverage and reliability over standard distributional RL baselines.
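As a rough illustration of the two ingredients the abstract names, the sketch below builds pseudo-returns from truncated rollouts and calibrates a weighted conformal quantile over nonconformity scores. It is a minimal sketch, not the paper's algorithm: the function names, the horizon H, the absolute-error score, and the simulated calibration data and importance weights are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def pseudo_return(rewards, tail_estimate, gamma=0.99, H=50):
    """Pseudo-return: discounted sum of the first H observed rewards plus a
    model-based estimate of the truncated tail, gamma^H * V_hat(s_H)."""
    head = sum(gamma**t * r for t, r in enumerate(rewards[:H]))
    return head + gamma**H * tail_estimate

def weighted_conformal_quantile(scores, weights, alpha=0.1):
    """(1 - alpha) empirical quantile of nonconformity scores under
    normalized importance weights (weighted split conformal)."""
    order = np.argsort(scores)
    s = np.asarray(scores, float)[order]
    w = np.asarray(weights, float)[order]
    w = w / w.sum()
    idx = np.searchsorted(np.cumsum(w), 1 - alpha)
    return s[min(idx, len(s) - 1)]

# Simulated calibration set: rollouts of length 60, a stand-in tail
# estimate, and the critic's point predictions (all illustrative).
n, H = 200, 50
rewards = rng.normal(1.0, 0.3, size=(n, 60))
tail_est = rng.normal(20.0, 1.0, size=n)
pseudo = np.array([pseudo_return(r, v, H=H) for r, v in zip(rewards, tail_est)])
preds = pseudo + rng.normal(0.0, 1.0, n)   # critic's point estimates
scores = np.abs(pseudo - preds)            # nonconformity scores
weights = rng.uniform(0.5, 2.0, n)         # stand-in importance weights

# Distribution-free interval for a new state's return around its
# point estimate, at nominal 90% coverage.
qhat = weighted_conformal_quantile(scores, weights, alpha=0.1)
v_new = preds.mean()
print(f"90% conformal interval: [{v_new - qhat:.2f}, {v_new + qhat:.2f}]")
```

In this sketch, the weighted quantile step is what would handle off-policy shift: calibration scores are reweighted by estimated importance weights so that approximate exchangeability with the target policy's returns is restored, in the spirit of weighted conformal prediction under covariate shift.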

Country of Origin
🇨🇳 China

Page Count
29 pages

Category
Statistics: Machine Learning (stat.ML)