
Efficient Algorithms for Mitigating Uncertainty and Risk in Reinforcement Learning

Published: October 20, 2025 | arXiv ID: 2510.17690v1

By: Xihong Su

Potential Business Impact:

Helps computers make smarter, safer choices.

Business Areas:

Risk Management Professional Services

This dissertation makes three main contributions. First, we identify a new connection between policy gradient and dynamic programming in MMDPs and propose the Coordinate Ascent Dynamic Programming (CADP) algorithm to compute a Markov policy that maximizes the discounted return averaged over the uncertain models. CADP adjusts model weights iteratively to guarantee monotone policy improvements to a local maximum. Second, we establish necessary and sufficient conditions for the exponential ERM Bellman operator to be a contraction and prove the existence of stationary deterministic optimal policies for ERM-TRC and EVaR-TRC. We also propose exponential value iteration, policy iteration, and linear programming algorithms for computing optimal stationary policies for ERM-TRC and EVaR-TRC. Third, we propose model-free Q-learning algorithms for computing policies with risk-averse objectives: ERM-TRC and EVaR-TRC. The challenge is that the Q-learning ERM Bellman operator may not be a contraction. Instead, we use the monotonicity of the Q-learning ERM Bellman operators to derive a rigorous proof that the ERM-TRC and EVaR-TRC Q-learning algorithms converge to the optimal risk-averse value functions. The proposed Q-learning algorithms compute the optimal stationary policy for ERM-TRC and EVaR-TRC.
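For readers unfamiliar with the risk measures named in the abstract, the sketch below evaluates the entropic risk measure (ERM) and the entropic value at risk (EVaR) of a discrete reward distribution. It is an illustration only, not the dissertation's algorithms. It assumes the common reward-based conventions ERM_beta[X] = -(1/beta) log E[exp(-beta X)] and EVaR_alpha[X] = sup over beta > 0 of ERM_beta[X] + log(alpha)/beta; the dissertation's exact conventions are not shown on this page and may differ. The function names `erm` and `evar` and the finite grid of beta values are hypothetical choices made for the example.

```python
import math


def erm(values, probs, beta):
    """Entropic risk measure of a discrete reward distribution.

    Assumed convention: ERM_beta[X] = -(1/beta) * log E[exp(-beta * X)], beta > 0.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    # Compute log E[exp(-beta * X)] with a log-sum-exp shift for stability.
    exponents = [-beta * x for x in values]
    shift = max(exponents)
    total = sum(p * math.exp(e - shift) for p, e in zip(probs, exponents))
    return -(shift + math.log(total)) / beta


def evar(values, probs, alpha, betas):
    """Approximate EVaR by maximizing over a finite grid of beta values.

    Assumed convention: EVaR_alpha[X] = sup_{beta > 0} ERM_beta[X] + log(alpha) / beta,
    with alpha in (0, 1]. A finite grid gives a lower bound on the supremum.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    return max(erm(values, probs, b) + math.log(alpha) / b for b in betas)


if __name__ == "__main__":
    rewards = [0.0, 10.0]
    probabilities = [0.5, 0.5]
    print(erm(rewards, probabilities, beta=1.0))
    print(evar(rewards, probabilities, alpha=0.5, betas=[0.01, 0.1, 1.0, 10.0]))
```

Under these conventions, a larger beta penalizes low rewards more heavily, and ERM approaches the plain expectation as beta shrinks toward zero. The grid search in `evar` is a simplification; a one-dimensional optimizer over beta would give a tighter estimate.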

Robust Bayesian Dynamic Programming for On-policy Risk-sensitive Reinforcement Learning

Risk Management

Teaches computers to make safer, smarter decisions.

31 Dec 2025 1

89%

Provably Efficient Sample Complexity for Robust CMDP

Machine Learning (CS)

Teaches robots to be safe and smart.

10 Nov 2025 1

89%

Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees

Machine Learning (CS)

Teaches robots to make safe choices always.

25 May 2025 2


Repos / Data Links

github.com

Page Count

137 pages
