Optimistic Reinforcement Learning with Quantile Objectives
By: Mohammad Alipour-Vaezi, Huaiyang Zhong, Kwok-Leung Tsui, and more
Potential Business Impact:
Teaches computers to make safer, smarter choices.
Reinforcement Learning (RL) has achieved tremendous success in recent years. However, the classical foundations of RL do not account for the risk sensitivity of the objective function, which is critical in various fields, including healthcare and finance. A popular approach to incorporating risk sensitivity is to optimize a specific quantile of the cumulative reward distribution. In this paper, we develop UCB-QRL, an optimistic learning algorithm for the $\tau$-quantile objective in finite-horizon Markov decision processes (MDPs). UCB-QRL is an iterative algorithm in which, at each iteration, we first estimate the underlying transition probability and then optimize the quantile value function over a confidence ball around this estimate. We show that UCB-QRL yields a high-probability regret bound $\mathcal O\left((2/\kappa)^{H+1}H\sqrt{SATH\log(2SATH/\delta)}\right)$ in the episodic setting with $S$ states, $A$ actions, $T$ episodes, and horizon $H$. Here, $\kappa>0$ is a problem-dependent constant that captures the sensitivity of the underlying MDP's quantile value.
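To make the "estimate the model, then plan optimistically over a confidence ball" loop concrete, here is a minimal, hypothetical Python sketch of that structure. It is not the paper's UCB-QRL: the quantile-specific planning step is replaced by a simple bonus-based value iteration on expected reward, and all names (`P_hat`, `beta`, `counts`) and the confidence-radius formula are illustrative assumptions, not the authors' construction.

```python
import numpy as np

# Schematic sketch of an optimistic model-based episodic loop in the spirit of
# UCB-QRL. Hypothetical simplification: the paper's tau-quantile optimization
# over the confidence ball is replaced by a bonus-based value iteration on
# expected reward, so only the loop structure is illustrated.

S, A, H, T = 5, 2, 4, 200      # states, actions, horizon, episodes
delta = 0.05                   # confidence parameter
rng = np.random.default_rng(0)

# Unknown true MDP (randomly generated for the demo).
P_true = rng.dirichlet(np.ones(S), size=(S, A))  # P_true[s, a] is a distribution over next states
R = rng.uniform(size=(S, A))                     # deterministic rewards in [0, 1]

counts = np.ones((S, A, S))    # transition counts with a uniform pseudo-count prior

for t in range(T):
    N_sa = counts.sum(axis=2, keepdims=True)
    P_hat = counts / N_sa                                                   # empirical transition estimate
    beta = np.sqrt(np.log(2 * S * A * T * H / delta) / N_sa.squeeze(-1))    # schematic confidence radius

    # Optimistic planning: backward induction with an exploration bonus
    # standing in for maximizing over the confidence ball around P_hat.
    V = np.zeros((H + 1, S))
    Q = np.zeros((H, S, A))
    for h in reversed(range(H)):
        Q[h] = R + P_hat @ V[h + 1] + H * beta   # bonus-style optimism
        Q[h] = np.minimum(Q[h], H)               # value cannot exceed remaining horizon
        V[h] = Q[h].max(axis=1)

    # Roll out the greedy policy in the true MDP and update the counts.
    s = 0
    for h in range(H):
        a = int(Q[h, s].argmax())
        s_next = rng.choice(S, p=P_true[s, a])
        counts[s, a, s_next] += 1
        s = s_next

print("visit counts per (s, a):\n", counts.sum(axis=2))
```

In the actual algorithm described by the abstract, the backward-induction step would instead optimize the $\tau$-quantile value function over all transition models in the confidence ball, which is where the problem-dependent constant $\kappa$ enters the regret bound.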
Similar Papers
Risk-Sensitive Q-Learning in Continuous Time with Application to Dynamic Portfolio Selection
Machine Learning (CS)
Helps computers make smarter money choices safely.
Q-Learning with Shift-Aware Upper Confidence Bound in Non-Stationary Reinforcement Learning
Machine Learning (CS)
Helps robots learn better when rules change.
Safety-Aware Reinforcement Learning for Control via Risk-Sensitive Action-Value Iteration and Quantile Regression
Machine Learning (CS)
Robot learns to avoid crashing while reaching goals.