A Differential Perspective on Distributional Reinforcement Learning
By: Juan Sebastian Rojas, Chi-Guhn Lee
Potential Business Impact:
Teaches robots to earn the highest average reward per step over the long run.
To date, distributional reinforcement learning (distributional RL) methods have focused exclusively on the discounted setting, where an agent aims to optimize a potentially discounted sum of rewards over time. In this work, we extend distributional RL to the average-reward setting, where an agent aims to optimize the reward received per time-step. In particular, we utilize a quantile-based approach to develop the first set of algorithms that can successfully learn and/or optimize the long-run per-step reward distribution, as well as the differential return distribution, of an average-reward MDP. We derive proven-convergent tabular algorithms for both prediction and control, as well as a broader family of algorithms with appealing scaling properties. Empirically, we find that these algorithms consistently yield competitive performance compared to their non-distributional equivalents, while also capturing rich information about the long-run reward and return distributions.
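The abstract only outlines the approach, so the paper's exact update rules are not reproduced here. Below is a minimal, hypothetical Python sketch of what a tabular quantile-based differential TD prediction update might look like in the average-reward setting: quantile estimates of the differential return distribution are updated with a quantile-regression step, and a separate running estimate of the average reward is maintained. The state-space size, step sizes, and the `update` helper are all illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

# Illustrative sketch (not the paper's exact algorithm): tabular quantile-based
# differential TD learning for average-reward prediction.
# Assumed/hypothetical: the small tabular state space, all hyperparameter values,
# and the shape of the update; only the general quantile-regression idea is standard.

num_states = 10      # assumed tabular state space
num_quantiles = 32   # number of quantile locations per state
alpha = 0.05         # step size for the quantile estimates
eta = 0.01           # step size for the average-reward estimate

# Quantile midpoints tau_i = (2i + 1) / (2N), as in quantile-regression-based methods.
taus = (2 * np.arange(num_quantiles) + 1) / (2 * num_quantiles)

# theta[s, i] approximates the i-th quantile of the differential return from state s.
theta = np.zeros((num_states, num_quantiles))
r_bar = 0.0          # running estimate of the long-run average reward per step


def update(s, r, s_next):
    """One prediction step on a transition (s, r, s_next) under a fixed policy."""
    global r_bar
    # Sample a target quantile from the next state's estimated distribution.
    j = np.random.randint(num_quantiles)
    target = r - r_bar + theta[s_next, j]
    # Quantile-regression (pinball-loss) update for each quantile of the current state.
    for i in range(num_quantiles):
        indicator = 1.0 if target < theta[s, i] else 0.0
        theta[s, i] += alpha * (taus[i] - indicator)
    # Update the average-reward estimate from the mean differential TD error.
    delta = r - r_bar + theta[s_next].mean() - theta[s].mean()
    r_bar += eta * delta
```

In a control variant, one would additionally maintain quantile estimates per state-action pair and act (e.g., greedily) with respect to their means; the sketch above covers only the prediction case.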
Similar Papers
Reinforcement Learning under State and Outcome Uncertainty: A Foundational Distributional Perspective
Artificial Intelligence
Helps robots learn to make safer choices.
Towards Optimal Offline Reinforcement Learning
Optimization and Control
Helps robots learn good behavior from previously collected data.
Sample Complexity of Distributionally Robust Average-Reward Reinforcement Learning
Machine Learning (CS)
Helps robots learn tasks better and faster.