Geometry-Inspired Unified Framework for Discounted and Average Reward MDPs
By: Arsenii Mustafin, Xinyi Sheng, Dominik Baumann
Potential Business Impact:
Unifies math for better computer learning.
The theoretical analysis of Markov Decision Processes (MDPs) is commonly split into two cases, the average-reward case and the discounted-reward case, which share many similarities but are typically analyzed separately. In this work, we extend a recently introduced geometric interpretation of MDPs from the discounted-reward case to the average-reward case, thereby unifying the two. This allows us to carry a major result known for the discounted-reward case over to the average-reward case: when the optimal policy is unique and ergodic, the Value Iteration algorithm converges at a geometric rate.
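To make the convergence claim concrete, below is a minimal sketch of Value Iteration on a toy tabular discounted MDP, where the geometric shrinkage of the Bellman error (governed by the discount factor) can be observed directly. The transition tensor P, reward matrix R, and discount gamma are hypothetical illustration data, not taken from the paper, and the average-reward extension that the paper analyzes is not implemented here.

```python
# Minimal sketch of Value Iteration on a toy tabular discounted MDP.
# The MDP below (transition tensor P, reward matrix R, discount gamma)
# is hypothetical illustration data, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 3, 0.9

# P[a, s, t]: probability of moving from state s to state t under action a.
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)
# R[s, a]: expected immediate reward for taking action a in state s.
R = rng.random((n_states, n_actions))

V = np.zeros(n_states)
for k in range(200):
    # Bellman optimality update: Q(s,a) = R(s,a) + gamma * sum_t P(t|s,a) V(t)
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    V_new = Q.max(axis=1)
    err = np.max(np.abs(V_new - V))  # sup-norm Bellman error
    V = V_new
    if k % 20 == 0:
        print(f"iteration {k:3d}   error {err:.3e}")
    if err < 1e-10:
        break

greedy_policy = Q.argmax(axis=1)
print("greedy policy:", greedy_policy)
```

In the discounted setting, the sup-norm error contracts by roughly a factor of gamma per iteration, which is the classical geometric rate; the paper's contribution is a geometric interpretation under which an analogous rate can be established for the average-reward case, assuming a unique and ergodic optimal policy.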
Similar Papers
Geometric Re-Analysis of Classical MDP Solving Algorithms
Machine Learning (CS)
Makes computer learning faster and more reliable.
Why Policy Gradient Algorithms Work for Undiscounted Total-Reward MDPs
Machine Learning (CS)
Makes AI learn better without getting stuck.
Probabilistic Safety Guarantee for Stochastic Control Systems Using Average Reward MDPs
Systems and Control
Makes robots safer by predicting problems.