Geometric Re-Analysis of Classical MDP Solving Algorithms
By: Arsenii Mustafin, Aleksei Pakharev, Alex Olshevsky, and more
Potential Business Impact:
Provides sharper convergence guarantees for classical MDP-solving algorithms, potentially making reinforcement-learning planners faster and more reliable.
We build on a recently introduced geometric interpretation of Markov Decision Processes (MDPs) to analyze classical MDP-solving algorithms: Value Iteration (VI) and Policy Iteration (PI). First, we develop a geometry-based analytical apparatus, including a transformation that modifies the discount factor $\gamma$, to improve convergence guarantees for these algorithms in several settings. In particular, one of our results identifies a rotation component in the VI method, and as a consequence shows that when a Markov Reward Process (MRP) induced by the optimal policy is irreducible and aperiodic, the asymptotic convergence rate of value iteration is strictly smaller than $\gamma$.
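For context on the result above, here is a minimal sketch of standard Value Iteration on a toy two-state MDP. The MDP (`P`, `R`, `gamma`) is an illustrative assumption, not taken from the paper; the sketch only demonstrates the classical fact that successive Bellman updates contract in the sup norm by a factor of at most $\gamma$, the baseline rate that the paper's rotation-based analysis sharpens.

```python
import numpy as np

def value_iteration(P, R, gamma, iters=200):
    """Run V <- max_a (R[a] + gamma * P[a] @ V) and record sup-norm step sizes.

    P: (A, S, S) array of transition matrices, one per action.
    R: (A, S) array of expected rewards, one row per action.
    Returns the final value vector and the per-iteration step sizes.
    """
    V = np.zeros(P.shape[1])
    steps = []
    for _ in range(iters):
        Q = R + gamma * (P @ V)   # Q[a, s] = R[a, s] + gamma * E[V(s') | s, a]
        V_new = Q.max(axis=0)     # greedy Bellman optimality update
        steps.append(np.max(np.abs(V_new - V)))
        V = V_new
    return V, steps

# Toy MDP: 2 states, 2 actions (made-up numbers for illustration).
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.6, 0.4]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
gamma = 0.9

V, steps = value_iteration(P, R, gamma)
# Each step shrinks by at most a factor of gamma; under the paper's
# irreducibility/aperiodicity condition, the asymptotic rate is strictly below gamma.
```

Successive sup-norm differences shrink by at most $\gamma$ per iteration because the Bellman optimality operator is a $\gamma$-contraction; the paper's contribution is showing this rate is strict, i.e. asymptotically better than $\gamma$, when the optimal policy's induced MRP is irreducible and aperiodic.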
Similar Papers
Geometry-Inspired Unified Framework for Discounted and Average Reward MDPs
Machine Learning (CS)
Unifies the mathematics of discounted and average-reward MDPs.
Rank-One Modified Value Iteration
Optimization and Control
Speeds up planning by modifying Value Iteration with a rank-one update.
Value Iteration with Guessing for Markov Chains and Markov Decision Processes
Artificial Intelligence
Accelerates Value Iteration for Markov chains and MDPs with a guessing step.