Geometric Re-Analysis of Classical MDP Solving Algorithms
By: Arsenii Mustafin, Aleksei Pakharev, Alex Olshevsky, and more
Potential Business Impact:
Provides sharper convergence guarantees for classical MDP-solving algorithms, potentially making reinforcement-learning planners faster and more reliable.
We build on a recently introduced geometric interpretation of Markov Decision Processes (MDPs) to analyze classical MDP-solving algorithms: Value Iteration (VI) and Policy Iteration (PI). First, we develop a geometry-based analytical apparatus, including a transformation that modifies the discount factor $\gamma$, to improve convergence guarantees for these algorithms in several settings. In particular, one of our results identifies a rotation component in the VI method, and as a consequence shows that when a Markov Reward Process (MRP) induced by the optimal policy is irreducible and aperiodic, the asymptotic convergence rate of value iteration is strictly smaller than $\gamma$.
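For context on the result above, here is a minimal sketch of standard Value Iteration on a toy two-state MDP. The MDP (`P`, `R`, `gamma`) is an illustrative assumption, not taken from the paper; the sketch only demonstrates the classical fact that successive Bellman updates contract in the sup norm by a factor of at most $\gamma$, the baseline rate that the paper's rotation-based analysis sharpens.

```python
import numpy as np

def value_iteration(P, R, gamma, iters=200):
    """Run V <- max_a (R[a] + gamma * P[a] @ V) and record sup-norm step sizes.

    P: (A, S, S) array of transition matrices, one per action.
    R: (A, S) array of expected rewards, one row per action.
    Returns the final value vector and the per-iteration step sizes.
    """
    V = np.zeros(P.shape[1])
    steps = []
    for _ in range(iters):
        Q = R + gamma * (P @ V)   # Q[a, s] = R[a, s] + gamma * E[V(s') | s, a]
        V_new = Q.max(axis=0)     # greedy Bellman optimality update
        steps.append(np.max(np.abs(V_new - V)))
        V = V_new
    return V, steps

# Toy MDP: 2 states, 2 actions (made-up numbers for illustration).
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.6, 0.4]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
gamma = 0.9

V, steps = value_iteration(P, R, gamma)
# Each step shrinks by at most a factor of gamma; under the paper's
# irreducibility/aperiodicity condition, the asymptotic rate is strictly below gamma.
```

Successive sup-norm differences shrink by at most $\gamma$ per iteration because the Bellman optimality operator is a $\gamma$-contraction; the paper's contribution is showing this rate is strict, i.e. asymptotically better than $\gamma$, when the optimal policy's induced MRP is irreducible and aperiodic.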
Similar Papers
Geometry-Inspired Unified Framework for Discounted and Average Reward MDPs
Machine Learning (CS)
Unifies the mathematics of discounted and average-reward MDPs.
Rank-One Modified Value Iteration
Optimization and Control
Speeds up planning by modifying Value Iteration with a rank-one update.
Value Iteration with Guessing for Markov Chains and Markov Decision Processes
Artificial Intelligence
Accelerates Value Iteration for Markov chains and MDPs with a guessing step.