Distributionally Robust Markov Games with Average Reward
By: Zachary Roch, Yue Wang
Potential Business Impact:
Helps teams win games even when things change.
This paper formulates the distributionally robust Markov game (DR-MG) with average rewards, a framework for multi-agent decision-making under uncertainty over extended horizons. Unlike finite-horizon or discounted models, the average-reward criterion naturally captures long-term performance for systems designed for continuous operation, where sustained reliability is paramount. We account for uncertainty in the transition kernels, with each player aiming to optimize its worst-case average reward. We first establish a connection between the multi-agent and single-agent settings and derive the solvability of the robust Bellman equation under the average-reward formulation. We then rigorously prove the existence of a robust Nash Equilibrium (NE), which provides theoretical guarantees for system stability. We further develop and analyze an algorithm, robust Nash-Iteration, that computes robust Nash Equilibria among all agents. Finally, we demonstrate the connection between the average-reward NE and the well-studied discounted NEs, showing that the former can be approximated by the latter as the discount factor approaches one. Together, these contributions provide a theoretical and algorithmic foundation for identifying optimal strategies in complex, uncertain, and long-running multi-player environments, and they allow robust average-reward single-agent results to be extended to the multi-agent setting.
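The link between the discounted and average-reward criteria can be illustrated numerically in the single-agent case. The sketch below is not the paper's robust Nash-Iteration algorithm; it is a minimal single-agent example under assumptions that do not come from the paper: a hypothetical toy problem with two states and two actions, a finite uncertainty set of two candidate kernels chosen independently for each state-action pair, and strictly positive transition probabilities so that both iterations converge. It compares the gain found by robust relative value iteration with the scaled robust discounted value (1 - γ)·V_γ for γ close to one.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): 2 states, 2 actions,
# and 2 candidate transition kernels per (state, action) pair.
# kernels[k, s, a, s2] = probability of moving s -> s2 under action a, candidate k.
kernels = np.array([
    [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.4, 0.6]]],
    [[[0.6, 0.4], [0.5, 0.5]],
     [[0.8, 0.2], [0.1, 0.9]]],
])
rewards = np.array([[1.0, 0.0],
                    [0.5, 0.8]])  # rewards[s, a], all in [0, 1]


def robust_backup(v, gamma=1.0):
    """Best action against the worst-case kernel for each (state, action)."""
    expected = kernels @ v          # shape (k, s, a): expected next value
    worst = expected.min(axis=0)    # adversary picks a kernel per (s, a)
    return (rewards + gamma * worst).max(axis=1)


def robust_discounted_value(gamma, tol=1e-9, max_iter=200_000):
    """Robust value iteration for the discounted criterion."""
    v = np.zeros(rewards.shape[0])
    for _ in range(max_iter):
        v_new = robust_backup(v, gamma)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v


def robust_average_gain(tol=1e-10, max_iter=100_000):
    """Robust relative value iteration: returns (gain, relative values)."""
    h = np.zeros(rewards.shape[0])
    for _ in range(max_iter):
        t = robust_backup(h)
        gain, h_new = t[0], t - t[0]   # normalize against reference state 0
        if np.max(np.abs(h_new - h)) < tol:
            break
        h = h_new
    return gain, h_new


discount = 0.999
gain, relative_values = robust_average_gain()
scaled = (1 - discount) * robust_discounted_value(discount)
print("robust average gain:", gain)
print("(1 - gamma) * V_gamma:", scaled)
```

In this toy setting the two printed quantities should agree closely in every state, and the gap should shrink further as `discount` moves toward one. The finite uncertainty set keeps the inner minimization trivial; other uncertainty sets would need a different worst-case step.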
Similar Papers
Bellman Optimality of Average-Reward Robust Markov Decision Processes with a Constant Gain
Optimization and Control
Teaches computers to make best long-term decisions.
Sample Complexity of Distributionally Robust Average-Reward Reinforcement Learning
Machine Learning (CS)
Helps robots learn tasks better and faster.
Probabilistic Safety Guarantee for Stochastic Control Systems Using Average Reward MDPs
Systems and Control
Makes robots safer by predicting problems.