Convergent Q-Learning for Infinite-Horizon General-Sum Markov Games through Behavioral Economics
By: Yizhou Zhang, Eric Mazumdar
Potential Business Impact:
Helps computers learn to play games better.
Risk aversion and bounded rationality are two key characteristics of human decision-making. Risk-averse quantal-response equilibrium (RQE) is a solution concept that incorporates these features, providing a more realistic depiction of human decision-making in various strategic environments than Nash equilibrium. Furthermore, a class of RQE has recently been shown in arXiv:2406.14156 to be universally computationally tractable in all finite-horizon Markov games, allowing for the development of multi-agent reinforcement learning algorithms with convergence guarantees. In this paper, we expand upon the study of RQE and analyze their computation in both two-player normal-form games and discounted infinite-horizon Markov games. For normal-form games, we adopt a monotonicity-based approach that allows us to generalize previous results. We first show uniqueness and Lipschitz continuity of RQE with respect to the players' payoff matrices under monotonicity assumptions, and then provide conditions on the players' degrees of risk aversion and bounded rationality that ensure monotonicity. We then focus on discounted infinite-horizon Markov games. We define the risk-averse quantal-response Bellman operator and prove that it is a contraction under further conditions on the players' risk aversion, bounded rationality, and temporal discounting. This yields a Q-learning-based algorithm with convergence guarantees for all infinite-horizon general-sum Markov games.
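To make the two behavioral ingredients concrete, here is a minimal single-agent sketch. It is not the authors' multi-agent RQE operator. It assumes rewards rather than costs, models bounded rationality as a logit (softmax) quantal response with temperature `eps`, and models risk aversion as an entropic risk measure with coefficient `tau` applied to next-state outcomes. All function names (`soft_value`, `entropic_certainty_equivalent`, `bellman`, and so on) are hypothetical. Both the log-sum-exp value and the entropic certainty equivalent are monotone and shift-equivariant, so this single-agent operator is a `gamma`-contraction in the sup norm for any `eps, tau > 0`. The multi-agent operator in the abstract instead needs the further conditions on risk aversion, bounded rationality, and discounting.

```python
import math
import random

# Hypothetical single-agent sketch, NOT the paper's multi-agent RQE operator.
# Q[s][a]: action values; R[s][a]: rewards; P[s][a][s2]: transition probabilities.
# eps > 0: logit (quantal-response) temperature, modeling bounded rationality.
# tau > 0: entropic risk-aversion coefficient over next-state outcomes (assumption).


def soft_value(q_row, eps):
    """Logit value eps * log(sum_a exp(q_a / eps)), shifted by the max for stability."""
    m = max(q_row)
    return m + eps * math.log(sum(math.exp((q - m) / eps) for q in q_row))


def logit_policy(q_row, eps):
    """Quantal (softmax) response: pi(a) proportional to exp(q_a / eps)."""
    m = max(q_row)
    w = [math.exp((q - m) / eps) for q in q_row]
    z = sum(w)
    return [x / z for x in w]


def entropic_certainty_equivalent(values, probs, tau):
    """Risk-averse value of a random reward X: -(1/tau) * log E[exp(-tau * X)].
    By Jensen's inequality it never exceeds the mean of X."""
    m = min(values)
    s = sum(p * math.exp(-tau * (v - m)) for v, p in zip(values, probs))
    return m - math.log(s) / tau


def bellman(Q, R, P, gamma, eps, tau):
    """(TQ)(s, a) = R[s][a] + gamma * CE_tau[V(s')], with V(s') = soft_value(Q[s'], eps)."""
    V = [soft_value(row, eps) for row in Q]
    return [
        [R[s][a] + gamma * entropic_certainty_equivalent(V, P[s][a], tau)
         for a in range(len(Q[s]))]
        for s in range(len(Q))
    ]


def sup_dist(Q1, Q2):
    """Sup-norm distance between two Q-tables."""
    return max(abs(x - y) for r1, r2 in zip(Q1, Q2) for x, y in zip(r1, r2))


def value_iteration(R, P, gamma, eps, tau, tol=1e-10, max_iter=10_000):
    """Iterate T from Q = 0 until successive iterates differ by less than tol."""
    Q = [[0.0] * len(row) for row in R]
    for _ in range(max_iter):
        Q_next = bellman(Q, R, P, gamma, eps, tau)
        if sup_dist(Q_next, Q) < tol:
            return Q_next
        Q = Q_next
    return Q


def random_mdp(n_states, n_actions, seed=0):
    """Random rewards in [-1, 1] and strictly positive transition rows."""
    rng = random.Random(seed)
    R = [[rng.uniform(-1, 1) for _ in range(n_actions)] for _ in range(n_states)]
    P = []
    for _ in range(n_states):
        rows = []
        for _ in range(n_actions):
            w = [rng.random() + 1e-3 for _ in range(n_states)]
            z = sum(w)
            rows.append([x / z for x in w])
        P.append(rows)
    return R, P


if __name__ == "__main__":
    R, P = random_mdp(4, 3)
    gamma, eps, tau = 0.9, 0.5, 2.0
    rng = random.Random(1)
    Qa = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(4)]
    Qb = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(4)]
    ratio = sup_dist(bellman(Qa, R, P, gamma, eps, tau),
                     bellman(Qb, R, P, gamma, eps, tau)) / sup_dist(Qa, Qb)
    print(f"observed contraction ratio {ratio:.3f} (bound: gamma = {gamma})")
    Q_star = value_iteration(R, P, gamma, eps, tau)
    print("logit policy at state 0:", logit_policy(Q_star[0], eps))
```

The sketch iterates the operator with a known transition model. It does not attempt the sample-based Q-learning updates the abstract refers to, and it omits the strategic interaction between players that makes the general-sum case require extra conditions.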
Similar Papers
Generalized Quantal Response Equilibrium: Existence and Efficient Learning
CS and Game Theory
Teaches computers to play games better.
Risk-Sensitive Q-Learning in Continuous Time with Application to Dynamic Portfolio Selection
Machine Learning (CS)
Helps computers make smarter money choices safely.
Optimistic Reinforcement Learning with Quantile Objectives
Machine Learning (CS)
Teaches computers to make safer, smarter choices.