Meta-Learning in Self-Play Regret Minimization
By: David Sychrovský, Martin Schmid, Michal Šustr, and more
Potential Business Impact:
Teaches computers to win many similar games faster.
Regret minimization is a general approach to online optimization that plays a crucial role in many algorithms for approximating Nash equilibria in two-player zero-sum games. The literature mainly focuses on solving individual games in isolation. In practice, however, players often encounter a distribution of similar but distinct games — for example, when trading correlated assets on the stock market, or when refining strategies in subgames of a much larger game. Recently, offline meta-learning was used to accelerate one-sided equilibrium finding on such distributions. We build upon this work, extending the framework to the more challenging self-play setting, which is the basis for most state-of-the-art equilibrium approximation algorithms for domains at scale. When selecting the strategy, our method uniquely integrates information across all decision states, promoting global communication as opposed to the traditional local regret decomposition. Empirical evaluation on normal-form games and river poker subgames shows that our meta-learned algorithms considerably outperform other state-of-the-art regret minimization algorithms.
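To make the baseline concrete, here is a minimal sketch (not the paper's meta-learned method) of plain regret matching in self-play on rock–paper–scissors, a two-player zero-sum normal-form game whose unique Nash equilibrium is the uniform strategy. All names here are illustrative; the standard result being demonstrated is that the players' *average* strategies converge toward equilibrium.

```python
import numpy as np

# Row player's payoff matrix: rows/columns = rock, paper, scissors.
# The game is zero-sum, so the column player's payoff is the negation.
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

def regret_matching(regrets):
    """Map cumulative regrets to a strategy: play actions in proportion
    to positive regret, or uniformly if no regret is positive."""
    pos = np.maximum(regrets, 0.0)
    total = pos.sum()
    return pos / total if total > 0 else np.full(len(regrets), 1.0 / len(regrets))

def self_play(iterations=50_000):
    n = A.shape[0]
    regrets = [np.zeros(n), np.zeros(n)]        # cumulative regrets per player
    strategy_sum = [np.zeros(n), np.zeros(n)]   # for the average strategy
    for _ in range(iterations):
        s0 = regret_matching(regrets[0])
        s1 = regret_matching(regrets[1])
        strategy_sum[0] += s0
        strategy_sum[1] += s1
        # Expected value of each action against the opponent's current strategy.
        u0 = A @ s1             # row player's per-action values
        u1 = -A.T @ s0          # column player's per-action values (zero-sum)
        # Accumulate instantaneous regret: action value minus strategy value.
        regrets[0] += u0 - s0 @ u0
        regrets[1] += u1 - s1 @ u1
    # The average strategy (not the last iterate) approaches Nash equilibrium.
    return [s / iterations for s in strategy_sum]

avg = self_play()
# avg[0] and avg[1] end up close to the uniform equilibrium [1/3, 1/3, 1/3].
```

The current strategies may cycle, but averaging over iterations drives the exploitability down at roughly O(1/√T) — the self-play baseline that the paper's meta-learned variants aim to accelerate across a distribution of related games.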
Similar Papers
Approximating Nash Equilibria in General-Sum Games via Meta-Learning
CS and Game Theory
Helps games find fair winning strategies faster.
Regret Minimization for Piecewise Linear Rewards: Contracts, Auctions, and Beyond
CS and Game Theory
Helps computers learn best prices to sell things.
The Hidden Game Problem
Artificial Intelligence
Finds secret winning moves in complex games.