DiffFP: Learning Behaviors from Scratch via Diffusion-based Fictitious Play
By: Akash Karthikeyan, Yash Vardhan Pant
Potential Business Impact:
Teaches game-playing agents to beat unseen opponents.
Self-play reinforcement learning has demonstrated significant success in learning complex strategic and interactive behaviors in competitive multi-agent games. However, achieving such behaviors in continuous decision spaces remains challenging, and ensuring adaptability and generalization in self-play settings is critical for achieving competitive performance in dynamic multi-agent environments. These challenges often cause methods to converge slowly, or fail to converge at all, to a Nash equilibrium, making agents vulnerable to strategic exploitation by unseen opponents. To address these challenges, we propose DiffFP, a fictitious play (FP) framework that estimates the best response to unseen opponents while learning a robust and multimodal behavioral policy. Specifically, we approximate the best response using a diffusion policy that leverages generative modeling to learn adaptive and diverse strategies. Through empirical evaluation, we demonstrate that the proposed FP framework converges towards $\epsilon$-Nash equilibria in continuous-space zero-sum games. We validate our method on complex multi-agent environments, including racing and multi-particle zero-sum games. Simulation results show that the learned policies are robust against diverse opponents and outperform baseline reinforcement learning policies. Our approach achieves up to 3$\times$ faster convergence and 30$\times$ higher success rates on average against RL-based baselines, demonstrating its robustness to opponent strategies and stability across training iterations.
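As background for the fictitious-play loop the abstract builds on, here is a minimal sketch of *classical* fictitious play on a discrete zero-sum matrix game: each player best-responds to the opponent's empirical action frequencies, and the empirical mixed strategies converge to a Nash equilibrium. This is illustrative only; DiffFP replaces the exact argmax best response below with a learned diffusion policy over continuous actions. All names (`fictitious_play`, `pennies`) are ours, not from the paper.

```python
# Classical fictitious play on a 2x2 zero-sum game (matching pennies).
# Background sketch only: DiffFP approximates the best-response step
# with a diffusion policy in continuous decision spaces.

def fictitious_play(payoff, iters=20000):
    """Return each player's empirical mixed strategy after `iters` rounds."""
    n, m = len(payoff), len(payoff[0])
    row_counts = [0] * n   # how often the row player chose each action
    col_counts = [0] * m   # how often the column player chose each action
    row_counts[0] += 1     # arbitrary initial actions
    col_counts[0] += 1
    for _ in range(iters):
        # Row player (maximizer) best-responds to the column player's
        # empirical mix so far.
        row_payoffs = [sum(payoff[i][j] * col_counts[j] for j in range(m))
                       for i in range(n)]
        br_row = max(range(n), key=lambda i: row_payoffs[i])
        # Column player (minimizer in a zero-sum game) best-responds likewise.
        col_payoffs = [sum(payoff[i][j] * row_counts[i] for i in range(n))
                       for j in range(m)]
        br_col = min(range(m), key=lambda j: col_payoffs[j])
        row_counts[br_row] += 1
        col_counts[br_col] += 1
    total_r, total_c = sum(row_counts), sum(col_counts)
    return ([c / total_r for c in row_counts],
            [c / total_c for c in col_counts])

# Matching pennies: the unique Nash equilibrium mixes 50/50 for both players,
# so both empirical strategies should approach (0.5, 0.5).
pennies = [[1, -1], [-1, 1]]
row_mix, col_mix = fictitious_play(pennies)
```

In the continuous-space setting the paper targets, the best-response computation is no longer an argmax over a finite action set, which is precisely the gap the diffusion-policy approximation is meant to fill.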
Similar Papers
Aggregate Fictitious Play for Learning in Anonymous Polymatrix Games (Extended Version)
CS and Game Theory
Helps computer players learn faster in games.
Tie-breaking Agnostic Lower Bound for Fictitious Play
CS and Game Theory
Shows game learning is slower than previously thought.
Flexible Locomotion Learning with Diffusion Model Predictive Control
Robotics
Robots learn to walk and change how they move.