Learning Large-Scale Competitive Team Behaviors with Mean-Field Interactions
By: Bhavini Jeloka, Yue Guan, Panagiotis Tsiotras
Potential Business Impact:
Lets many robots learn to play together.
State-of-the-art multi-agent reinforcement learning (MARL) algorithms such as MADDPG and MAAC fail to scale as the number of agents grows. Mean-field theory has shown encouraging results in modeling the macroscopic behavior of large teams of agents through a continuum approximation of the agent population and its interaction with the environment. In this work, we extend proximal policy optimization (PPO) to the mean-field domain by introducing Mean-Field Multi-Agent Proximal Policy Optimization (MF-MAPPO), a novel algorithm that leverages the finite-population mean-field approximation in zero-sum competitive games between two teams. As shown through numerical experiments, the proposed algorithm scales easily to hundreds or thousands of agents per team. In particular, we apply the algorithm to realistic large-scale offense-defense battlefield scenarios.
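The abstract does not spell out the MF-MAPPO update, so the following is only a minimal sketch of the general idea it describes: a standard PPO clipped surrogate in which a policy shared by all agents of a team is conditioned on the team's empirical state distribution (the finite-population mean field). The network architecture, the discrete state/action sizes, and the names (`MeanFieldPolicy`, `ppo_clipped_loss`) are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: PPO clipped objective conditioned on a team's empirical mean field.
# All sizes, names, and the conditioning scheme are assumptions for illustration.
import torch
import torch.nn as nn

N_STATES, N_ACTIONS, CLIP_EPS = 10, 4, 0.2

class MeanFieldPolicy(nn.Module):
    """Policy shared by all agents of a team; input = own state (one-hot) + mean field."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * N_STATES, 64), nn.Tanh(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, own_state_onehot, mean_field):
        logits = self.net(torch.cat([own_state_onehot, mean_field], dim=-1))
        return torch.distributions.Categorical(logits=logits)

def ppo_clipped_loss(policy, states, mean_fields, actions, old_log_probs, advantages):
    """Standard PPO clipped surrogate, evaluated on mean-field-conditioned log-probs."""
    dist = policy(states, mean_fields)
    ratio = torch.exp(dist.log_prob(actions) - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - CLIP_EPS, 1 + CLIP_EPS) * advantages
    return -torch.min(unclipped, clipped).mean()

if __name__ == "__main__":
    # Toy usage: one gradient step on a batch of N agents from the same team.
    N = 512  # the mean field is the empirical state distribution of these N agents
    states_idx = torch.randint(0, N_STATES, (N,))
    states = torch.nn.functional.one_hot(states_idx, N_STATES).float()
    mean_field = torch.bincount(states_idx, minlength=N_STATES).float() / N
    mean_fields = mean_field.expand(N, -1)  # every agent observes the same mean field

    policy = MeanFieldPolicy()
    with torch.no_grad():
        actions = policy(states, mean_fields).sample()
        old_log_probs = policy(states, mean_fields).log_prob(actions)
    advantages = torch.randn(N)  # placeholder; a critic with GAE would supply these

    loss = ppo_clipped_loss(policy, states, mean_fields, actions, old_log_probs, advantages)
    loss.backward()
    print(f"clipped surrogate loss: {loss.item():.4f}")
```

Because the policy depends on its own state only through a one-hot vector plus the shared mean field, the same network and the same update serve every agent in the team, which is what allows the approach to scale to very large populations.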
Similar Papers
Generalizable Collaborative Search-and-Capture in Cluttered Environments via Path-Guided MAPPO and Directional Frontier Allocation
Robotics
Helps robots find and catch targets faster.
Heterogeneous Group-Based Reinforcement Learning for LLM-based Multi-Agent Systems
Machine Learning (CS)
Teaches AI groups to work better, faster.
Bi-level Mean Field: Dynamic Grouping for Large-Scale MARL
Artificial Intelligence
Helps many robots work together smarter.