Centralized Permutation Equivariant Policy for Cooperative Multi-Agent Reinforcement Learning
By: Zhuofan Xu, Benedikt Bollig, Matthias Függer, and more
Potential Business Impact:
Helps many robots learn to work together better.
The Centralized Training with Decentralized Execution (CTDE) paradigm has gained significant attention in multi-agent reinforcement learning (MARL) and is the foundation of many recent algorithms. However, decentralized policies operate under partial observability and often yield suboptimal performance compared to centralized policies, while fully centralized approaches typically face scalability challenges as the number of agents increases. We propose Centralized Permutation Equivariant (CPE) learning, a centralized training and execution framework that employs a fully centralized policy to overcome these limitations. Our approach leverages a novel permutation equivariant architecture, the Global-Local Permutation Equivariant (GLPE) network, which is lightweight, scalable, and easy to implement. Experiments show that CPE integrates seamlessly with both value decomposition and actor-critic methods, substantially improving the performance of standard CTDE algorithms across cooperative benchmarks including MPE, SMAC, and RWARE, and matching the performance of state-of-the-art RWARE implementations.
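The abstract does not spell out the GLPE architecture itself, so the following is only an illustrative sketch of the general idea of a permutation equivariant layer over agents: each agent's local features are combined with a pooled global summary, so reordering the agents reorders the outputs in the same way. The class name, layer sizes, and mean pooling here are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PermutationEquivariantLayer(nn.Module):
    """Illustrative permutation equivariant layer (not the paper's GLPE network).

    Mixes a per-agent (local) linear transform with a transform of the
    mean-pooled (global) context, so permuting the agent dimension of the
    input permutes the output rows identically.
    """

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.local = nn.Linear(in_dim, out_dim)      # applied to each agent's own features
        self.global_ = nn.Linear(in_dim, out_dim)    # applied to the pooled global summary

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_agents, in_dim) -- joint observation of all agents
        pooled = x.mean(dim=1, keepdim=True)         # permutation-invariant global summary
        return torch.relu(self.local(x) + self.global_(pooled))

# Usage sketch: a centralized policy head that is equivariant to agent ordering.
obs = torch.randn(32, 8, 16)                          # 32 envs, 8 agents, 16 features per agent
layer = PermutationEquivariantLayer(16, 64)
out = layer(obs)                                      # (32, 8, 64); permuting agents permutes the outputs
```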
Similar Papers
Multi-Agent Guided Policy Optimization
Artificial Intelligence
Helps many robots learn to work together better.
Semi Centralized Training Decentralized Execution Architecture for Multi Agent Deep Reinforcement Learning in Traffic Signal Control
Multiagent Systems
Makes traffic lights smarter for smoother driving.
Multi-Agent Cross-Entropy Method with Monotonic Nonlinear Critic Decomposition
Machine Learning (CS)
Helps robot teams learn to work together better.