From Pixels to Cooperation: Multi-Agent Reinforcement Learning Based on Multimodal World Models
By: Sureyya Akin, Kavita Srivastava, Prateek B. Kapoor, and more
Potential Business Impact:
Teaches robots to work together using sight and sound.
Learning cooperative multi-agent policies directly from high-dimensional, multimodal sensory inputs such as pixels and audio is notoriously sample-inefficient. Model-free Multi-Agent Reinforcement Learning (MARL) algorithms struggle with the joint challenges of representation learning, partial observability, and credit assignment. To address this, we propose a novel framework based on a shared, generative Multimodal World Model (MWM). The MWM learns a compressed latent representation of the environment's dynamics by fusing distributed, multimodal observations from all agents with a scalable attention-based mechanism. We then leverage the learned MWM as a fast, "imagined" simulator to train cooperative MARL policies (e.g., MAPPO) entirely within its latent space, decoupling representation learning from policy learning. We also introduce a new set of challenging multimodal, multi-agent benchmarks built on a 3D physics simulator. Our experiments demonstrate that the MWM-MARL framework achieves orders-of-magnitude greater sample efficiency than state-of-the-art model-free MARL baselines. We further show that the proposed multimodal fusion is essential for task success in environments with sensory asymmetry, and that our architecture provides superior robustness to sensor dropout, a critical feature for real-world deployment.
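The abstract does not spell out the fusion mechanism, so the following is only a minimal sketch of one plausible reading: each agent's multimodal observation is encoded into a token, and scaled dot-product attention pools the per-agent tokens into a single shared latent for the world model. The function names, the single learned query vector, and the token layout are all assumptions for illustration, not the paper's actual implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_fuse(agent_tokens, query):
    """Pool per-agent multimodal tokens into one fused latent.

    agent_tokens: one embedding vector per agent (e.g., a vision+audio
                  encoding); used here as both keys and values.
    query:        a query vector standing in for a learned world-model
                  query (hypothetical).
    Returns (fused_latent, attention_weights).
    """
    d = len(query)
    # Scaled dot-product attention scores, one per agent.
    scores = [dot(query, tok) / math.sqrt(d) for tok in agent_tokens]
    weights = softmax(scores)
    # Weighted sum of agent tokens -> shared latent vector.
    fused = [sum(w * tok[i] for w, tok in zip(weights, agent_tokens))
             for i in range(len(agent_tokens[0]))]
    return fused, weights

# Two agents with 2-d tokens; the query aligns with agent 0's token,
# so agent 0 should receive the larger attention weight.
tokens = [[1.0, 0.0], [0.0, 1.0]]
fused, weights = attention_fuse(tokens, query=[1.0, 0.0])
```

Because the fusion is a weighted sum over however many agent tokens are present, dropping an agent's sensor stream can be handled by simply omitting (or masking) its token, which is one way an attention-based design could yield the sensor-dropout robustness the abstract reports.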
Similar Papers
Puzzle it Out: Local-to-Global World Model for Offline Multi-Agent Reinforcement Learning
Artificial Intelligence
Helps AI teams learn better from past games.
Remembering the Markov Property in Cooperative MARL
Machine Learning (CS)
Teaches robots to work together by learning rules.
LAMARL: LLM-Aided Multi-Agent Reinforcement Learning for Cooperative Policy Generation
Robotics
Robots learn tasks faster with AI help.