Score: 1

From Pixels to Cooperation: Multi-Agent Reinforcement Learning based on Multimodal World Models

Published: November 3, 2025 | arXiv ID: 2511.01310v1

By: Sureyya Akin, Kavita Srivastava, Prateek B. Kapoor, and more

Potential Business Impact:

Teaches robots to work together using sight and sound.

Business Areas:
Simulation Software

Learning cooperative multi-agent policies directly from high-dimensional, multimodal sensory inputs such as pixels and audio is notoriously sample-inefficient. Model-free Multi-Agent Reinforcement Learning (MARL) algorithms struggle with the joint challenge of representation learning, partial observability, and credit assignment. To address this, we propose a novel framework based on a shared, generative Multimodal World Model (MWM). Our MWM is trained to learn a compressed latent representation of the environment's dynamics by fusing distributed, multimodal observations from all agents using a scalable attention-based mechanism. Subsequently, we leverage this learned MWM as a fast, "imagined" simulator to train cooperative MARL policies (e.g., MAPPO) entirely within its latent space, decoupling representation learning from policy learning. We introduce a new set of challenging multimodal, multi-agent benchmarks built on a 3D physics simulator. Our experiments demonstrate that our MWM-MARL framework achieves orders-of-magnitude greater sample efficiency compared to state-of-the-art model-free MARL baselines. We further show that our proposed multimodal fusion is essential for task success in environments with sensory asymmetry and that our architecture provides superior robustness to sensor dropout, a critical feature for real-world deployment.
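
To make the two core ideas in the abstract more concrete, here is a minimal PyTorch-style sketch of (1) attention-based fusion of per-agent image and audio embeddings into a shared latent, and (2) an "imagined" rollout that trains a joint policy entirely inside a learned latent dynamics model. This is not the authors' code: all module names, dimensions, the GRU dynamics, and the simple one-hot action encoding are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    """Fuse per-agent image/audio embeddings into one shared latent via attention."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.img_enc = nn.Linear(128, d_model)   # stand-in for a CNN image encoder
        self.aud_enc = nn.Linear(32, d_model)    # stand-in for an audio encoder
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.query = nn.Parameter(torch.randn(1, 1, d_model))  # learned fusion query

    def forward(self, img_feats, aud_feats):
        # img_feats: (B, n_agents, 128), aud_feats: (B, n_agents, 32)
        tokens = torch.cat([self.img_enc(img_feats), self.aud_enc(aud_feats)], dim=1)
        q = self.query.expand(tokens.size(0), -1, -1)
        fused, _ = self.attn(q, tokens, tokens)  # attend over all agents' modalities
        return fused.squeeze(1)                  # shared latent state (B, d_model)

class LatentWorldModel(nn.Module):
    """Toy latent dynamics: predict next latent and reward from latent + joint action."""
    def __init__(self, d_model=64, n_agents=2, n_actions=5):
        super().__init__()
        self.dyn = nn.GRUCell(n_agents * n_actions, d_model)
        self.reward_head = nn.Linear(d_model, 1)

    def step(self, z, joint_action_onehot):
        z_next = self.dyn(joint_action_onehot, z)
        return z_next, self.reward_head(z_next)

def imagine_rollout(world_model, policy, z0, n_agents=2, n_actions=5, horizon=15):
    """Roll out the policy purely in latent space ("imagination"), no real env steps."""
    z, traj = z0, []
    for _ in range(horizon):
        logits = policy(z).view(z.size(0), n_agents, n_actions)
        actions = torch.distributions.Categorical(logits=logits).sample()
        onehot = nn.functional.one_hot(actions, n_actions).float().flatten(1)
        z, r = world_model.step(z, onehot)
        traj.append((z, actions, r))
    return traj  # imagined transitions used to update the MARL policy (e.g., MAPPO)

if __name__ == "__main__":
    B, n_agents = 8, 2
    fusion = MultimodalFusion()
    wm = LatentWorldModel(n_agents=n_agents)
    policy = nn.Linear(64, n_agents * 5)         # stand-in for per-agent actor heads
    z0 = fusion(torch.randn(B, n_agents, 128), torch.randn(B, n_agents, 32))
    traj = imagine_rollout(wm, policy, z0, n_agents=n_agents)
    print(len(traj), traj[0][0].shape)           # 15 imagined steps of latent states
```

The decoupling the abstract describes shows up here as the split between `MultimodalFusion`/`LatentWorldModel` (representation and dynamics learning) and `imagine_rollout` (policy learning), so the expensive 3D simulator is only needed to train the world model, not for every policy update.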

Page Count
12 pages

Category
Computer Science:
Multiagent Systems