Puzzle it Out: Local-to-Global World Model for Offline Multi-Agent Reinforcement Learning
By: Sijia Li, Xinran Li, Shibo Chen, and more
Potential Business Impact:
Helps AI teams learn better from past games.
Offline multi-agent reinforcement learning (MARL) aims to solve cooperative decision-making problems in multi-agent systems using pre-collected datasets. Existing offline MARL methods primarily constrain training to the dataset distribution, resulting in overly conservative policies that struggle to generalize beyond the support of the data. Model-based approaches offer a promising solution by expanding the original dataset with synthetic data generated from a learned world model, but the high dimensionality, non-stationarity, and complexity of multi-agent systems make it challenging to accurately estimate the transition and reward functions in offline MARL. Given the difficulty of directly modeling joint dynamics, we propose a local-to-global (LOGO) world model, a novel framework that leverages local predictions, which are easier to estimate, to infer global state dynamics, thus improving prediction accuracy while implicitly capturing agent-wise dependencies. Using the trained world model, we generate synthetic data to augment the original dataset, expanding the effective state-action space. To ensure reliable policy learning, we further introduce an uncertainty-aware sampling mechanism that adaptively weights synthetic data by prediction uncertainty, reducing the propagation of approximation error to policies. In contrast to conventional ensemble-based methods, our approach requires only an additional encoder for uncertainty estimation, significantly reducing computational overhead while maintaining accuracy. Extensive experiments across 8 scenarios against 8 baselines demonstrate that our method surpasses state-of-the-art approaches on standard offline MARL benchmarks, establishing a new model-based baseline for generalizable offline multi-agent learning.
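The uncertainty-aware sampling mechanism described in the abstract can be illustrated with a minimal sketch: synthetic transitions with high prediction uncertainty receive lower sampling probability when forming training batches. The exponential weighting, the `beta` temperature, and the function below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def uncertainty_weighted_sample(uncertainties, batch_size, beta=1.0, rng=None):
    """Sample indices of synthetic transitions, down-weighting uncertain ones.

    Hypothetical sketch of uncertainty-aware sampling: the probability of
    drawing a synthetic transition decays exponentially with the world
    model's prediction uncertainty for that transition.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    u = np.asarray(uncertainties, dtype=float)
    weights = np.exp(-beta * u)          # lower uncertainty -> higher weight
    probs = weights / weights.sum()      # normalize to a distribution
    return rng.choice(len(u), size=batch_size, replace=True, p=probs)

# Example: four synthetic transitions with increasing model uncertainty;
# low-uncertainty transitions dominate the sampled batch.
idx = uncertainty_weighted_sample([0.1, 0.2, 1.5, 3.0], batch_size=256, beta=2.0)
```

In practice such a batch of synthetic transitions would be mixed with real dataset transitions before a policy-update step, so that model error on out-of-distribution rollouts contributes less to the learning signal.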
Similar Papers
From Pixels to Cooperation: Multi-Agent Reinforcement Learning based on Multimodal World Models
Multiagent Systems
Teaches robots to work together using sight and sound.
Explaining Decentralized Multi-Agent Reinforcement Learning Policies
Artificial Intelligence
Helps people understand how AI teams work together.
Remembering the Markov Property in Cooperative MARL
Machine Learning (CS)
Teaches robots to work together by learning rules.