Multi-Agent Inverse Q-Learning from Demonstrations
By: Nathaniel Haynam, Adam Khoja, Dhruv Kumar, and more
Potential Business Impact:
Teaches robots to work together better.
When reward functions are hand-designed, deep reinforcement learning algorithms often suffer from reward misspecification, causing them to learn suboptimal policies in terms of the intended task objectives. In the single-agent case, inverse reinforcement learning (IRL) techniques attempt to address this issue by inferring the reward function from expert demonstrations. However, in multi-agent problems, misalignment between the learned and true objectives is exacerbated due to increased environment non-stationarity and variance that scales with the number of agents. As such, in multi-agent general-sum games, multi-agent IRL algorithms have difficulty balancing cooperative and competitive objectives. To address these issues, we propose Multi-Agent Marginal Q-Learning from Demonstrations (MAMQL), a novel sample-efficient framework for multi-agent IRL. For each agent, MAMQL learns a critic marginalized over the other agents' policies, allowing for a well-motivated use of Boltzmann policies in the multi-agent context. We identify a connection between optimal marginalized critics and single-agent soft-Q IRL, allowing us to apply a direct, simple optimization criterion from the single-agent domain. Across our experiments on three different simulated domains, MAMQL significantly outperforms previous multi-agent methods in average reward, sample efficiency, and reward recovery, often by more than 2-5x. We make our code available at https://sites.google.com/view/mamql.
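To make the core idea concrete, here is a minimal sketch of a critic marginalized over another agent's policy and the resulting Boltzmann policy, for a single state with tabular actions. This is an illustrative assumption of the construction described in the abstract, not the authors' implementation; the names (marginal_q, boltzmann_policy, q_joint, other_policy, temperature) are hypothetical.

    import numpy as np

    def marginal_q(q_joint: np.ndarray, other_policy: np.ndarray) -> np.ndarray:
        # Marginalize a joint critic Q(s, a_i, a_-i) over the other agent's policy.
        # q_joint: shape (n_actions_i, n_actions_other) for a fixed state s.
        # other_policy: probabilities over the other agent's actions, shape (n_actions_other,).
        # Returns Q_i(s, a_i) = E_{a_-i ~ pi_-i}[ Q(s, a_i, a_-i) ].
        return q_joint @ other_policy

    def boltzmann_policy(q_marginal: np.ndarray, temperature: float = 1.0) -> np.ndarray:
        # Soft policy over the marginalized critic: pi(a_i | s) proportional to exp(Q_i(s, a_i) / T).
        logits = q_marginal / temperature
        logits -= logits.max()            # subtract max for numerical stability
        probs = np.exp(logits)
        return probs / probs.sum()

    # Example: a 2x2 general-sum stage game viewed from agent i at one state.
    q_joint = np.array([[1.0, 0.0],
                        [0.5, 2.0]])      # rows: agent i's actions, cols: other agent's actions
    other_policy = np.array([0.3, 0.7])   # estimate of the other agent's current policy
    q_i = marginal_q(q_joint, other_policy)
    print(boltzmann_policy(q_i, temperature=0.5))

Because each agent's marginalized critic depends only on its own action once the others are averaged out, it behaves like a single-agent soft Q-function, which is what lets single-agent soft-Q IRL machinery be reused in the sketch above.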
Similar Papers
Model Predictive Adversarial Imitation Learning for Planning from Observation
Robotics
Teaches robots to plan and learn from watching.
Symmetry-Guided Multi-Agent Inverse Reinforcement Learning
Robotics
Robots learn better with less practice.