Rate optimal learning of equilibria from data
By: Till Freihaut, Luca Viano, Emanuele Nevali, et al.
Potential Business Impact:
Teaches robots to learn faster by watching.
We close open theoretical gaps in Multi-Agent Imitation Learning (MAIL) by characterizing the limits of non-interactive MAIL and presenting the first interactive algorithm with near-optimal sample complexity. In the non-interactive setting, we prove a statistical lower bound that identifies the all-policy deviation concentrability coefficient as the fundamental complexity measure, and we show that Behavior Cloning (BC) is rate-optimal. For the interactive setting, we introduce a framework that combines reward-free reinforcement learning with interactive MAIL and instantiate it with an algorithm, MAIL-WARM. It improves the best previously known sample complexity from $\mathcal{O}(\varepsilon^{-8})$ to $\mathcal{O}(\varepsilon^{-2})$, matching the dependence on $\varepsilon$ implied by our lower bound. Finally, we provide numerical results that support our theory and illustrate, in environments such as grid worlds, settings where Behavior Cloning fails to learn.
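To make the Behavior Cloning baseline concrete, here is a minimal sketch of tabular BC on a toy grid world: the learned policy simply imitates the most frequent expert action observed in each state. This is an illustrative simplification, not the paper's MAIL-WARM algorithm; the function and the toy environment are hypothetical, and the paper's setting is multi-agent.

```python
# Minimal tabular Behavior Cloning (BC) sketch on a toy grid world.
# Illustrative only: single-agent, deterministic tie-breaking; the paper's
# setting is multi-agent imitation learning with equilibrium guarantees.
from collections import Counter, defaultdict

def behavior_cloning(demos):
    """Estimate a policy as empirical action frequencies per state.

    demos: iterable of (state, action) pairs from expert trajectories.
    Returns a dict mapping each observed state to its most frequent
    expert action (ties broken toward the lowest action id).
    """
    counts = defaultdict(Counter)
    for s, a in demos:
        counts[s][a] += 1
    return {s: min(c.items(), key=lambda kv: (-kv[1], kv[0]))[0]
            for s, c in counts.items()}

# Toy 1-D grid: states 0..3, expert always moves right (action 1).
demos = [(s, 1) for s in range(4) for _ in range(3)]
policy = behavior_cloning(demos)
print(policy)  # {0: 1, 1: 1, 2: 1, 3: 1}
```

Note that such a policy is undefined on states the expert never visits, which is exactly the kind of distribution-shift failure the paper's grid-world experiments highlight and that the all-policy deviation concentrability coefficient quantifies.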
Similar Papers
Distributionally Robust Online Markov Game with Linear Function Approximation
Machine Learning (Stat)
Helps robots learn real-world tasks from practice.
Near-Optimal Second-Order Guarantees for Model-Based Adversarial Imitation Learning
Machine Learning (CS)
Teaches robots to learn from watching, faster.
Learning Closed-Loop Parametric Nash Equilibria of Multi-Agent Collaborative Field Coverage
Multiagent Systems
Teaches robots to cover areas much faster.