Enabling Off-Policy Imitation Learning with Deep Actor Critic Stabilization
By: Sayambhu Sen, Shalabh Bhatnagar
Potential Business Impact:
Teaches robots to copy experts faster.
Learning complex policies with Reinforcement Learning (RL) is often hindered by instability and slow convergence, a problem exacerbated by the difficulty of reward engineering. Imitation Learning (IL) from expert demonstrations bypasses this reliance on rewards. However, state-of-the-art IL methods, exemplified by Generative Adversarial Imitation Learning (GAIL) (Ho et al.), suffer from severe sample inefficiency. This is a direct consequence of their foundational on-policy algorithms, such as TRPO (Schulman et al.). In this work, we introduce an adversarial imitation learning algorithm that incorporates off-policy learning to improve sample efficiency. By combining an off-policy framework with auxiliary techniques, specifically double Q-network-based stabilization and value learning without reward function inference, we demonstrate a reduction in the samples required to robustly match expert behavior.
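To make the stabilization idea concrete, below is a minimal sketch of a clipped double-Q critic update in which the reward signal is derived from an adversarial discriminator rather than an engineered reward function. This is an illustrative assumption of how such a component could look, not the paper's exact architecture; the names (Critic, critic_update, discriminator), network sizes, and the surrogate reward form are all hypothetical.

```python
# Sketch (PyTorch): twin Q-networks with a discriminator-derived surrogate reward.
# All shapes, names, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Critic(nn.Module):
    """Twin Q-networks; taking the min over the pair curbs overestimation bias."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.q1 = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))
        self.q2 = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, state, action):
        sa = torch.cat([state, action], dim=-1)
        return self.q1(sa), self.q2(sa)

def critic_update(critic, critic_target, actor_target, discriminator,
                  batch, optimizer, gamma=0.99):
    """One off-policy critic step on a replay-buffer batch; the reward is
    inferred from the discriminator output instead of a hand-designed signal."""
    state, action, next_state, done = batch
    with torch.no_grad():
        # Surrogate reward from the adversarial discriminator (assumed form).
        d = torch.sigmoid(discriminator(state, action))
        reward = -torch.log(1.0 - d + 1e-8)
        # Clipped double-Q target: min of the two target critics.
        next_action = actor_target(next_state)
        tq1, tq2 = critic_target(next_state, next_action)
        target = reward + gamma * (1.0 - done) * torch.min(tq1, tq2)
    q1, q2 = critic(state, action)
    loss = F.mse_loss(q1, target) + F.mse_loss(q2, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the double-Q minimum in the bootstrap target is what provides the stabilization, while sampling transitions from a replay buffer (rather than fresh on-policy rollouts) is what yields the sample-efficiency gain over GAIL's TRPO-based updates.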
Similar Papers
Limitation Learning: Catching Adverse Dialog with GAIL
Computation and Language
Teaches computers to talk like people.
Learning Dolly-In Filming From Demonstration Using a Ground-Based Robot
Robotics
Robot cameras learn to film like humans.
Going Beyond Expert Performance via Deep Implicit Imitation Reinforcement Learning
Machine Learning (CS)
Teaches robots to learn from watching, not just doing.