Counterfactual Behavior Cloning: Offline Imitation Learning from Imperfect Human Demonstrations
By: Shahabedin Sagheb, Dylan P. Losey
Potential Business Impact:
Robots learn what humans intended, even from flawed demonstrations.
Learning from humans is challenging because people are imperfect teachers. When everyday humans show a robot a new task they want it to perform, they inevitably make errors (e.g., inputting noisy actions) and provide suboptimal examples (e.g., overshooting the goal). Existing methods learn by mimicking the exact behaviors the human teacher provides, but this approach is fundamentally limited because the demonstrations themselves are imperfect. In this work we advance offline imitation learning by enabling robots to extrapolate what the human teacher meant, instead of only considering what the human actually showed. We achieve this by hypothesizing that all of the human's demonstrations are trying to convey a single, consistent policy, while the noise and suboptimality within their behaviors obfuscate the data and introduce unintentional complexity. To recover the underlying policy and learn what the human teacher meant, we introduce Counter-BC, a generalized version of behavior cloning. Counter-BC expands the given dataset to include actions close to the behaviors the human demonstrated (i.e., counterfactual actions that the human teacher could have intended, but did not actually show). During training, Counter-BC autonomously modifies the human's demonstrations within this expanded region to reach a simple and consistent policy that explains the underlying trends in the human's dataset. Theoretically, we prove that Counter-BC can extract the desired policy from imperfect data, multiple users, and teachers of varying skill levels. Empirically, we compare Counter-BC to state-of-the-art alternatives in simulated and real-world settings with noisy demonstrations, standardized datasets, and real human teachers. See videos of our work here: https://youtu.be/XaeOZWhTt68
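The abstract describes Counter-BC in two steps: expand each demonstration into a region of nearby counterfactual actions, then relabel demonstrations within that region toward a single consistent policy. Below is a minimal PyTorch sketch of that idea, assuming a Euclidean eps-ball as the counterfactual region and a projection step for the relabeling; the network architecture, the radius `eps`, and the toy data are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of counterfactual relabeling for behavior cloning.
# All names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn

class Policy(nn.Module):
    """Small MLP mapping states to actions."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, s):
        return self.net(s)

def counter_bc_step(policy, optimizer, states, demo_actions, eps=0.1):
    """One training step: relabel each demonstrated action with the action
    inside an eps-ball around it (the counterfactual region) that best
    matches the current policy, then regress toward the relabeled action."""
    with torch.no_grad():
        pred = policy(states)
        # Project the policy's prediction onto the eps-ball centered at the
        # demonstration: the counterfactual the human "could have intended"
        # that is most consistent with a single underlying policy.
        delta = pred - demo_actions
        norm = delta.norm(dim=-1, keepdim=True).clamp(min=1e-8)
        scale = torch.clamp(norm, max=eps) / norm
        targets = demo_actions + scale * delta
    loss = nn.functional.mse_loss(policy(states), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: noisy demonstrations of an underlying linear policy.
torch.manual_seed(0)
states = torch.randn(256, 4)
clean_actions = states @ torch.randn(4, 2)                  # intended policy
demo_actions = clean_actions + 0.05 * torch.randn(256, 2)   # human noise
policy = Policy(4, 2)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for epoch in range(200):
    counter_bc_step(policy, opt, states, demo_actions, eps=0.1)
```

In this sketch the projection keeps every relabeled target within `eps` of what the human actually showed, so the policy can smooth out demonstration noise without drifting arbitrarily far from the data.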
Similar Papers
R2BC: Multi-Agent Imitation Learning from Single-Agent Demonstrations
Robotics
Teaches robot teams by showing one robot at a time.
From Imitation to Optimization: A Comparative Study of Offline Learning for Autonomous Driving
Machine Learning (CS)
Teaches self-driving cars to avoid crashes.
Residual Off-Policy RL for Finetuning Behavior Cloning Policies
Robotics
Teaches robots to do tasks better by learning from mistakes.