Optimistic Task Inference for Behavior Foundation Models
By: Thomas Rupf, Marco Bagatella, Marin Vlastelica, and more
Potential Business Impact:
Teaches robots new jobs with few examples.
Behavior Foundation Models (BFMs) are capable of retrieving a high-performing policy for any reward function specified directly at test-time, commonly referred to as zero-shot reinforcement learning (RL). While this process is very efficient in terms of compute, it can be less so in terms of data: as a standard assumption, BFMs require computing rewards over a non-negligible inference dataset, which presupposes either access to a functional form of the reward or significant labeling effort. To alleviate these limitations, we tackle the problem of task inference purely through interaction with the environment at test-time. We propose OpTI-BFM, an optimistic decision criterion that directly models uncertainty over reward functions and guides BFMs in data collection for task inference. Formally, we provide a regret bound for well-trained BFMs through a direct connection to upper-confidence algorithms for linear bandits. Empirically, we evaluate OpTI-BFM on established zero-shot benchmarks, and observe that it enables successor-features-based BFMs to identify and optimize an unseen reward function in a handful of episodes with minimal compute overhead. Code is available at https://github.com/ThomasRupf/opti-bfm.
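To make the idea concrete, below is a minimal sketch of an optimistic, UCB-style task-inference loop in the spirit of the abstract: rewards are assumed linear in state features, a ridge estimate of the reward weights is maintained, and the task embedding with the highest optimistic return estimate is rolled out each episode. The bfm.* methods, the env interface, and all hyperparameters are illustrative assumptions, not the paper's actual API.

```python
import numpy as np

def optimistic_task_inference(bfm, env, feature_dim, n_episodes=10,
                              beta=2.0, ridge=1.0):
    """Sketch of a UCB-style task-inference loop (assumptions, not the paper's code).

    Assumes rewards are linear in features, r(s) = phi(s)^T w, and that the
    BFM exposes successor features psi(z) for candidate task embeddings z.
    """
    A = ridge * np.eye(feature_dim)   # precision matrix of the weight estimate
    b = np.zeros(feature_dim)         # accumulated reward-weighted features

    for _ in range(n_episodes):
        w_hat = np.linalg.solve(A, b)  # current ridge estimate of reward weights
        A_inv = np.linalg.inv(A)

        # Optimistic choice: estimated return plus a linear-bandit confidence bonus.
        def ucb(z):
            psi = bfm.successor_features(z)   # hypothetical BFM call
            return psi @ w_hat + beta * np.sqrt(psi @ A_inv @ psi)

        z_star = max(bfm.candidate_tasks(), key=ucb)  # hypothetical BFM call

        # Roll out the policy retrieved for z_star and update the weight estimate
        # from observed (feature, reward) pairs. Gym-style env assumed.
        obs = env.reset()
        done = False
        while not done:
            action = bfm.policy(obs, z_star)          # hypothetical BFM call
            obs, reward, done, _ = env.step(action)
            phi = bfm.features(obs)                   # hypothetical BFM call
            A += np.outer(phi, phi)
            b += reward * phi

    return np.linalg.solve(A, b)  # final estimate of the reward weights
```

The design choice mirrors the abstract's connection to linear bandits: the bonus term shrinks as more feature-reward pairs are observed, so exploration concentrates on task embeddings whose return is still uncertain.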
Similar Papers
BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning
Robotics
Robot learns many jobs from one lesson.
Fast Adaptation with Behavioral Foundation Models
Machine Learning (CS)
Makes robots learn new tricks faster and better.
Behavior Foundation Model for Humanoid Robots
Robotics
Robots learn new skills without starting over.