Learning to Ball: Composing Policies for Long-Horizon Basketball Moves
By: Pei Xu, Zhen Wu, Ruocheng Wang, and others
Potential Business Impact:
Teaches robots to perform complex, multi-step actions.
Learning a control policy for a multi-phase, long-horizon task, such as basketball maneuvers, remains challenging for reinforcement learning approaches due to the need for seamless policy composition and transitions between skills. A long-horizon task typically consists of distinct subtasks with well-defined goals, separated by transitional subtasks whose goals are unclear but which are critical to the success of the entire task. Existing methods such as mixture of experts and skill chaining struggle with tasks where individual policies do not share significant commonly explored states, or where the different phases lack well-defined initial and terminal states. In this paper, we introduce a novel policy integration framework that enables the composition of drastically different motor skills in multi-phase, long-horizon tasks with ill-defined intermediate states. Building on this framework, we further introduce a high-level soft router to enable seamless and robust transitions between the subtasks. We evaluate our framework on a set of fundamental basketball skills and challenging transitions. Policies trained by our approach can effectively control the simulated character to interact with the ball and accomplish the long-horizon task specified by real-time user commands, without relying on ball trajectory references.
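The abstract does not detail how the high-level soft router works; a common reading of "soft" routing is that a gating function produces a probability weight per sub-policy and the executed action is a weighted blend rather than a hard switch. The sketch below illustrates that general idea under stated assumptions; the class, the `gate` function, and the toy "skills" are all hypothetical and not taken from the paper.

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

class SoftRouter:
    """Hypothetical sketch of soft routing (not the paper's implementation):
    blend the actions of several low-level sub-policies using softmax
    weights produced by a gating function of the current state."""

    def __init__(self, policies, gate):
        self.policies = policies  # list of callables: state -> action vector
        self.gate = gate          # callable: state -> one logit per policy

    def act(self, state):
        weights = softmax(self.gate(state))
        actions = [p(state) for p in self.policies]
        # Weighted sum of the sub-policy actions, dimension by dimension.
        return [sum(w * a[i] for w, a in zip(weights, actions))
                for i in range(len(actions[0]))]

# Toy usage: two stand-in "skills" and a gate that favors the first.
dribble = lambda s: [1.0, 0.0]
shoot = lambda s: [0.0, 1.0]
gate = lambda s: [2.0, 0.0]

router = SoftRouter([dribble, shoot], gate)
action = router.act([0.0])
```

Because the weights come from a softmax, the blended action varies smoothly as the gate's logits change with the state, which is what makes transitions between skills "soft" rather than abrupt hand-offs at hard phase boundaries.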
Similar Papers
Curriculum Imitation Learning of Distributed Multi-Robot Policies
Robotics
Teaches robots to work together better.
Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI Automation
Artificial Intelligence
Helps computers finish long, complicated jobs.
Few-Shot Neuro-Symbolic Imitation Learning for Long-Horizon Planning and Acting
Robotics
Teaches robots complex tasks with few examples.