RoboSSM: Scalable In-context Imitation Learning via State-Space Models
By: Youngju Yoo, Jiaheng Hu, Yifeng Zhu, and more
Potential Business Impact:
Robots learn new tasks from fewer examples.
In-context imitation learning (ICIL) enables robots to learn tasks from prompts consisting of just a handful of demonstrations. By eliminating the need for parameter updates at deployment time, this paradigm supports few-shot adaptation to novel tasks. However, recent ICIL methods rely on Transformers, which have computational limitations and tend to underperform when handling longer prompts than those seen during training. In this work, we introduce RoboSSM, a scalable recipe for in-context imitation learning based on state-space models (SSMs). Specifically, RoboSSM replaces Transformers with Longhorn -- a state-of-the-art SSM that provides linear-time inference and strong extrapolation capabilities, making it well-suited for long-context prompts. We evaluate our approach on the LIBERO benchmark and compare it against strong Transformer-based ICIL baselines. Experiments show that RoboSSM extrapolates effectively to varying numbers of in-context demonstrations, yields high performance on unseen tasks, and remains robust in long-horizon scenarios. These results highlight the potential of SSMs as an efficient and scalable backbone for ICIL. Our code is available at https://github.com/youngjuY/RoboSSM.
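To make the linear-time claim concrete, the sketch below shows a generic linear state-space recurrence (h_t = A h_{t-1} + B x_t, y_t = C h_t). This is an illustrative toy, not Longhorn itself or the paper's architecture; the function name `ssm_scan` and all dimensions are made up for the example. The key point it demonstrates is that the state has fixed size, so each step costs the same regardless of how long the prompt is, giving inference cost linear in sequence length rather than the quadratic cost of Transformer attention.

```python
import numpy as np

def ssm_scan(A, B, C, xs):
    """Minimal linear state-space recurrence over an input sequence.

    h_t = A @ h_{t-1} + B @ x_t   (state update)
    y_t = C @ h_t                 (readout)

    The state h has fixed size, so each step is O(1) in sequence
    length; the full scan is linear in len(xs).
    """
    d_state = A.shape[0]
    h = np.zeros(d_state)
    ys = []
    for x in xs:
        h = A @ h + B @ x
        ys.append(C @ h)
    return np.stack(ys)

# Toy dimensions: 4-dim state, 2-dim input, 3-dim output.
rng = np.random.default_rng(0)
A = 0.9 * np.eye(4)            # stable (contractive) state transition
B = rng.normal(size=(4, 2))
C = rng.normal(size=(3, 4))
xs = rng.normal(size=(10, 2))  # a "prompt" of 10 timesteps

ys = ssm_scan(A, B, C, xs)
print(ys.shape)  # (10, 3)
```

Because the recurrence carries only the fixed-size state forward, prompts longer than those seen during training do not change the per-step cost, which is the property the abstract credits for strong length extrapolation.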
Similar Papers
CodeSSM: Towards State Space Models for Code Understanding
Software Engineering
Helps computers understand code better, faster, cheaper.
Robust Instant Policy: Leveraging Student's t-Regression Model for Robust In-context Imitation Learning of Robot Manipulation
Robotics
Helps robots learn tasks from watching humans.
Leveraging State Space Models in Long Range Genomics
Genomics
Helps computers understand long DNA codes better.