SPiDR: A Simple Approach for Zero-Shot Safety in Sim-to-Real Transfer
By: Yarden As, Chengrui Qu, Benjamin Unger, and more
Potential Business Impact:
Keeps robots trained in simulation safe when they are deployed in the real world.
Safety remains a major concern for deploying reinforcement learning (RL) in real-world applications. Simulators provide safe, scalable training environments, but the inevitable sim-to-real gap introduces additional safety concerns: policies must satisfy constraints under real-world conditions that differ from simulation. Robust safe RL techniques address this challenge with principled methods, but they are often incompatible with standard scalable training pipelines. In contrast, domain randomization, a simple and popular sim-to-real technique, is a promising alternative, although it often produces unsafe behaviors in practice. We present SPiDR (Sim-to-real via Pessimistic Domain Randomization), a scalable algorithm with provable guarantees for safe sim-to-real transfer. SPiDR uses domain randomization to incorporate uncertainty about the sim-to-real gap into the safety constraints, making it versatile and highly compatible with existing training pipelines. Through extensive experiments on sim-to-sim benchmarks and two distinct real-world robotic platforms, we demonstrate that SPiDR effectively ensures safety despite the sim-to-real gap while maintaining strong performance.
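The core idea in the abstract can be illustrated with a minimal sketch: sample simulator parameters from a randomization range, estimate a policy's safety cost in each sampled domain, and then enforce the constraint against a pessimistic, uncertainty-inflated estimate rather than the average. The toy dynamics, the pessimism weight `beta`, and the cost budget below are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of pessimistic domain randomization for a safety constraint.
# All quantities (toy 1-D dynamics, beta, cost_limit) are assumptions
# chosen for illustration; they are not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

def rollout_cost(policy_gain, domain_param, horizon=50):
    """Roll out a linear policy in a toy 1-D system and return the
    accumulated safety cost (steps spent outside the safe region)."""
    x, cost = 1.0, 0.0
    for _ in range(horizon):
        u = -policy_gain * x
        x = domain_param * x + u + 0.05 * rng.normal()  # randomized dynamics
        cost += float(abs(x) > 0.5)                     # constraint-violation indicator
    return cost

def pessimistic_cost(policy_gain, n_domains=32, beta=1.0):
    """Estimate the constraint cost across sampled domain parameters and
    inflate it by beta standard deviations (the pessimism term)."""
    params = rng.uniform(0.8, 1.2, size=n_domains)      # domain randomization range
    costs = np.array([rollout_cost(policy_gain, p) for p in params])
    return costs.mean() + beta * costs.std()

# Accept only policies whose *pessimistic* cost stays within the budget,
# so the constraint holds even for unfavorable plausible domains.
cost_limit = 5.0
for gain in [0.2, 0.6, 1.0]:
    c = pessimistic_cost(gain)
    status = "feasible" if c <= cost_limit else "infeasible"
    print(f"gain={gain:.1f}  pessimistic cost={c:.2f}  -> {status}")
```

Inflating the mean cost by a multiple of its standard deviation is one simple way to encode pessimism over the randomized domains; the paper's provable guarantees rest on how this uncertainty term is constructed, which the sketch does not reproduce.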
Similar Papers
Safe Continual Domain Adaptation after Sim2Real Transfer of Reinforcement Learning Policies in Robotics
Robotics
Lets robots learn and change safely in the real world.
Provable Sim-to-Real Transfer via Offline Domain Randomization
Machine Learning (CS)
Uses real-world data to calibrate simulation randomization, with provable transfer guarantees.
Generalizable Domain Adaptation for Sim-and-Real Policy Co-Training
Robotics
Teaches robots to do tasks with less real practice.