Adaptive Scaling of Policy Constraints for Offline Reinforcement Learning
By: Tan Jing, Xiaorui Li, Chao Yao, and more
Potential Business Impact:
Teaches computers to learn better from old data.
Offline reinforcement learning (RL) enables learning effective policies from fixed datasets without any environment interaction. Existing methods typically employ policy constraints to mitigate the distribution shift encountered during offline RL training. However, because the appropriate scale of the constraint varies across tasks and datasets of differing quality, existing methods must meticulously tune hyperparameters for each dataset, which is time-consuming and often impractical. We propose Adaptive Scaling of Policy Constraints (ASPC), a second-order differentiable framework that dynamically balances RL and behavior cloning (BC) during training, and we theoretically analyze its performance improvement guarantee. In experiments on 39 datasets across four D4RL domains, ASPC with a single hyperparameter configuration outperforms other adaptive-constraint methods and state-of-the-art offline RL algorithms that require per-dataset tuning, while incurring only minimal computational overhead. The code will be released at https://github.com/Colin-Jing/ASPC.
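To make the balancing idea concrete, below is a minimal PyTorch sketch of adapting the weight between an RL (Q-maximization) term and a BC constraint during training, rather than hand-tuning it per dataset. This is not the paper's method: it uses a simple Lagrangian-style dual update on the weight (as in constrained RL or SAC-style temperature tuning) instead of ASPC's second-order differentiable scheme, and the network sizes, the bc_target tolerance, and all names are illustrative assumptions.

```python
# Sketch: adaptive RL-vs-BC trade-off via a dual update (NOT the paper's ASPC scheme).
import torch
import torch.nn as nn
import torch.nn.functional as F

state_dim, action_dim = 17, 6          # hypothetical D4RL-like dimensions
actor = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                      nn.Linear(256, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
                       nn.Linear(256, 1))

log_alpha = torch.zeros(1, requires_grad=True)   # adaptive BC weight (log-space)
bc_target = 0.05                                 # assumed tolerance on BC error

opt_actor = torch.optim.Adam(actor.parameters(), lr=3e-4)
opt_alpha = torch.optim.Adam([log_alpha], lr=3e-4)

def update(states, dataset_actions):
    pi = actor(states)
    q = critic(torch.cat([states, pi], dim=-1))
    bc_loss = F.mse_loss(pi, dataset_actions)
    alpha = log_alpha.exp().detach()

    # Actor step: maximize Q (scale-normalized) while respecting the BC constraint.
    actor_loss = -q.mean() / q.abs().mean().detach() + alpha * bc_loss
    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()

    # Dual step: raise alpha when BC error exceeds the target, lower it otherwise.
    alpha_loss = -log_alpha * (bc_loss.detach() - bc_target)
    opt_alpha.zero_grad(); alpha_loss.backward(); opt_alpha.step()
    return actor_loss.item(), log_alpha.exp().item()

# One update on a random batch standing in for a fixed offline dataset.
states = torch.randn(256, state_dim)
dataset_actions = torch.rand(256, action_dim) * 2 - 1
print(update(states, dataset_actions))
```

The dual update captures only the general adaptive-constraint behavior the abstract describes; ASPC itself chooses the scaling through a second-order differentiable procedure with a performance improvement guarantee, for which see the released code.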
Similar Papers
Constraint-Aware Reinforcement Learning via Adaptive Action Scaling
Robotics
Teaches robots to learn safely without breaking things.
Policy Constraint by Only Support Constraint for Offline Reinforcement Learning
Machine Learning (CS)
Teaches computers to learn better from old data.
Periodic Asynchrony: An Effective Method for Accelerating On-Policy Reinforcement Learning
Machine Learning (CS)
Makes computer learning much faster and cheaper.