Improving Controller Generalization with Dimensionless Markov Decision Processes
By: Valentin Charvet, Sebastian Stein, Roderick Murray-Smith
Potential Business Impact:
Helps robots learn skills that still work in new places.
Controllers trained with Reinforcement Learning tend to be highly specialized and thus generalize poorly when their testing environment differs from the one they were trained in. We propose a Model-Based approach to improve generalization in which both the world model and the policy are trained in a dimensionless state-action space. To do so, we introduce the Dimensionless Markov Decision Process ($\Pi$-MDP): an extension of Contextual MDPs in which state and action spaces are non-dimensionalized with the Buckingham-$\Pi$ theorem. This procedure induces policies that are equivariant with respect to changes in the context of the underlying dynamics. We provide a generic framework for this approach and apply it to a model-based policy search algorithm using Gaussian Process models. We demonstrate the applicability of our method on simulated actuated pendulum and cartpole systems, where policies trained on a single environment are robust to shifts in the distribution of the context.
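To illustrate the non-dimensionalization step, the sketch below rescales a torque-actuated pendulum's state and action into dimensionless $\Pi$-groups using its physical context (mass, length, gravity), so that a single dimensionless policy can be reused across differently scaled systems. The specific $\Pi$-groups, context variables, and toy policy are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

# Minimal sketch of Buckingham-Pi style non-dimensionalization for a torque-actuated
# pendulum. The context variables (mass m, length l, gravity g) and the particular
# Pi-groups below are illustrative assumptions, not the paper's exact construction.

def to_dimensionless(state, action, context):
    """Map a physical (state, action) pair into dimensionless Pi-groups."""
    m, l, g = context["mass"], context["length"], context["gravity"]
    theta, theta_dot = state              # angle [rad], angular velocity [rad/s]
    tau = float(action)                   # applied torque [N*m]

    t_char = np.sqrt(l / g)               # characteristic time scale of the pendulum
    pi_state = np.array([theta,                    # angle is already dimensionless
                         theta_dot * t_char])      # dimensionless angular velocity
    pi_action = np.array([tau / (m * g * l)])      # dimensionless torque
    return pi_state, pi_action


def from_dimensionless(pi_action, context):
    """Rescale a dimensionless action back to a physical torque for this context."""
    m, l, g = context["mass"], context["length"], context["gravity"]
    return float(pi_action[0]) * m * g * l


# A single dimensionless policy (here a toy damping controller) can be reused
# across contexts with different masses and lengths: only the rescaling changes.
policy = lambda pi_state: np.array([-0.5 * pi_state[1]])

for context in ({"mass": 1.0, "length": 1.0, "gravity": 9.81},
                {"mass": 2.0, "length": 0.5, "gravity": 9.81}):
    pi_s, _ = to_dimensionless(state=(0.3, 1.2), action=0.0, context=context)
    tau = from_dimensionless(policy(pi_s), context)
    print(context, "-> torque:", tau)
```

Because the policy only ever sees dimensionless quantities, changing the context (e.g. doubling the mass or halving the length) only changes the rescaling in and out of the $\Pi$-space, which is the mechanism behind the context-equivariance described in the abstract.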
Similar Papers
Generalized Linear Markov Decision Process
Machine Learning (Stat)
Helps computers learn with less reward information.
Direct transfer of optimized controllers to similar systems using dimensionless MPC
Systems and Control
Makes robot controls work on bigger machines.
Generalization in Monitored Markov Decision Processes (Mon-MDPs)
Artificial Intelligence
Teaches robots to learn from hidden rewards.