Improving Controller Generalization with Dimensionless Markov Decision Processes
By: Valentin Charvet, Sebastian Stein, Roderick Murray-Smith
Potential Business Impact:
Helps robots learn skills that still work in new places.
Controllers trained with Reinforcement Learning tend to be highly specialized and thus generalize poorly when their testing environment differs from the one they were trained in. We propose a Model-Based approach to improve generalization in which both the world model and the policy are trained in a dimensionless state-action space. To do so, we introduce the Dimensionless Markov Decision Process ($\Pi$-MDP): an extension of Contextual MDPs in which state and action spaces are non-dimensionalized with the Buckingham-$\Pi$ theorem. This procedure induces policies that are equivariant with respect to changes in the context of the underlying dynamics. We provide a generic framework for this approach and apply it to a model-based policy search algorithm using Gaussian Process models. We demonstrate the applicability of our method on simulated actuated pendulum and cartpole systems, where policies trained on a single environment are robust to shifts in the distribution of the context.
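To illustrate the non-dimensionalization step, the sketch below rescales a torque-actuated pendulum's state and action into dimensionless $\Pi$-groups using its physical context (mass, length, gravity), so that a single dimensionless policy can be reused across differently scaled systems. The specific $\Pi$-groups, context variables, and toy policy are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

# Minimal sketch of Buckingham-Pi style non-dimensionalization for a torque-actuated
# pendulum. The context variables (mass m, length l, gravity g) and the particular
# Pi-groups below are illustrative assumptions, not the paper's exact construction.

def to_dimensionless(state, action, context):
    """Map a physical (state, action) pair into dimensionless Pi-groups."""
    m, l, g = context["mass"], context["length"], context["gravity"]
    theta, theta_dot = state              # angle [rad], angular velocity [rad/s]
    tau = float(action)                   # applied torque [N*m]

    t_char = np.sqrt(l / g)               # characteristic time scale of the pendulum
    pi_state = np.array([theta,                    # angle is already dimensionless
                         theta_dot * t_char])      # dimensionless angular velocity
    pi_action = np.array([tau / (m * g * l)])      # dimensionless torque
    return pi_state, pi_action


def from_dimensionless(pi_action, context):
    """Rescale a dimensionless action back to a physical torque for this context."""
    m, l, g = context["mass"], context["length"], context["gravity"]
    return float(pi_action[0]) * m * g * l


# A single dimensionless policy (here a toy damping controller) can be reused
# across contexts with different masses and lengths: only the rescaling changes.
policy = lambda pi_state: np.array([-0.5 * pi_state[1]])

for context in ({"mass": 1.0, "length": 1.0, "gravity": 9.81},
                {"mass": 2.0, "length": 0.5, "gravity": 9.81}):
    pi_s, _ = to_dimensionless(state=(0.3, 1.2), action=0.0, context=context)
    tau = from_dimensionless(policy(pi_s), context)
    print(context, "-> torque:", tau)
```

Because the policy only ever sees dimensionless quantities, changing the context (e.g. doubling the mass or halving the length) only changes the rescaling in and out of the $\Pi$-space, which is the mechanism behind the context-equivariance described in the abstract.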
Similar Papers
Generalized Linear Markov Decision Process
Machine Learning (Stat)
Helps computers learn with less reward information.
Direct transfer of optimized controllers to similar systems using dimensionless MPC
Systems and Control
Makes robot controls work on bigger machines.
Generalization in Monitored Markov Decision Processes (Mon-MDPs)
Artificial Intelligence
Teaches robots to learn from hidden rewards.