Second-Order Policy Gradient Methods for the Linear Quadratic Regulator
By: Amirreza Valaei, Arash Bahari Kordabad, Sadegh Soudjani
Potential Business Impact:
Speeds up training of control policies, so systems such as robots can learn tasks with far less computation.
Policy gradient methods are a powerful family of reinforcement learning algorithms for continuous control that optimize a policy directly. However, standard first-order methods often converge slowly. Second-order methods can accelerate learning by using curvature information, but they are typically expensive to compute. The linear quadratic regulator (LQR) is a practical setting in which key quantities, such as the policy gradient, admit closed-form expressions. In this work, we develop second-order policy gradient algorithms for LQR by deriving explicit formulas for both the approximate and exact Hessians used in Gauss–Newton and Newton methods, respectively. Numerical experiments show faster convergence for the proposed second-order approaches than for the standard first-order policy gradient baseline.
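For readers unfamiliar with the closed-form quantities the abstract refers to, the sketch below illustrates the standard discrete-time LQR policy-gradient setup: the exact gradient ∇C(K) = 2 E_K Σ_K with E_K = (R + B^T P_K B) K − B^T P_K A, and a Gauss-Newton-style update that preconditions this direction by (R + B^T P_K B)^{-1}. This is a minimal illustration of the general technique, not the paper's exact algorithm or its exact Newton Hessian; the matrices A, B, Q, R, Sigma0 and the helper names lqr_quantities and gauss_newton_step are placeholders chosen for the example.

```python
# Minimal sketch (not the paper's exact algorithm): discrete-time LQR policy
# optimization using the well-known closed-form gradient and a Gauss-Newton-style step.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def lqr_quantities(A, B, Q, R, K, Sigma0):
    """Value matrix P_K, state covariance Sigma_K, gradient direction E_K,
    exact policy gradient, and cost C(K) = trace(P_K Sigma0) for u = -K x."""
    Acl = A - B @ K
    # P_K solves P = Acl^T P Acl + Q + K^T R K
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
    # Sigma_K solves Sigma = Acl Sigma Acl^T + Sigma0
    Sigma = solve_discrete_lyapunov(Acl, Sigma0)
    E = (R + B.T @ P @ B) @ K - B.T @ P @ A   # gradient direction E_K
    grad = 2 * E @ Sigma                      # exact policy gradient
    cost = np.trace(P @ Sigma0)
    return P, Sigma, E, grad, cost

def gauss_newton_step(A, B, Q, R, K, Sigma0, eta=0.5):
    """One Gauss-Newton-style update: K <- K - 2*eta*(R + B^T P_K B)^{-1} E_K."""
    P, _, E, _, _ = lqr_quantities(A, B, Q, R, K, Sigma0)
    return K - 2 * eta * np.linalg.solve(R + B.T @ P @ B, E)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m = 4, 2
    A = rng.standard_normal((n, n))
    A *= 0.9 / max(abs(np.linalg.eigvals(A)))  # scale A to be stable so K = 0 is stabilizing
    B = rng.standard_normal((n, m))
    Q, R, Sigma0 = np.eye(n), np.eye(m), np.eye(n)
    K = np.zeros((m, n))
    for it in range(20):
        *_, cost = lqr_quantities(A, B, Q, R, K, Sigma0)
        if it % 5 == 0:
            print(f"iter {it:2d}  cost {cost:.6f}")
        K = gauss_newton_step(A, B, Q, R, K, Sigma0)
```

With step size eta = 1/2, this Gauss-Newton-style update coincides with classical policy iteration for LQR, which gives some intuition for why curvature-aware steps reach a good controller in far fewer iterations than plain gradient descent.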
Similar Papers
Policy Gradient Adaptive Control for the LQR: Indirect and Direct Approaches
Optimization and Control
Helps controllers adapt to a system from data, without needing an exact model in advance.
On the (almost) Global Exponential Convergence of the Overparameterized Policy Optimization for the LQR Problem
Optimization and Control
Shows that a learning-based control method converges quickly and reliably from nearly any starting point.
A Quadratic Control Framework for Dynamic Systems
Systems and Control
Helps dynamic systems track desired behavior more accurately.