
Bridging Continuous-time LQR and Reinforcement Learning via Gradient Flow of the Bellman Error

Published: June 11, 2025 | arXiv ID: 2506.09685v1

By: Armin Gießler, Albertus Johannes Malan, Sören Hohmann

Potential Business Impact:

Offers a provably convergent way to compute optimal LQR feedback gains while keeping every intermediate policy stabilizing, which could make learning-based controllers for robots and other dynamical systems faster and more reliable to tune.

Business Areas:
Embedded Systems Hardware, Science and Engineering, Software

In this paper, we present a novel method for computing the optimal feedback gain of the infinite-horizon Linear Quadratic Regulator (LQR) problem via an ordinary differential equation. We introduce a novel continuous-time Bellman error, derived from the Hamilton-Jacobi-Bellman (HJB) equation, which quantifies the suboptimality of stabilizing policies and is parametrized in terms of the feedback gain. We analyze its properties, including its effective domain, smoothness, and coercivity, and show the existence of a unique stationary point within the stability region. Furthermore, we derive a closed-form gradient expression of the Bellman error that induces a gradient flow. This gradient flow converges to the optimal feedback gain and generates a unique trajectory consisting exclusively of stabilizing feedback policies. Additionally, this work establishes connections between LQR theory and Reinforcement Learning (RL) by recasting the suboptimality of the Algebraic Riccati Equation (ARE) as a Bellman error, adapting a state-independent formulation, and leveraging Lyapunov equations to overcome the infinite-horizon challenge. We validate our method in simulation and compare it to the state of the art.
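
The underlying idea can be illustrated with a small numerical sketch: the cost of a stabilizing feedback gain for a continuous-time LQR problem can be evaluated through Lyapunov equations, and the gain can then be flowed along the negative gradient of that cost toward the optimum. The code below is not the authors' Bellman-error formulation; it uses the standard closed-form LQR cost gradient as a stand-in, with an assumed double-integrator system, arbitrary weights, and an initial-state weighting Sigma0, and checks the result against the Algebraic Riccati Equation.

```python
# Hedged sketch: gradient flow on the LQR cost over stabilizing feedback gains.
# This is NOT the paper's exact Bellman-error gradient; it uses the well-known
# closed-form gradient of the LQR cost, computed via two Lyapunov equations,
# to illustrate flowing a gain K toward the optimum. A, B, Q, R, Sigma0, and
# the initial gain K0 are illustrative assumptions, not taken from the paper.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

# Assumed example system: double integrator with scalar input.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
Sigma0 = np.eye(2)  # weighting over initial states (state-independent cost)

def lqr_cost_and_grad(K):
    """Cost J(K) = tr(P_K Sigma0) for u = -K x, and its gradient with respect to K.

    P_K solves the closed-loop Lyapunov equation; the gradient additionally uses
    the closed-loop Gramian Sigma_K. Valid only for stabilizing K, i.e. when all
    eigenvalues of A - B K lie in the open left half-plane.
    """
    Acl = A - B @ K
    # (A - B K)^T P + P (A - B K) + Q + K^T R K = 0
    P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
    # (A - B K) S + S (A - B K)^T + Sigma0 = 0
    S = solve_continuous_lyapunov(Acl, -Sigma0)
    J = np.trace(P @ Sigma0)
    grad = 2.0 * (R @ K - B.T @ P) @ S
    return J, grad

# Forward-Euler integration of the gradient flow dK/dt = -grad J(K),
# starting from a stabilizing initial gain chosen by inspection.
K = np.array([[1.0, 1.0]])
step = 0.05
for _ in range(2000):
    _, g = lqr_cost_and_grad(K)
    K = K - step * g

# Compare with the optimal gain obtained from the algebraic Riccati equation.
P_star = solve_continuous_are(A, B, Q, R)
K_star = np.linalg.solve(R, B.T @ P_star)
print("gradient-flow gain:", K)
print("ARE-optimal gain:  ", K_star)
```

A forward-Euler step approximates the continuous gradient flow here; because the LQR cost grows unboundedly at the boundary of the stabilizing set, sufficiently small steps keep every iterate stabilizing, which mirrors the trajectory property highlighted in the abstract.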

Country of Origin
🇩🇪 Germany

Page Count
8 pages

Category
Electrical Engineering and Systems Science:
Systems and Control