On the Convergence of the Policy Iteration for Infinite-Horizon Nonlinear Optimal Control Problems
By: Tobias Ehring, Behzad Azmi, Bernard Haasdonk
Potential Business Impact:
Gives guarantees that an iterative method for designing feedback controllers actually converges, making automated control of robots and other engineered systems more reliable.
Policy iteration (PI) is a widely used algorithm for synthesizing optimal feedback control policies across many engineering and scientific applications. When PI is deployed on infinite-horizon, nonlinear, autonomous optimal-control problems, however, significant theoretical challenges emerge, particularly when the computational state space is restricted to a bounded domain. In this paper, we investigate these challenges and show that the viability of PI in this setting hinges on the existence, uniqueness, and regularity of solutions to the Generalized Hamilton-Jacobi-Bellman (GHJB) equation solved at each iteration. To ensure a well-posed iterative scheme, the GHJB solution must possess sufficient smoothness, and the domain on which the GHJB equation is solved must remain forward-invariant under the closed-loop dynamics induced by the current policy. Although these aspects are fundamental to the method's convergence, previous studies have largely overlooked them. This paper closes that gap by introducing a constructive procedure that guarantees forward invariance of the computational domain throughout the entire PI sequence and by establishing sufficient conditions under which a suitably regular GHJB solution exists at every iteration. Numerical results are presented for a grid-based implementation of PI to support the theoretical findings.
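To make the evaluate-then-improve structure of PI concrete, the following is a minimal grid-based sketch in Python for a scalar linear-quadratic toy problem (dynamics x' = a*x + u, running cost q*x^2 + r*u^2); the system, cost weights, grid, and the pointwise one-dimensional GHJB solve are illustrative assumptions, not the paper's implementation. Each pass evaluates the current policy by solving the GHJB equation for the value gradient and then improves the policy from that gradient; the initial gain k0 > a plays the role of an admissible policy that keeps the bounded computational domain forward-invariant, which is the requirement the paper analyzes.

# Minimal grid-based policy-iteration sketch (illustrative toy, not the paper's code).
# Scalar control-affine problem:  x' = a*x + u,  J = int_0^inf (q*x^2 + r*u^2) dt.
# In 1D the GHJB equation  V'(x)*(a*x + u(x)) + q*x^2 + r*u(x)^2 = 0  can be solved
# pointwise for V'(x) wherever the closed-loop drift is nonzero, and the policy
# improvement step is  u_{i+1}(x) = -(1/(2r)) * V'(x).

import numpy as np

a, q, r = 1.0, 1.0, 1.0             # dynamics and cost weights (assumed values)
xs = np.linspace(-1.0, 1.0, 401)    # bounded computational domain (grid)

def ghjb_gradient(u_vals):
    """Pointwise GHJB solve for V'(x) given the current policy on the grid."""
    drift = a * xs + u_vals                      # closed-loop dynamics f + g*u
    running = q * xs**2 + r * u_vals**2          # running cost along the policy
    grad = np.zeros_like(xs)
    nz = np.abs(xs) > 1e-8                       # exclude the equilibrium x = 0
    grad[nz] = -running[nz] / drift[nz]
    return grad

# Initial admissible policy: u_0(x) = -k0*x with k0 > a stabilizes the closed loop,
# so the bounded domain stays forward-invariant for the toy problem.
k0 = 2.0
u = -k0 * xs

for i in range(20):
    grad_V = ghjb_gradient(u)            # policy evaluation (GHJB solve)
    u_new = -grad_V / (2.0 * r)          # policy improvement step
    if np.max(np.abs(u_new - u)) < 1e-10:
        break
    u = u_new

# Compare with the scalar LQR gain k* = a + sqrt(a^2 + q/r) from the Riccati equation.
k_star = a + np.sqrt(a**2 + q / r)
print("PI gain estimate:", -u[-1] / xs[-1], "  LQR gain:", k_star)

For this toy problem the iteration reduces to the classical Kleinman/Newton recursion on the scalar Riccati equation and converges to the LQR gain a + sqrt(a^2 + q/r); in the nonlinear setting treated by the paper, the analogous steps require the existence, regularity, and forward-invariance properties established there.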
Similar Papers
Policy iteration for nonconvex viscous Hamilton--Jacobi equations
Numerical Analysis
Makes AI learn faster by improving how it thinks.
Solving nonconvex Hamilton--Jacobi--Isaacs equations with PINN-based policy iteration
Numerical Analysis
Helps robots plan paths around moving obstacles.
Continuous Policy and Value Iteration for Stochastic Control Problems and Its Convergence
Optimization and Control
Teaches computers to make best choices faster.