Quasi-Newton Compatible Actor-Critic for Deterministic Policies
By: Arash Bahari Kordabad, Dean Brandner, Sebastien Gros, and more
Potential Business Impact:
Teaches computers to learn faster by using curvature information about their mistakes.
In this paper, we propose a second-order deterministic actor-critic framework in reinforcement learning that extends the classical deterministic policy gradient method to exploit curvature information of the performance function. Building on the concept of compatible function approximation for the critic, we introduce a quadratic critic that simultaneously preserves the true policy gradient and an approximation of the performance Hessian. A least-squares temporal difference learning scheme is then developed to estimate the quadratic critic parameters efficiently. This construction enables a quasi-Newton actor update using information learned by the critic, yielding faster convergence compared to first-order methods. The proposed approach is general and applicable to any differentiable policy class. Numerical examples demonstrate that the method achieves improved convergence and performance over standard deterministic actor-critic baselines.
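To make the recipe concrete, here is a minimal, self-contained sketch (not the paper's implementation) on a toy scalar control task: roll out a deterministic policy, fit a critic that is quadratic in the compatible feature via least-squares temporal difference learning, and take a damped quasi-Newton actor step from the critic's gradient and curvature estimates. All names (`policy`, `fit_quadratic_critic`, `quasi_newton_update`), the toy dynamics, and the feature choices are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch of a quasi-Newton actor update driven by a critic that is
# quadratic in the compatible feature psi = d pi_theta(s)/d theta * (a - pi_theta(s)).
# Toy scalar LQR-like problem; not the paper's algorithm or code.
import numpy as np

rng = np.random.default_rng(0)

def policy(theta, s):
    """Deterministic linear policy a = theta * s (illustrative choice)."""
    return theta * s

def grad_policy(theta, s):
    """d pi_theta(s) / d theta for the linear policy above."""
    return s

def step(s, a):
    """Toy scalar dynamics and quadratic stage cost (reward = -cost)."""
    s_next = 0.9 * s + a + 0.01 * rng.standard_normal()
    reward = -(s**2 + 0.1 * a**2)
    return s_next, reward

def collect_rollout(theta, n=200, explore=0.3):
    """Roll out the policy with small exploration noise around pi_theta."""
    data, s = [], rng.standard_normal()
    for _ in range(n):
        a = policy(theta, s) + explore * rng.standard_normal()
        s_next, r = step(s, a)
        data.append((s, a, r, s_next))
        s = s_next
    return data

def fit_quadratic_critic(theta, data, gamma=0.95):
    """Least-squares TD fit of a critic that is quadratic in the compatible
    feature psi; returns rough estimates of the policy-gradient direction
    and of the curvature along it."""
    Phi, Phi_next, R = [], [], []
    for s, a, r, s_next in data:
        psi = grad_policy(theta, s) * (a - policy(theta, s))
        # features: [psi, 0.5*psi^2, 1, s^2]; the last two act as a state baseline
        Phi.append([psi, 0.5 * psi**2, 1.0, s**2])
        Phi_next.append([0.0, 0.0, 1.0, s_next**2])  # on-policy: psi_next = 0
        R.append(r)
    Phi, Phi_next, R = map(np.asarray, (Phi, Phi_next, R))
    A = Phi.T @ (Phi - gamma * Phi_next)   # standard LSTD normal equations
    b = Phi.T @ R
    w = np.linalg.solve(A + 1e-6 * np.eye(4), b)
    return w[0], w[1]  # gradient-direction estimate, curvature estimate

def quasi_newton_update(theta, g_hat, h_hat, alpha=0.5, damping=1e-2):
    """Damped quasi-Newton ascent step; falls back to plain gradient ascent
    when the curvature estimate is not sufficiently negative."""
    if h_hat < -damping:
        return theta - alpha * g_hat / h_hat
    return theta + alpha * g_hat

theta = 0.0
for it in range(20):
    data = collect_rollout(theta)
    g_hat, h_hat = fit_quadratic_critic(theta, data)
    theta = quasi_newton_update(theta, g_hat, h_hat)
    print(f"iter {it:2d}  theta = {theta:+.3f}")
```

In this sketch the curvature information comes entirely from the learned critic, so the actor never forms second derivatives of the policy or the return directly; that is the sense in which the update is quasi-Newton rather than full Newton.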
Similar Papers
First-order Sobolev Reinforcement Learning
Machine Learning (CS)
Teaches computers to learn faster and more reliably.
Nonlinear discretizations and Newton's method: characterizing stationary points of regression objectives
Machine Learning (CS)
Makes AI learn faster by using better math.
Mitigating Estimation Bias with Representation Learning in TD Error-Driven Regularization
Machine Learning (CS)
Teaches robots to learn better by balancing risks.