Reliable Critics: Monotonic Improvement and Convergence Guarantees for Reinforcement Learning
By: Eshwar S. R., Gugan Thoppe, Aditya Gopalan, and more
Potential Business Impact:
Makes learning robots more reliable and predictable.
Despite decades of research, it remains challenging to correctly use Reinforcement Learning (RL) algorithms with function approximation. A prime example is policy iteration, whose fundamental guarantee of monotonic improvement collapses even under linear function approximation. To address this issue, we introduce Reliable Policy Iteration (RPI). It replaces the common projection or Bellman-error minimization during policy evaluation with a Bellman-based constrained optimization. We prove that RPI not only confers textbook monotonicity on its value estimates, but also that these estimates lower-bound the true return. Moreover, their limit partially satisfies the unprojected Bellman equation, underscoring RPI's natural fit within RL. RPI is the first algorithm with such monotonicity and convergence guarantees under function approximation. For practical use, we provide a model-free variant of RPI that amounts to a novel critic, which can be readily integrated into standard model-free PI implementations such as DQN and DDPG. In classical control tasks, these RPI-enhanced variants consistently maintain their lower-bound guarantee while matching or surpassing the performance of all baseline methods.
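To make the "Bellman-based constrained optimization" idea concrete, below is a minimal sketch of one plausible reading: with linear features and a known tabular model, maximize the summed value estimate subject to the constraint that the estimate never exceeds its one-step Bellman backup, which forces it to lower-bound the true return. The function name, the linear-programming formulation, and the use of scipy are illustrative assumptions, not the paper's exact algorithm or its model-free critic.

```python
# Hypothetical sketch of Bellman-constrained policy evaluation with linear
# features, assuming a known model (P, r) for a fixed policy pi. Enforcing
# phi(s)^T w <= r(s) + gamma * E[phi(s')^T w] for every state guarantees
# phi(s)^T w <= V^pi(s), i.e., the estimate lower-bounds the true return.

import numpy as np
from scipy.optimize import linprog


def bellman_constrained_evaluation(P, r, phi, gamma):
    """
    P    : (S, S) transition matrix under the fixed policy pi
    r    : (S,)   expected one-step reward under pi
    phi  : (S, d) feature matrix, with V_w(s) = phi[s] @ w
    gamma: discount factor in [0, 1)
    Returns weights w such that phi @ w <= V^pi componentwise (when feasible).
    """
    # Constraint rows: (phi - gamma * P @ phi) @ w <= r, one row per state.
    A_ub = phi - gamma * (P @ phi)
    b_ub = r
    # Objective: maximize sum_s phi[s] @ w, i.e., minimize its negation.
    c = -phi.sum(axis=0)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * phi.shape[1])
    if not res.success:
        raise RuntimeError("Bellman-constrained LP failed: " + res.message)
    return res.x


if __name__ == "__main__":
    # Tiny 3-state chain with 2 random features (illustrative numbers only).
    rng = np.random.default_rng(0)
    P = np.array([[0.9, 0.1, 0.0],
                  [0.0, 0.9, 0.1],
                  [0.1, 0.0, 0.9]])
    r = np.array([0.0, 0.5, 1.0])
    phi = rng.standard_normal((3, 2))
    w = bellman_constrained_evaluation(P, r, phi, gamma=0.9)
    v_true = np.linalg.solve(np.eye(3) - 0.9 * P, r)  # exact V^pi
    print("estimate :", phi @ w)   # componentwise lower bound on V^pi
    print("true V^pi:", v_true)
```

The design point the abstract emphasizes is visible here: because the optimum pushes the estimate up against the Bellman constraints, some constraints become active in the limit, which is one way the solution can "partially satisfy" the unprojected Bellman equation while still never overestimating the return.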
Similar Papers
Reliable Policy Iteration: Performance Robustness Across Architecture and Environment Perturbations
Artificial Intelligence
Makes robots learn faster and more reliably.
Implicit Constraint-Aware Off-Policy Correction for Offline Reinforcement Learning
Systems and Control
Teaches computers to follow rules for better learning.
Adaptive Resolving Methods for Reinforcement Learning with Function Approximations
Machine Learning (CS)
Teaches computers to learn from experience faster.