Moderate Actor-Critic Methods: Controlling Overestimation Bias via Expectile Loss
By: Ukjo Hwang, Songnam Hong
Potential Business Impact:
Stops learning software from overrating its choices, so it makes better decisions.
Overestimation is a fundamental characteristic of model-free reinforcement learning (MF-RL), arising from the principles of temporal difference learning and the approximation of the Q-function. To address this challenge, we propose a novel moderate target for the Q-function update, formulated as a convex combination of an overestimated Q-function and its lower bound. Our primary contribution lies in the efficient estimation of this lower bound through the lower expectile of the Q-value distribution conditioned on a state. Notably, our moderate target integrates seamlessly into state-of-the-art (SOTA) MF-RL algorithms, including Deep Deterministic Policy Gradient (DDPG) and Soft Actor-Critic (SAC). Experimental results validate the effectiveness of our moderate target in mitigating overestimation bias in DDPG, SAC, and distributional RL algorithms.
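As a rough illustration of the idea in the abstract, the PyTorch sketch below shows one way an expectile-based lower bound and a convex-combination ("moderate") TD target could be implemented. The function names, the expectile level `tau`, and the mixing weight `lam` are assumptions introduced here for illustration, not details taken from the paper.

```python
# Minimal sketch (not the authors' implementation): an expectile loss for
# learning a lower bound on Q, and a "moderate" TD target that mixes the
# standard bootstrapped value with that lower bound.
# Assumed/hypothetical names: tau, lam, expectile_loss, moderate_target.
import torch


def expectile_loss(pred: torch.Tensor, target: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Asymmetric squared loss whose minimizer is the tau-expectile of the
    target distribution; tau < 0.5 pushes the estimate toward a lower bound."""
    diff = target - pred
    weight = torch.where(diff > 0,
                         torch.full_like(diff, tau),
                         torch.full_like(diff, 1.0 - tau))
    return (weight * diff.pow(2)).mean()


@torch.no_grad()
def moderate_target(reward: torch.Tensor,
                    not_done: torch.Tensor,
                    gamma: float,
                    q_next: torch.Tensor,
                    q_lower_next: torch.Tensor,
                    lam: float = 0.7) -> torch.Tensor:
    """Convex combination of the (possibly overestimated) next-state Q-value
    and its expectile-based lower bound, bootstrapped into a TD target."""
    mixed_next = lam * q_next + (1.0 - lam) * q_lower_next
    return reward + gamma * not_done * mixed_next
```

In a DDPG- or SAC-style update, a lower-bound critic could be trained with `expectile_loss` against ordinary TD targets, while the main critic regresses (e.g., with MSE) onto `moderate_target`; here `tau` and `lam` are free hyperparameters of this sketch rather than values prescribed by the paper.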
Similar Papers
Mitigating Estimation Bias with Representation Learning in TD Error-Driven Regularization
Machine Learning (CS)
Teaches robots to learn better by balancing risks.
Efficient $Q$-Learning and Actor-Critic Methods for Robust Average Reward Reinforcement Learning
Machine Learning (CS)
Teaches computers to make good choices even with bad info.
Risk-sensitive Reinforcement Learning Based on Convex Scoring Functions
Mathematical Finance
Teaches computers to trade money safely and smartly.