A Finite-Time Analysis of TD Learning with Linear Function Approximation without Projections or Strong Convexity

Published: June 1, 2025 | arXiv ID: 2506.01052v2

By: Wei-Cheng Lee, Francesco Orabona

Potential Business Impact:

Shows that a core reinforcement-learning algorithm trains reliably without artificial safeguard steps, matching how it is used in practice.

Business Areas:
Machine Learning, Artificial Intelligence, Data and Analytics, Software

We investigate the finite-time convergence properties of Temporal Difference (TD) learning with linear function approximation, a cornerstone algorithm in the field of reinforcement learning. We are interested in the so-called "robust" setting, where the convergence guarantee does not depend on the minimal curvature of the potential function. While prior work has established convergence guarantees in this setting, these results typically rely on the assumption that each iterate is projected onto a bounded set, a condition that is artificial and does not match current practice. In this paper, we challenge the necessity of such an assumption and present a refined analysis of TD learning. For the first time, we show that the simple projection-free variant converges at a rate of $\widetilde{\mathcal{O}}(\frac{\|\theta^*\|_2^2}{\sqrt{T}})$, even in the presence of Markovian noise. Our analysis reveals a novel self-bounding property of the TD updates and exploits it to guarantee bounded iterates.
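
To make the setting concrete, here is a minimal sketch of projection-free TD(0) with linear function approximation, the variant the abstract describes: the weight vector is updated directly, with no projection onto a bounded set after each step. The environment interface (`env.reset`, `env.step`), the feature map `phi`, and the $1/\sqrt{t}$ step-size schedule are illustrative assumptions for this sketch, not details taken from the paper.

```python
# Minimal sketch: projection-free TD(0) with linear function approximation.
# The env interface, feature map, and hyperparameters are assumptions.
import numpy as np

def td0_linear(env, phi, d, T, gamma=0.99, alpha0=0.1):
    """Run T projection-free TD(0) updates; return the final weights theta."""
    theta = np.zeros(d)              # no projection onto a bounded set, ever
    s = env.reset()
    for t in range(1, T + 1):
        s_next, r = env.step(s)      # Markovian noise: next state depends on s
        # TD error: delta = r + gamma * V(s') - V(s), with V(s) = theta . phi(s)
        delta = r + gamma * theta @ phi(s_next) - theta @ phi(s)
        alpha = alpha0 / np.sqrt(t)        # decaying O(1/sqrt(t)) step size
        theta += alpha * delta * phi(s)    # plain TD update, no clipping
        s = s_next
    return theta
```

The point of the sketch is the absence of any projection or clipping line: earlier analyses needed one to keep the iterates bounded, whereas the paper's self-bounding argument shows the bare update above already suffices.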

Country of Origin
🇸🇦 Saudi Arabia

Page Count
24 pages

Category
Computer Science:
Machine Learning (CS)