Fitted Q Evaluation Without Bellman Completeness via Stationary Weighting
By: Lars van der Laan, Nathan Kallus
Fitted Q-evaluation (FQE) is a central method for off-policy evaluation in reinforcement learning, but it generally requires Bellman completeness: that the hypothesis class is closed under the evaluation Bellman operator. This requirement is challenging because enlarging the hypothesis class can worsen completeness. We show that the need for this assumption stems from a fundamental norm mismatch: the Bellman operator is gamma-contractive under the stationary distribution of the target policy, whereas FQE minimizes Bellman error under the behavior distribution. We propose a simple fix: reweight each regression step using an estimate of the stationary density ratio, thereby aligning FQE with the norm in which the Bellman operator contracts. This enables strong evaluation guarantees in the absence of realizability or Bellman completeness, avoiding the geometric error blow-up of standard FQE in this setting while maintaining the practicality of regression-based evaluation.
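To make the proposed fix concrete, here is a minimal sketch of stationarity-weighted FQE under simplifying assumptions not stated in the abstract: a linear hypothesis class fit by weighted least squares, a finite action space, and an externally supplied estimate w(s, a) of the stationary density ratio d^pi(s, a) / d^b(s, a). The function and argument names are illustrative, not the authors' implementation; the point is only that each FQE regression step is reweighted by the density ratio so the Bellman error is measured under the target policy's stationary distribution.

```python
import numpy as np

def weighted_fqe(transitions, pi, w, features, n_actions, gamma=0.99, n_iters=100):
    """Sketch of stationarity-weighted FQE (illustrative, not the paper's code).

    transitions: list of (s, a, r, s_next) collected under the behavior policy
    pi[s_next][a']: target-policy probability of action a' in state s_next
    w(s, a): estimated stationary density ratio d^pi(s, a) / d^b(s, a)
    features(s, a): feature vector defining the linear hypothesis class
    """
    d = features(transitions[0][0], transitions[0][1]).shape[0]
    theta = np.zeros(d)
    X = np.stack([features(s, a) for (s, a, _, _) in transitions])   # regressors
    W = np.array([w(s, a) for (s, a, _, _) in transitions])          # density-ratio weights
    for _ in range(n_iters):
        # Regression target for the current iterate Q_k:
        #   y = r + gamma * E_{a' ~ pi}[ Q_k(s', a') ]
        y = np.array([
            r + gamma * sum(pi[s_next][a2] * (features(s_next, a2) @ theta)
                            for a2 in range(n_actions))
            for (_, _, r, s_next) in transitions
        ])
        # Weighted least squares: reweighting by W aligns the squared Bellman
        # error with the target policy's stationary distribution, the norm in
        # which the evaluation Bellman operator is gamma-contractive.
        WX = X * W[:, None]
        theta = np.linalg.solve(X.T @ WX + 1e-8 * np.eye(d), WX.T @ y)
    return theta
```

In standard FQE the weights W would all equal one, so the regression minimizes Bellman error under the behavior distribution; the single change above is to replace those unit weights with the estimated density ratio.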