An Error Bound for Aggregation in Approximate Dynamic Programming
By: Yuchao Li, Dimitri Bertsekas
Potential Business Impact:
Provides a performance guarantee for reinforcement learning methods that simplify large decision problems before solving them.
We consider a general aggregation framework for discounted finite-state infinite horizon dynamic programming (DP) problems. It defines an aggregate problem whose optimal cost function can be obtained off-line by exact DP and then used as a terminal cost approximation for an on-line reinforcement learning (RL) scheme. We derive a bound on the error between the optimal cost functions of the aggregate problem and the original problem. This bound was first derived by Tsitsiklis and van Roy [TvR96] for the special case of hard aggregation. Our bound is similar but applies far more broadly, including to soft aggregation and feature-based aggregation schemes.
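The classical special case referenced above, hard aggregation, can be illustrated on a toy example. The sketch below builds a small random discounted MDP, forms the aggregate problem from assumed aggregation probabilities phi and disaggregation probabilities d (the instance, the partition, and all numerical values are illustrative assumptions, not taken from the paper), solves both problems by value iteration, and checks the Tsitsiklis–van Roy style bound: the error is at most eps/(1-alpha), where eps is the maximum variation of the optimal cost within an aggregate class.

```python
import numpy as np

# Toy instance (hypothetical): 4 states, 2 actions, discount factor alpha.
rng = np.random.default_rng(0)
n, m, alpha = 4, 2, 0.9

P = rng.random((m, n, n))
P /= P.sum(axis=2, keepdims=True)   # P[u, i, j]: transition probabilities
g = rng.random((m, n))              # g[u, i]: expected stage cost

def value_iteration(P, g, alpha, iters=2000):
    """Exact DP: iterate the Bellman operator to (near) convergence."""
    J = np.zeros(P.shape[1])
    for _ in range(iters):
        J = np.min(g + alpha * P @ J, axis=0)
    return J

J_star = value_iteration(P, g, alpha)       # optimal cost of original problem

# Hard aggregation with the assumed partition {0,1}, {2,3}:
# phi[i, x] = 1 if state i belongs to aggregate state x (aggregation probs),
# d[x, i]   = disaggregation distribution over the states of class x.
phi = np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])
d = np.array([[.5, .5, 0., 0.], [0., 0., .5, .5]])

# Aggregate problem: p_hat(x, y | u) = sum_i d(x,i) sum_j p(i,j|u) phi(j,y)
P_hat = np.einsum('xi,uij,jy->uxy', d, P, phi)
g_hat = np.einsum('xi,ui->ux', d, g)

r_star = value_iteration(P_hat, g_hat, alpha)  # solved off-line by exact DP
J_tilde = phi @ r_star                         # terminal cost approximation

# Error bound for hard aggregation: |J*(i) - J_tilde(i)| <= eps / (1 - alpha),
# with eps the max variation of J* within an aggregate class.
err = np.max(np.abs(J_star - J_tilde))
eps = max(np.ptp(J_star[:2]), np.ptp(J_star[2:]))
bound = eps / (1 - alpha)
print(f"error = {err:.4f}, bound = {bound:.4f}")
```

In this hard-aggregation case J_tilde is constant on each class (both states of a class share the same aggregate cost r*), which is exactly why the within-class variation of J* governs the bound.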
Similar Papers
Feature-Based Belief Aggregation for Partially Observable Markov Decision Problems
Systems and Control
Applies aggregation ideas to decision-making when the system's state cannot be fully observed.
Dynamic Regret Bounds for Online Omniprediction with Long Term Constraints
Machine Learning (CS)
Gives regret guarantees for online predictions that must serve many objectives at once under long-term constraints.
Near-optimal Regret Using Policy Optimization in Online MDPs with Aggregate Bandit Feedback
Machine Learning (CS)
Shows how to learn near-optimally in MDPs when only the total reward of a trajectory is observed.