Offline Reinforcement Learning in Large State Spaces: Algorithms and Guarantees
By: Nan Jiang, Tengyang Xie
Potential Business Impact:
Teaches computers to make good decisions using only previously collected data, with no live trial and error.
This article introduces the theory of offline reinforcement learning in large state spaces, where good policies are learned from historical data without online interactions with the environment. Key concepts introduced include expressivity assumptions on function approximation (e.g., Bellman completeness vs. realizability) and data coverage (e.g., all-policy vs. single-policy coverage). A rich landscape of algorithms and results is described, depending on the assumptions one is willing to make and the sample and computational complexity guarantees one wishes to achieve. We also discuss open questions and connections to adjacent areas.
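To make these terms concrete, here is a sketch of the standard definitions as they commonly appear in the offline RL literature; the notation and the discounted infinite-horizon setting are assumptions of this sketch, not necessarily the paper's.

% F is a class of candidate Q-functions; mu is the distribution of the offline data;
% d^pi is the state-action occupancy measure of policy pi.

% Realizability: the optimal Q-function lies in the class.
Q^{\star} \in \mathcal{F}

% Bellman completeness (stronger): F is closed under the Bellman optimality operator T,
% where (T f)(s,a) = r(s,a) + \gamma \, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\big[\max_{a'} f(s',a')\big].
\forall f \in \mathcal{F}: \quad \mathcal{T} f \in \mathcal{F}

% All-policy coverage: the data distribution covers the occupancy of every policy.
\max_{\pi} \, \big\| d^{\pi} / \mu \big\|_{\infty} \le C

% Single-policy coverage (weaker): the data need only cover an optimal (or comparator) policy.
\big\| d^{\pi^{\star}} / \mu \big\|_{\infty} \le C^{\star}

Roughly, stronger expressivity assumptions (Bellman completeness) or stronger coverage (all-policy) buy simpler algorithms and tighter guarantees, while weaker assumptions (realizability, single-policy coverage) demand more sophisticated, often pessimism-based, methods.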
Similar Papers
A Tutorial: An Intuitive Explanation of Offline Reinforcement Learning Theory
Machine Learning (CS)
Teaches computers to learn from old data.
Agnostic Reinforcement Learning: Foundations and Algorithms
Machine Learning (CS)
Teaches computers to learn good behavior without strong modeling assumptions.
Towards Optimal Offline Reinforcement Learning
Optimization and Control
Teaches computers to make the best possible decisions from a fixed dataset.