Is Exploration or Optimization the Problem for Deep Reinforcement Learning?
By: Glen Berseth
Potential Business Impact:
Measures how much better computer learning systems could perform than they currently do.
In the era of deep reinforcement learning, making progress is more complex, as the collected experience must be compressed into a deep model for future exploitation and sampling. Many papers have shown that training a deep learning policy under a changing state and action distribution leads to sub-optimal performance, or even collapse. This naturally raises the concern that even if the community creates improved exploration algorithms or reward objectives, those improvements may fall on the "deaf ears" of optimization difficulties. This work proposes a new, practical sub-optimality estimator to determine the optimization limitations of deep reinforcement learning algorithms. Through experiments across environments and RL algorithms, it is shown that the best experience generated is 2-3× better than the policies' learned performance. This large difference indicates that deep RL methods exploit only about half of the good experience they generate.
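To make the idea concrete, here is a minimal sketch of how such a sub-optimality estimate could be computed from episode returns. This is not the paper's actual estimator; the function name, the top-k averaging, and the input format are assumptions made for illustration. The idea is to compare the best returns observed in the agent's own collected experience against the returns the learned policy achieves at evaluation time.

```python
import numpy as np

def suboptimality_ratio(collected_returns, policy_eval_returns, top_k=10):
    """Hypothetical sketch: compare the best returns found in collected
    experience with the returns the learned policy actually achieves."""
    collected = np.asarray(collected_returns, dtype=float)
    evaluated = np.asarray(policy_eval_returns, dtype=float)

    # Best experience generated during training: average of the top-k
    # episode returns seen in the replay/rollout data.
    best_experienced = np.mean(np.sort(collected)[-top_k:])

    # What the learned policy exploits: its mean evaluation return.
    policy_performance = np.mean(evaluated)

    # A ratio above 1 means the agent generated better behaviour than it
    # distilled into its policy (the abstract reports roughly 2-3x).
    return best_experienced / policy_performance
```

A ratio near 1 would suggest the optimizer is extracting most of what exploration provides, while a large ratio points to optimization, rather than exploration, as the bottleneck.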
Similar Papers
Disentangling Exploration of Large Language Models by Optimal Exploitation
Machine Learning (CS)
Helps computers learn better by exploring new things.
The Surprising Difficulty of Search in Model-Based Reinforcement Learning
Machine Learning (CS)
Makes smart computer games learn faster and better.
Exploitation Is All You Need... for Exploration
Machine Learning (CS)
Computers learn to explore new games by remembering.