On the hardness of RL with Lookahead
By: Corentin Pla, Hugo Richard, Marc Abeille, and more
Potential Business Impact:
Lets computers plan ahead to make better choices.
We study reinforcement learning (RL) with transition look-ahead, where the agent may observe which states would be visited upon playing any sequence of $\ell$ actions before deciding its course of action. While such predictive information can drastically improve the achievable performance, we show that using this information optimally comes at a potentially prohibitive computational cost. Specifically, we prove that optimal planning with one-step look-ahead ($\ell=1$) can be solved in polynomial time through a novel linear programming formulation. In contrast, for $\ell \geq 2$, the problem becomes NP-hard. Our results delineate a precise boundary between tractable and intractable cases for the problem of planning with transition look-ahead in reinforcement learning.
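To give a sense of the kind of linear-programming approach the abstract alludes to, here is a minimal sketch of the classical occupancy-measure LP for standard (no-lookahead) MDP planning, solved with scipy. This is not the paper's novel formulation for one-step look-ahead; the toy MDP, its sizes, and all numbers are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy MDP: 3 states, 2 actions (all numbers made up for illustration).
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(size=(n_states, n_actions))                        # R[s, a]
mu0 = np.full(n_states, 1.0 / n_states)                            # initial distribution

# Decision variables: occupancy measure d(s, a), flattened to length S*A.
# Objective: maximize sum_{s,a} d(s,a) R(s,a)  ->  minimize its negation.
c = -R.reshape(-1)

# Flow constraints: for every state s',
#   sum_a d(s', a) - gamma * sum_{s,a} P(s'|s,a) d(s, a) = (1 - gamma) * mu0(s')
A_eq = np.zeros((n_states, n_states * n_actions))
for s_next in range(n_states):
    for s in range(n_states):
        for a in range(n_actions):
            idx = s * n_actions + a
            A_eq[s_next, idx] -= gamma * P[s, a, s_next]
            if s == s_next:
                A_eq[s_next, idx] += 1.0
b_eq = (1 - gamma) * mu0

# Nonnegativity d(s, a) >= 0 is linprog's default variable bound.
res = linprog(c, A_eq=A_eq, b_eq=b_eq, method="highs")
d = res.x.reshape(n_states, n_actions)

# An optimal (deterministic) policy puts its mass where the occupancy measure does.
policy = d.argmax(axis=1)
print("optimal discounted return:", -res.fun / (1 - gamma))
print("greedy policy from occupancy measure:", policy)
```

The paper's contribution, per the abstract, is that a polynomial-size LP of this flavor still exists when the agent additionally observes the next transition before acting ($\ell = 1$), whereas no polynomial-time algorithm exists for $\ell \geq 2$ unless P = NP.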
Similar Papers
Look Before Leap: Look-Ahead Planning with Uncertainty in Reinforcement Learning
Machine Learning (CS)
Teaches robots to learn faster and better.
Reinforcement Learning for Long-Horizon Multi-Turn Search Agents
Computation and Language
AI learns better by trying and failing.
Statistical and Algorithmic Foundations of Reinforcement Learning
Machine Learning (Stat)
Teaches computers to learn faster with less data.