Reinforcement Learning From State and Temporal Differences
By: Lex Weaver, Jonathan Baxter
Potential Business Impact:
Teaches computers to make better decisions.
TD($\lambda$) with function approximation has proved empirically successful for some complex reinforcement learning problems. For linear approximation, TD($\lambda$) has been shown to minimise the squared error between the approximate value of each state and the true value. However, as far as the policy is concerned, it is error in the relative ordering of states that is critical, rather than error in the state values themselves. We illustrate this point both in simple two-state and three-state systems, in which TD($\lambda$), starting from an optimal policy, converges to a sub-optimal policy, and in backgammon. We then present a modified form of TD($\lambda$), called STD($\lambda$), in which function approximators are trained with respect to relative state values on binary decision problems. A theoretical analysis, including a proof of monotonic policy improvement for STD($\lambda$) in the context of the two-state system, is presented, along with a comparison with Bertsekas' differential training method [1]. This is followed by successful demonstrations of STD($\lambda$) on the two-state system and on a variation of the well-known acrobot problem.
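The sketch below illustrates the phenomenon the abstract describes, under assumptions not taken from the paper: it is a generic TD($\lambda$) update with linear function approximation and eligibility traces, run on a hypothetical two-state chain whose restricted features force the squared-error fit to reverse the ordering of the two state values. The function name, features, rewards, and step sizes are all illustrative; this is not the specific system or the STD($\lambda$) algorithm analysed in the paper.

```python
import numpy as np

# Minimal sketch of TD(lambda) with linear function approximation on a
# hypothetical two-state chain. All constants below are illustrative
# assumptions, not the system studied in the paper.

def td_lambda_linear(phi, transitions, rewards, lam=0.9, gamma=0.95,
                     alpha=0.01, steps=20000, seed=0):
    """Estimate V(s) ~ w . phi[s] using accumulating eligibility traces."""
    rng = np.random.default_rng(seed)
    n_states, n_features = phi.shape
    w = np.zeros(n_features)
    z = np.zeros(n_features)                 # eligibility trace
    s = 0
    for _ in range(steps):
        nxt, probs = zip(*transitions[s])
        s_next = rng.choice(nxt, p=probs)
        delta = rewards[s] + gamma * phi[s_next] @ w - phi[s] @ w  # TD error
        z = gamma * lam * z + phi[s]         # decay trace, add current feature
        w = w + alpha * delta * z            # TD(lambda) weight update
        s = s_next
    return w

# Restricted features: the approximator can only represent V(0) = w and
# V(1) = 0.5 * w, so with any positive weight it ranks state 0 above state 1,
# even though the true values satisfy V*(1) > V*(0). Minimising squared value
# error therefore reverses the ordering that a greedy policy relies on.
phi = np.array([[1.0], [0.5]])
transitions = {0: [(0, 0.5), (1, 0.5)], 1: [(0, 0.5), (1, 0.5)]}
rewards = np.array([0.0, 1.0])      # true values: V*(0) = 9.5, V*(1) = 10.5

w = td_lambda_linear(phi, transitions, rewards)
print("approximate values:", phi @ w)   # V-hat(0) > V-hat(1): order reversed
```

In this toy setting the value estimates end up close to the true values in a squared-error sense, yet the relative ordering of the two states is wrong, which is the failure mode motivating the relative-value training of STD($\lambda$).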
Similar Papers
First-order Sobolev Reinforcement Learning
Machine Learning (CS)
Teaches computers to learn faster and more reliably.
Accelerated Distributional Temporal Difference Learning with Linear Function Approximation
Machine Learning (Stat)
Learns how good choices are faster with less data.
Scalable Policy-Based RL Algorithms for POMDPs
Machine Learning (CS)
Helps robots learn by remembering past actions.