Learning POMDPs with Linear Function Approximation and Finite Memory
By: Ali Devran Kara
Potential Business Impact:
Teaches computers to make good choices with less information.
We study reinforcement learning with linear function approximation and finite-memory approximations for partially observed Markov decision processes (POMDPs). We first present an algorithm for the value evaluation of finite-memory feedback policies, and we provide error bounds derived from filter stability and projection errors. We then study the learning of finite-memory-based near-optimal Q values. For general basis functions, convergence requires additional assumptions on the exploration policy. We then show that these assumptions can be relaxed for specific models, such as those with perfectly linear cost and dynamics, or when discretization-based basis functions are used.
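To make the setting concrete, here is a minimal sketch (not the paper's algorithm) of Q-learning over a finite-memory state with linear function approximation, using one-hot discretization-based basis functions over the last few observation-action pairs. The toy two-state POMDP, the feature map, and all hyperparameters below are illustrative assumptions.

```python
# Sketch: finite-memory Q-learning with linear function approximation on a toy POMDP.
# The environment, features, and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy POMDP: 2 hidden states, 2 actions, noisy binary observations (assumed).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # transition kernel P[a, s, s']
              [[0.3, 0.7], [0.6, 0.4]]])
C = np.array([[0.0, 1.0], [1.0, 0.0]])    # stage cost C[s, a]
OBS_NOISE = 0.15                          # probability the observation flips the state
GAMMA, ALPHA, EPS = 0.95, 0.05, 0.1
N_MEM, N_ACTIONS, N_STEPS = 3, 2, 20000

def observe(s):
    return s if rng.random() > OBS_NOISE else 1 - s

def features(memory, action):
    """One-hot (discretization-based) features over the last N_MEM
    observation-action pairs, crossed with the current action."""
    idx = 0
    for (o, a) in memory:
        idx = idx * 4 + (2 * o + a)
    dim_mem = 4 ** N_MEM
    phi = np.zeros(dim_mem * N_ACTIONS)
    phi[action * dim_mem + idx] = 1.0
    return phi

# Linear Q-function: Q(memory, a) is approximated by w @ features(memory, a).
w = np.zeros(4 ** N_MEM * N_ACTIONS)

s = 0
memory = [(observe(s), 0)] * N_MEM        # fixed-length finite memory
for t in range(N_STEPS):
    # epsilon-greedy exploration over the finite-memory state
    q = [w @ features(memory, a) for a in range(N_ACTIONS)]
    a = rng.integers(N_ACTIONS) if rng.random() < EPS else int(np.argmin(q))

    cost = C[s, a]
    s = rng.choice(2, p=P[a, s])
    next_memory = memory[1:] + [(observe(s), a)]

    # semi-gradient Q-learning update (cost minimization)
    q_next = min(w @ features(next_memory, b) for b in range(N_ACTIONS))
    phi = features(memory, a)
    td = cost + GAMMA * q_next - (w @ phi)
    w += ALPHA * td * phi
    memory = next_memory

print("Learned Q(memory, a) for the final memory:",
      [round(float(w @ features(memory, a)), 3) for a in range(N_ACTIONS)])
```

With one-hot features over a truncated memory, the linear parameterization is exact over the finite-memory state space; with general basis functions, the projection error and the exploration policy both affect convergence, which is the regime the paper analyzes.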
Similar Papers
Scalable Policy-Based RL Algorithms for POMDPs
Machine Learning (CS)
Helps robots learn better from past experiences.
Reinforcement Learning with Function Approximation for Non-Markov Processes
Machine Learning (CS)
Teaches computers to learn from imperfect information.
Finite Memory Belief Approximation for Optimal Control in Partially Observable Markov Decision Processes
Systems and Control
Helps smart machines work better while remembering less.