Rollout-Based Approximate Dynamic Programming for MDPs with Information-Theoretic Constraints
By: Zixuan He, Charalambos D. Charalambous, Photios A. Stavrou
Potential Business Impact:
Helps computers make better choices with less data.
This paper studies a finite-horizon Markov decision problem with information-theoretic constraints, in which the goal is to find a control policy that minimizes the directed information from the controlled source process to the control process, subject to stage-wise cost constraints. The problem is known to admit a reformulation as an unconstrained MDP with a continuous information-state, expressed through Q-factors. To avoid the computational complexity of discretizing the continuous information-state space, we propose a truncated rollout-based backward-forward approximate dynamic programming (ADP) framework. Our approach consists of two phases: an offline base policy approximation over a shorter time horizon, followed by an online rollout lookahead minimization, both supported by provable convergence guarantees. We supplement our theoretical results with a numerical example demonstrating the cost improvement of the rollout method over a previously proposed policy approximation method, as well as the computational complexity observed in executing the offline and online phases of the two methods.
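To make the two-phase structure concrete, here is a minimal Python sketch of a truncated rollout scheme: an offline backward dynamic program over a shortened horizon produces a cost-to-go approximation, which an online one-step lookahead then uses over the full horizon. The problem data (a discretized state grid, random stage costs and transition kernel) and names such as offline_base_policy and rollout_control are illustrative assumptions, not the paper's actual information-state formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small problem: an N-stage MDP whose continuous information
# state has been replaced by a grid of S points, with U control actions.
N, S, U = 10, 20, 4   # full horizon, state-grid size, number of controls
H = 4                 # truncated horizon used for the offline base policy

# Randomly generated stage costs g[x, u] and transition kernel P[u, x, x'].
g = rng.uniform(0.0, 1.0, size=(S, U))
P = rng.dirichlet(np.ones(S), size=(U, S))  # P[u, x, :] sums to 1

def offline_base_policy(H):
    """Backward DP over the truncated horizon H, returning an approximate
    cost-to-go J_hat and the corresponding greedy base policy."""
    J = np.zeros(S)
    mu = np.zeros((H, S), dtype=int)
    for k in reversed(range(H)):
        # Q[x, u] = g(x, u) + E[J(x') | x, u]
        Q = g + np.einsum('uxs,s->xu', P, J)
        mu[k] = Q.argmin(axis=1)
        J = Q.min(axis=1)
    return J, mu

def rollout_control(x, J_hat):
    """Online one-step lookahead minimization using the offline cost-to-go
    approximation J_hat as the tail approximation."""
    q = g[x] + P[:, x, :] @ J_hat
    return int(q.argmin())

# Online forward pass over the full horizon N using the rollout policy.
J_hat, _ = offline_base_policy(H)
x, total_cost = 0, 0.0
for k in range(N):
    u = rollout_control(x, J_hat)
    total_cost += g[x, u]
    x = rng.choice(S, p=P[u, x])
print(f"rollout trajectory cost over {N} stages: {total_cost:.3f}")
```

The offline phase is the expensive part (a full minimization over the grid at each of the H stages), while each online step only evaluates U one-step Q-values, which is the source of the complexity trade-off discussed in the abstract.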
Similar Papers
Data-Driven Abstraction and Synthesis for Stochastic Systems with Unknown Dynamics
Systems and Control
Teaches robots to learn and follow rules.
Incremental Policy Iteration for Unknown Nonlinear Systems with Stability and Performance Guarantees
Optimization and Control
Teaches robots to learn and control themselves.
Reinforcement Learning in MDPs with Information-Ordered Policies
Machine Learning (Stat)
Teaches computers to make smart choices faster.