Rollout-Based Approximate Dynamic Programming for MDPs with Information-Theoretic Constraints
By: Zixuan He, Charalambos D. Charalambous, Photios A. Stavrou
Potential Business Impact:
Helps computers make better choices with less data.
This paper studies a finite-horizon Markov decision problem with information-theoretic constraints, in which the goal is to find a control policy that minimizes the directed information from the controlled source process to the control process, subject to stage-wise cost constraints. The problem is known to admit a reformulation as an unconstrained MDP with a continuous information-state, expressed through Q-factors. To avoid the computational complexity of discretizing the continuous information-state space, we propose a truncated rollout-based backward-forward approximate dynamic programming (ADP) framework. Our approach consists of two phases: an offline base policy approximation over a shorter time horizon, followed by an online rollout lookahead minimization, both supported by provable convergence guarantees. We supplement our theoretical results with a numerical example demonstrating the cost improvement of the rollout method over a previously proposed policy approximation method, as well as the computational complexity observed in executing the offline and online phases of the two methods.
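To make the two-phase structure concrete, here is a minimal Python sketch of a truncated rollout scheme: an offline backward dynamic program over a shortened horizon produces a cost-to-go approximation, which an online one-step lookahead then uses over the full horizon. The problem data (a discretized state grid, random stage costs and transition kernel) and names such as offline_base_policy and rollout_control are illustrative assumptions, not the paper's actual information-state formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small problem: an N-stage MDP whose continuous information
# state has been replaced by a grid of S points, with U control actions.
N, S, U = 10, 20, 4   # full horizon, state-grid size, number of controls
H = 4                 # truncated horizon used for the offline base policy

# Randomly generated stage costs g[x, u] and transition kernel P[u, x, x'].
g = rng.uniform(0.0, 1.0, size=(S, U))
P = rng.dirichlet(np.ones(S), size=(U, S))  # P[u, x, :] sums to 1

def offline_base_policy(H):
    """Backward DP over the truncated horizon H, returning an approximate
    cost-to-go J_hat and the corresponding greedy base policy."""
    J = np.zeros(S)
    mu = np.zeros((H, S), dtype=int)
    for k in reversed(range(H)):
        # Q[x, u] = g(x, u) + E[J(x') | x, u]
        Q = g + np.einsum('uxs,s->xu', P, J)
        mu[k] = Q.argmin(axis=1)
        J = Q.min(axis=1)
    return J, mu

def rollout_control(x, J_hat):
    """Online one-step lookahead minimization using the offline cost-to-go
    approximation J_hat as the tail approximation."""
    q = g[x] + P[:, x, :] @ J_hat
    return int(q.argmin())

# Online forward pass over the full horizon N using the rollout policy.
J_hat, _ = offline_base_policy(H)
x, total_cost = 0, 0.0
for k in range(N):
    u = rollout_control(x, J_hat)
    total_cost += g[x, u]
    x = rng.choice(S, p=P[u, x])
print(f"rollout trajectory cost over {N} stages: {total_cost:.3f}")
```

The offline phase is the expensive part (a full minimization over the grid at each of the H stages), while each online step only evaluates U one-step Q-values, which is the source of the complexity trade-off discussed in the abstract.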
Similar Papers
Data-Driven Abstraction and Synthesis for Stochastic Systems with Unknown Dynamics
Systems and Control
Teaches robots to learn and follow rules.
Incremental Policy Iteration for Unknown Nonlinear Systems with Stability and Performance Guarantees
Optimization and Control
Teaches robots to learn and control themselves.
Reinforcement Learning in MDPs with Information-Ordered Policies
Machine Learning (Stat)
Teaches computers to make smart choices faster.