Linear Dynamics meets Linear MDPs: Closed-Form Optimal Policies via Reinforcement Learning

Published: August 24, 2025 | arXiv ID: 2508.17185v1

By: Abed AlRahman Al Makdah, Oliver Kosut, Lalitha Sankar and more

Potential Business Impact:

Teaches robots to learn from mistakes.

Business Areas:

Robotics Hardware, Science and Engineering, Software

Many applications -- including power systems, robotics, and economics -- involve a dynamical system interacting with a stochastic and hard-to-model environment. We adopt a reinforcement learning approach to control such systems. Specifically, we consider a deterministic, discrete-time, linear, time-invariant dynamical system coupled with a feature-based linear Markov process with an unknown transition kernel. The objective is to learn a control policy that optimizes a quadratic cost over the system state, the Markov process, and the control input. Leveraging both components of the system, we derive an explicit parametric form for the optimal state-action value function and the corresponding optimal policy. Our model is distinct in combining aspects of both classical Linear Quadratic Regulator (LQR) and linear Markov decision process (MDP) frameworks. This combination retains the implementation simplicity of LQR, while allowing for sophisticated stochastic modeling afforded by linear MDPs, without estimating the transition probabilities, thereby enabling direct policy improvement. We use tools from control theory to provide theoretical guarantees on the stability of the system under the learned policy and provide a sample complexity analysis for its convergence to the optimal policy. We illustrate our results via a numerical example that demonstrates the effectiveness of our approach in learning the optimal control policy under partially known stochastic dynamics.
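For orientation, the classical LQR building block mentioned in the abstract can be sketched as follows. This is not the paper's algorithm: it assumes the dynamics matrices are fully known and ignores the Markov process entirely, whereas the paper learns a policy without estimating the transition kernel. The function name `lqr_gain` and the matrices `A`, `B`, `Q`, `R` (a discretized double integrator with identity costs) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=500):
    """Iterate the discrete-time Riccati recursion; return the gain K and cost matrix P.

    Assumes known dynamics x_{t+1} = A x_t + B u_t and stage cost
    x^T Q x + u^T R u. The resulting policy is u_t = -K x_t.
    """
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        # K = (R + B^T P B)^{-1} B^T P A
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        # P = Q + A^T P A - A^T P B K, written compactly
        P = Q + A.T @ P @ (A - B @ K)
    return K, P

# Hypothetical example system: double integrator sampled at 0.1 s.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

K, P = lqr_gain(A, B, Q, R)
closed_loop = A - B @ K
spectral_radius = max(abs(np.linalg.eigvals(closed_loop)))
print("gain K:", K)
print("closed-loop spectral radius:", spectral_radius)
```

A spectral radius below 1 indicates that the closed-loop system is stable, which is the kind of property the abstract says is guaranteed for the learned policy in the richer setting.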

Data-Driven Yet Formal Policy Synthesis for Stochastic Nonlinear Dynamical Systems

Systems and Control

Teaches robots to control tricky machines reliably.

2 Jan 2025 1

89%

Model-based controller assisted domain randomization in deep reinforcement learning: application to nonlinear powertrain control

Systems and Control

Teaches machines to control tricky systems better.

28 Apr 2025 0

89%

Optimal Output Feedback Learning Control for Discrete-Time Linear Quadratic Regulation

Systems and Control

Teaches robots to learn how to control things.

8 Mar 2025 1

Country of Origin

🇺🇸 United States

Page Count

19 pages
