Universal Approximation Theorem of Deep Q-Networks
By: Qian Qi
Potential Business Impact:
Helps AI learn reliably from continuous, real-time data, such as physical systems or high-frequency markets.
We establish a continuous-time framework for analyzing Deep Q-Networks (DQNs) via stochastic control and Forward-Backward Stochastic Differential Equations (FBSDEs). For a continuous-time Markov Decision Process (MDP) driven by a square-integrable martingale, we analyze the approximation properties of DQNs. Leveraging residual network approximation theorems and large deviation bounds for the state-action process, we show that DQNs can approximate the optimal Q-function on compact sets to arbitrary accuracy with high probability. We then analyze the convergence of a general Q-learning algorithm for training DQNs in this setting, adapting stochastic approximation theorems. Our analysis emphasizes the interplay between the DQN layer count and the time discretization, as well as the role of viscosity solutions (primarily for the value function $V^*$) in addressing the potential non-smoothness of the optimal Q-function. This work bridges deep reinforcement learning and stochastic control, offering insights into DQNs in continuous-time settings relevant to applications involving physical systems or high-frequency data.
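The abstract's link between DQN layer count and time discretization can be read through the usual residual-network lens, where each residual block plays the role of one Euler step on a time grid over $[0, T]$. The sketch below is a minimal, untrained illustration of that correspondence only; the dimensions, activation, step size, and network shapes are assumptions for exposition, not the paper's construction.

```python
import numpy as np

# Minimal sketch (not the paper's architecture): a residual Q-network whose
# depth L mirrors an Euler discretization of [0, T] into L steps, illustrating
# the layer-count / time-discretization interplay described in the abstract.
rng = np.random.default_rng(0)

T = 1.0          # time horizon (assumed)
L = 16           # number of residual blocks ~ number of Euler time steps
dt = T / L       # step size tied to depth
d_state, d_action, d_hidden = 4, 2, 32   # illustrative dimensions

# Random (untrained) parameters; training would fit these via Q-learning.
W_in = rng.normal(scale=0.1, size=(d_state + d_action, d_hidden))
blocks = [(rng.normal(scale=0.1, size=(d_hidden, d_hidden)),
           np.zeros(d_hidden)) for _ in range(L)]
w_out = rng.normal(scale=0.1, size=d_hidden)

def q_network(x, a):
    """Residual surrogate Q(x, a): each block applies h <- h + dt * f(h)."""
    h = np.concatenate([x, a]) @ W_in
    for W, b in blocks:
        h = h + dt * np.tanh(h @ W + b)   # Euler-type residual update
    return float(h @ w_out)

# Example evaluation at a random state-action pair.
x0 = rng.normal(size=d_state)
a0 = rng.normal(size=d_action)
print("Q(x0, a0) =", q_network(x0, a0))
```

Refining the grid (larger `L`, smaller `dt`) corresponds to adding residual blocks, which is one way to picture why approximation accuracy in this setting is tied jointly to network depth and to the time discretization.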
Similar Papers
Universal Approximation Theorem for Deep Q-Learning via FBSDE System
Machine Learning (CS)
Makes AI learn better by copying how problems are solved.
Approximation to Deep Q-Network by Stochastic Delay Differential Equations
Machine Learning (CS)
Makes computer learning more stable and predictable.
Universal approximation property of neural stochastic differential equations
Probability
Neural networks can imitate complex random processes.