Probabilistic Safety Guarantee for Stochastic Control Systems Using Average Reward MDPs
By: Saber Omidi, Marek Petrik, Se Young Yoon, and more
Potential Business Impact:
Makes robots safer by computing control policies that keep them within safe limits.
Safety in stochastic control systems, which are subject to random noise with a known probability distribution, requires computing policies that satisfy predefined operational constraints with high confidence throughout the uncertain evolution of the state variables. This unpredictable evolution makes it difficult for standard control methods to guarantee that the constraints are met. To address this, we present a new algorithm that computes safe policies and quantifies their safety level over a finite state set. The algorithm reduces the safety objective to the standard average-reward Markov Decision Process (MDP) objective, which allows standard techniques, such as linear programming, to be used to compute and analyze safe policies. We validate the proposed method numerically on the Double Integrator and the Inverted Pendulum systems. The results indicate that the average-reward MDP solution is more comprehensive, converges faster, and yields higher-quality policies than the minimum discounted-reward solution.
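To illustrate the kind of reduction the abstract describes, below is a minimal Python sketch of the standard occupancy-measure linear program for average-reward MDPs, solved with scipy.optimize.linprog. It assumes a unichain MDP and encodes safety as a reward of 1 in safe states and 0 elsewhere, so the optimal gain is a long-run safety level. The function name solve_average_reward_lp and the toy transition data are illustrative assumptions, not taken from the paper, whose exact construction may differ.

import numpy as np
from scipy.optimize import linprog

def solve_average_reward_lp(P, r):
    """Solve max_x sum r*x over state-action frequencies x, for a unichain MDP.
    P: (S, A, S) transition tensor; r: (S, A) reward matrix."""
    S, A = r.shape
    n = S * A
    # Equality constraints: S flow-balance rows plus one normalization row.
    A_eq = np.zeros((S + 1, n))
    for s in range(S):
        for a in range(A):
            col = s * A + a
            A_eq[s, col] += 1.0          # out-flow term: sum_a x[s, a]
            A_eq[:S, col] -= P[s, a, :]  # in-flow term: -P(s'|s, a) x[s, a]
    A_eq[S, :] = 1.0                     # frequencies sum to one
    b_eq = np.zeros(S + 1)
    b_eq[S] = 1.0
    # linprog minimizes, so negate the objective.
    res = linprog(c=-r.flatten(), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n, method="highs")
    x = res.x.reshape(S, A)
    gain = -res.fun                      # optimal long-run average reward
    policy = x.argmax(axis=1)            # most-visited action per state
    return gain, policy

# Toy 2-state example (made up): state 0 is "safe" (reward 1), state 1 is not,
# so the optimal gain is the best achievable long-run fraction of time spent safe.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # next-state dists from state 0
              [[0.7, 0.3], [0.1, 0.9]]])  # next-state dists from state 1
r = np.array([[1.0, 1.0],
              [0.0, 0.0]])
gain, policy = solve_average_reward_lp(P, r)
print(f"long-run safety level: {gain:.3f}, policy: {policy}")

The stationary policy is read off the occupancy measure; states the optimal stationary distribution never visits receive an arbitrary action, which is harmless under the unichain assumption.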
Similar Papers
Bellman Optimality of Average-Reward Robust Markov Decision Processes with a Constant Gain
Optimization and Control
Teaches computers to make the best long-term decisions.
Geometry-Inspired Unified Framework for Discounted and Average Reward MDPs
Machine Learning (CS)
Unifies math for better computer learning.
Distributionally Robust Markov Games with Average Reward
Multiagent Systems
Helps teams win games even when things change.