On Dynamic Programming Theory for Leader-Follower Stochastic Games
By: Jilles Steeve Dibangoye, Thibaut Le Marre, Ocan Sankur, et al.
Leader-follower general-sum stochastic games (LF-GSSGs) model sequential decision-making under asymmetric commitment: a leader commits to a policy and a follower best responds, yielding a strong Stackelberg equilibrium (SSE) with leader-favourable tie-breaking. This paper introduces a dynamic programming (DP) framework that computes SSEs by applying Bellman recursion over credible sets, state abstractions that formally represent all rational follower best responses under partial leader commitments. We first prove that any LF-GSSG admits a lossless reduction to a Markov decision process (MDP) over credible sets. We then establish that synthesising an optimal memoryless deterministic leader policy is NP-hard, which motivates ε-optimal DP algorithms with provable guarantees on leader exploitability. Experiments on standard mixed-motive benchmarks, including security games, resource allocation, and adversarial planning, demonstrate gains in leader value and runtime scalability over state-of-the-art methods.
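The abstract's key structural claim is that an LF-GSSG reduces losslessly to an MDP over credible sets, after which standard Bellman machinery applies. The sketch below is not the paper's algorithm; it is a generic value-iteration routine illustrating the kind of recursion such a reduction enables, with a toy two-state MDP standing in for a credible-set state space. All names (`c0`, `c1`, the transition and reward functions) are hypothetical.

```python
# Illustrative only: generic value iteration over a small MDP, standing in
# for the MDP-over-credible-sets obtained by the paper's reduction.
# States, actions, and rewards below are invented for the example.

def value_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-6):
    """Bellman recursion: V(s) = max_a [ r(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                reward(s, a) + gamma * sum(p * V[s2] for s2, p in transition(s, a))
                for a in actions(s)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

# Toy instance: two abstract states; "switch" moves between them,
# and only staying in c1 yields reward.
states = ["c0", "c1"]
actions = lambda s: ["stay", "switch"]

def transition(s, a):
    if a == "stay":
        return [(s, 1.0)]
    return [("c1" if s == "c0" else "c0", 1.0)]

def reward(s, a):
    return 1.0 if (s == "c1" and a == "stay") else 0.0

V = value_iteration(states, actions, transition, reward)
```

For this toy instance the recursion converges to V(c1) = 1/(1 - 0.9) = 10 and V(c0) = 0.9 · V(c1) = 9, matching the closed-form fixed point.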