Model-free policy gradient for discrete-time mean-field control
By: Matthieu Meunier, Huyên Pham, Christoph Reisinger
Potential Business Impact:
Teaches computers to make smart group decisions.
We study model-free policy learning for discrete-time mean-field control (MFC) problems with finite state space and compact action space. In contrast to the extensive literature on value-based methods for MFC, policy-based approaches remain largely unexplored due to the intrinsic dependence of transition kernels and rewards on the evolving population state distribution, which prevents the direct use of likelihood-ratio estimators of policy gradients from classical single-agent reinforcement learning. We introduce a novel perturbation scheme on the state-distribution flow and prove that the gradient of the resulting perturbed value function converges to the true policy gradient as the perturbation magnitude vanishes. This construction yields a fully model-free estimator based solely on simulated trajectories and an auxiliary estimate of the sensitivity of the state distribution. Building on this framework, we develop MF-REINFORCE, a model-free policy gradient algorithm for MFC, and establish explicit quantitative bounds on its bias and mean-squared error. Numerical experiments on representative mean-field control tasks demonstrate the effectiveness of the proposed approach.
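To make the problem structure concrete, the following is a minimal, hypothetical sketch (not the paper's MF-REINFORCE estimator) of a toy discrete-time mean-field control problem with a finite state space and a discretized compact action set. It illustrates why policy gradients are delicate here: the transition kernel and reward both depend on the population distribution mu_t, and the flow of mu_t itself depends on the policy parameters, which is what defeats the classical likelihood-ratio estimator. In place of the paper's perturbation-based model-free estimator, the sketch simply propagates the distribution flow exactly and approximates the gradient with naive central finite differences; all model choices (the policy, transition kernel, and reward below) are illustrative assumptions.

```python
# Minimal illustrative sketch (NOT the paper's MF-REINFORCE): a toy finite-state
# mean-field control problem where the transition kernel and reward depend on the
# population distribution mu_t, with a naive finite-difference policy gradient.
import numpy as np

S = 2                                  # finite state space {0, 1}
ACTIONS = np.linspace(0.0, 1.0, 5)     # discretization of a compact action set [0, 1]
T = 20                                 # horizon
MU0 = np.array([0.8, 0.2])             # initial population distribution

def policy(theta, x, mu):
    """Softmax policy pi_theta(. | x, mu) over the discretized actions (hypothetical form)."""
    logits = theta[x] * ACTIONS + theta[S + x] * mu[1] * ACTIONS
    logits = logits - logits.max()
    p = np.exp(logits)
    return p / p.sum()

def transition(x, a, mu):
    """Transition kernel P(. | x, a, mu): the action pushes towards state 1,
    while crowding in state 1 (mu[1]) pushes back towards state 0."""
    p1 = np.clip(0.5 * a + 0.3 * (1.0 - mu[1]) + 0.15 * x, 0.0, 1.0)
    return np.array([1.0 - p1, p1])

def reward(x, a, mu):
    """Reward with a congestion penalty acting through the population distribution."""
    return float(x) - 0.5 * a**2 - mu[1] ** 2

def objective(theta):
    """J(theta): expected cumulative reward, computed by propagating mu_t exactly."""
    mu = MU0.copy()
    total = 0.0
    for _ in range(T):
        next_mu = np.zeros(S)
        for x in range(S):
            pa = policy(theta, x, mu)
            for a_idx, a in enumerate(ACTIONS):
                w = mu[x] * pa[a_idx]          # mass of agents in state x taking action a
                total += w * reward(x, a, mu)
                next_mu += w * transition(x, a, mu)
        mu = next_mu                           # the flow mu_t depends on theta
    return total

def fd_gradient(theta, eps=1e-4):
    """Naive central finite-difference approximation of grad J(theta)."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        grad[i] = (objective(theta + e) - objective(theta - e)) / (2.0 * eps)
    return grad

if __name__ == "__main__":
    theta = np.zeros(2 * S)
    for _ in range(200):                       # simple gradient ascent on J
        theta += 0.1 * fd_gradient(theta)
    print("J(theta) after training:", objective(theta))
```

The finite-difference step above stands in for the paper's contribution: MF-REINFORCE replaces it with a fully model-free estimator built from simulated trajectories, a perturbation of the state-distribution flow, and an auxiliary estimate of the flow's sensitivity to the policy.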
Similar Papers
Learning Mean-Field Games through Mean-Field Actor-Critic Flow
Optimization and Control
Teaches computers to make smart group decisions.
Convergence Rates of Time Discretization in Extended Mean Field Control
Optimization and Control
Makes complex robot decisions faster and more accurate.
Robust Mean Field Social Control: A Unified Reinforcement Learning Framework
Systems and Control
Teaches robots to learn without knowing everything.