Regret Lower Bounds for Decentralized Multi-Agent Stochastic Shortest Path Problems
By: Utkarsh U. Chavan, Prashant Trivedi, Nandyala Hemachandra
Potential Business Impact:
Helps robots work together to finish tasks.
Multi-agent systems (MAS) are central to applications such as swarm robotics and traffic routing, where agents must coordinate in a decentralized manner to achieve a common objective. Stochastic Shortest Path (SSP) problems provide a natural framework for modeling decentralized control in such settings. While the problem of learning in SSP has been extensively studied in single-agent settings, the decentralized multi-agent variant remains largely unexplored. In this work, we take a step towards addressing that gap. We study decentralized multi-agent SSPs (Dec-MASSPs) under linear function approximation, where the transition dynamics and costs are represented using linear models. Applying novel symmetry-based arguments, we identify the structure of optimal policies. Our main contribution is the first regret lower bound for this setting based on the construction of hard-to-learn instances for any number of agents, $n$. Our regret lower bound of $\Omega(\sqrt{K})$, over $K$ episodes, highlights the inherent learning difficulty in Dec-MASSPs. These insights clarify the learning complexity of decentralized control and can further guide the design of efficient learning algorithms in multi-agent systems.
Similar Papers
Stochastic Shortest Path with Sparse Adversarial Costs
Machine Learning (CS)
Finds faster paths when few choices cost money.
Regret Guarantees for Linear Contextual Stochastic Shortest Path
Machine Learning (CS)
Teaches computers to win games with hidden rules.
Convergent Reinforcement Learning Algorithms for Stochastic Shortest Path Problem
Machine Learning (CS)
Teaches computers to find the best path faster.