Agent-GSPO: Communication-Efficient Multi-Agent Systems via Group Sequence Policy Optimization
By: Yijia Fan, Jusheng Zhang, Jing Yang, and more
Potential Business Impact:
Makes AI agents talk less, saving money.
To combat the prohibitive communication costs of "free-for-all" multi-agent systems (MAS), we introduce Agent-GSPO, a framework that directly optimizes for token economy using sequence-level reinforcement learning. Agent-GSPO leverages the stable and memory-efficient Group Sequence Policy Optimization (GSPO) algorithm to train agents on a communication-aware reward that explicitly penalizes verbosity. Across seven reasoning benchmarks, Agent-GSPO not only achieves new state-of-the-art performance but does so with a fraction of the token consumption of existing methods. By fostering emergent strategies like "strategic silence," our approach provides a practical blueprint for developing scalable and economically viable multi-agent systems.
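The abstract describes two ingredients: a communication-aware reward that penalizes verbosity, and GSPO's group-relative, sequence-level optimization. As a rough illustration only (the function names, the linear per-token penalty, and the penalty coefficient are assumptions, not details from the paper), the reward shaping and group-normalized advantages might look like:

```python
import math


def communication_aware_reward(task_correct: bool, num_tokens: int,
                               token_penalty: float = 0.001) -> float:
    """Hypothetical reward: task success minus a per-token verbosity penalty.

    The linear penalty form and the 0.001 coefficient are illustrative
    assumptions; the paper only states that verbosity is penalized.
    """
    task_reward = 1.0 if task_correct else 0.0
    return task_reward - token_penalty * num_tokens


def group_advantages(rewards: list[float]) -> list[float]:
    """GSPO-style group-relative advantages.

    Each sampled response's reward is normalized by the mean and
    standard deviation of its sampling group, so terse correct answers
    get positive advantage relative to verbose ones.
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]


# Example: two correct answers, one terse (50 tokens), one verbose (500).
terse = communication_aware_reward(True, 50)      # 0.95
verbose = communication_aware_reward(True, 500)   # 0.50
advs = group_advantages([terse, verbose])
# The terse answer earns a positive advantage, the verbose one negative,
# which is how "strategic silence" can emerge under this reward.
```

Under this shaping, an agent learns that staying brief (or silent) on turns where it adds nothing costs less reward than padding its output, matching the paper's described incentive.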
Similar Papers
Group Sequence Policy Optimization
Machine Learning (CS)
Makes AI learn faster and better.
Soft Adaptive Policy Optimization
Machine Learning (CS)
Teaches AI to learn better and faster.
DSPO: Stable and Efficient Policy Optimization for Agentic Search and Reasoning
Computation and Language
Helps AI find answers by searching the internet.