Reinforcement Learning-Augmented LLM Agents for Collaborative Decision Making and Performance Optimization
By: Dong Qiu, Duo Xu, Limengxi Yue
Large Language Models (LLMs) perform well on language tasks but often lack collaborative awareness and struggle to optimize global performance in multi-agent settings. We present a reinforcement learning-augmented LLM agent framework that formulates cooperation as a decentralized partially observable Markov decision process (Dec-POMDP) and adopts centralized training with decentralized execution (CTDE). We adopt Group Relative Policy Optimization (GRPO) to jointly optimize agent policies with access to global signals during training, together with a simplified joint reward that balances task quality, speed, and coordination cost. On collaborative writing and coding benchmarks, our framework delivers a 3x speedup in task processing over single-agent baselines, 98.7% structural and stylistic consistency in writing, and a 74.6% test pass rate in coding. The approach consistently outperforms strong multi-agent LLM baselines and provides a practical path toward reliable collaboration in complex workflows.
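To make the training signal concrete, here is a minimal sketch of the two quantities the abstract names: a joint reward trading off quality, speed, and coordination cost, and GRPO's group-relative advantage. The weights and the exact linear form of the reward are assumptions for illustration (the paper only states that the reward balances these three terms); the advantage computation follows GRPO's standard recipe of normalizing each rollout's reward against the mean and standard deviation of its sampled group, which removes the need for a learned value baseline.

```python
import statistics

# Hypothetical weights; the paper does not specify the exact reward form,
# only that it balances task quality, speed, and coordination cost.
W_QUALITY, W_SPEED, W_COORD = 1.0, 0.5, 0.3

def joint_reward(quality: float, speed: float, coord_cost: float) -> float:
    """Simplified joint reward: quality and speed are rewarded,
    coordination overhead is penalized."""
    return W_QUALITY * quality + W_SPEED * speed - W_COORD * coord_cost

def grpo_advantages(group_rewards: list[float]) -> list[float]:
    """Group-relative advantages as in GRPO: each rollout is scored
    against the mean/std of its own sampled group."""
    mean = statistics.fmean(group_rewards)
    std = statistics.pstdev(group_rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in group_rewards]

# Example: four rollouts of the same collaborative task,
# each scored on (quality, speed, coord_cost).
rewards = [joint_reward(q, s, c) for q, s, c in
           [(0.9, 0.8, 0.2), (0.7, 0.9, 0.5), (0.95, 0.6, 0.1), (0.5, 0.5, 0.4)]]
print(grpo_advantages(rewards))  # rollouts above the group mean get positive advantage
```

Under CTDE, a centralized critic-free objective like this can consume global signals (joint quality, total coordination cost) during training while each agent still acts only on its local observation at execution time.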
Similar Papers
LLM Collaboration With Multi-Agent Reinforcement Learning
Artificial Intelligence
Helps AI agents work together to write and code.
Reinforced Language Models for Sequential Decision Making
Computation and Language
Makes small AI models learn better than big ones.
Stronger Together: On-Policy Reinforcement Learning for Collaborative LLMs
Machine Learning (CS)
Teaches AI to work together better for harder tasks.