Learning Robust Social Strategies with Large Language Models
By: Dereck Piche, Mohammed Muqeeth, Milad Aghajohari, and more
Potential Business Impact:
Teaches AI to work together, not cheat.
As agentic AI becomes more widespread, agents with distinct and possibly conflicting goals will interact in complex ways. These multi-agent interactions pose a fundamental challenge, particularly in social dilemmas, where agents' individual incentives can undermine collective welfare. While reinforcement learning (RL) has been effective for aligning large language models (LLMs) in the single-agent regime, prior small-network results suggest that standard RL in multi-agent settings often converges to defecting, self-interested policies. We show the same effect in LLMs: despite cooperative priors, RL-trained LLM agents develop opportunistic behavior that can exploit even advanced closed-source models. To address this tendency of RL to converge to poor equilibria, we adapt a recent opponent-learning awareness algorithm, Advantage Alignment, to fine-tune LLMs toward multi-agent cooperation and non-exploitability. We then introduce a group-relative baseline that simplifies advantage computation in iterated games, enabling multi-agent training at LLM scale. We also contribute a novel social dilemma environment, Trust and Split, which requires natural language communication to achieve high collective welfare. Across a wide range of social dilemmas, policies learned with Advantage Alignment achieve higher collective payoffs while remaining robust against exploitation by greedy agents.
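The abstract only names the group-relative baseline, so the details of the paper's implementation are not given here. As a rough illustration of the general idea, the sketch below computes each rollout's advantage as its return minus the mean return of a group of rollouts sampled from the same state, with optional standard-deviation normalization in the spirit of GRPO-style baselines. The function name `group_relative_advantages` and the normalization choice are assumptions for illustration, not the paper's method.

```python
import numpy as np

def group_relative_advantages(returns, eps=1e-8):
    """Illustrative group-relative baseline (assumption, not the paper's code):
    each rollout's advantage is its return minus the group mean return,
    normalized by the group standard deviation."""
    returns = np.asarray(returns, dtype=np.float64)
    baseline = returns.mean()          # group-relative baseline
    advantages = returns - baseline    # centered advantages
    return advantages / (returns.std() + eps)

# Example: four trajectories sampled from the same iterated-game state.
print(group_relative_advantages([3.0, 1.5, 2.0, 3.5]))
```

A baseline of this form avoids learning a separate value function, which is one way such a group-relative estimate could simplify advantage computation in iterated games at LLM scale.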
Similar Papers
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
Computation and Language
Teaches AI to learn and solve problems better.
Learning to Make Friends: Coaching LLM Agents toward Emergent Social Ties
Artificial Intelligence
Computers learn to act like people online.
Multi-Agent Language Models: Advancing Cooperation, Coordination, and Adaptation
Computation and Language
Helps AI understand and work with people.