Score: 0

Emergent Alignment via Competition

Published: September 18, 2025 | arXiv ID: 2509.15090v1

By: Natalie Collina , Surbhi Goel , Aaron Roth and more

Potential Business Impact:

Multiple AI can work together to help you.

Business Areas:
Artificial Intelligence Artificial Intelligence, Data and Analytics, Science and Engineering, Software

Aligning AI systems with human values remains a fundamental challenge, but does our inability to create perfectly aligned models preclude obtaining the benefits of alignment? We study a strategic setting where a human user interacts with multiple differently misaligned AI agents, none of which are individually well-aligned. Our key insight is that when the users utility lies approximately within the convex hull of the agents utilities, a condition that becomes easier to satisfy as model diversity increases, strategic competition can yield outcomes comparable to interacting with a perfectly aligned model. We model this as a multi-leader Stackelberg game, extending Bayesian persuasion to multi-round conversations between differently informed parties, and prove three results: (1) when perfect alignment would allow the user to learn her Bayes-optimal action, she can also do so in all equilibria under the convex hull condition (2) under weaker assumptions requiring only approximate utility learning, a non-strategic user employing quantal response achieves near-optimal utility in all equilibria and (3) when the user selects the best single AI after an evaluation period, equilibrium guarantees remain near-optimal without further distributional assumptions. We complement the theory with two sets of experiments.

Page Count
34 pages

Category
Computer Science:
Machine Learning (CS)