Score: 0

Inference of Altruism and Intrinsic Rewards in Multi-Agent Systems

Published: September 9, 2025 | arXiv ID: 2509.07650v2

By: Victor Villin, Christos Dimitrakakis

Potential Business Impact:

Teaches robots to understand and act with feelings.

Business Areas:

Artificial Intelligence Artificial Intelligence, Data and Analytics, Science and Engineering, Software

Human interactions are influenced by emotions, temperament, and affection, often conflicting with individuals' underlying preferences. Without explicit knowledge of those preferences, judging whether behaviour is appropriate becomes guesswork, leaving us highly prone to misinterpretation. Yet, such understanding is critical if autonomous agents are to collaborate effectively with humans. We frame the problem with multi-agent inverse reinforcement learning and show that even a simple model, where agents weigh their own welfare against that of others, can cover a wide range of social behaviours. Using novel Bayesian techniques, we find that intrinsic rewards and altruistic tendencies can be reliably identified by placing agents in different groups. Crucially, this disentanglement of intrinsic motivation from altruism enables the synthesis of new behaviours aligned with any desired level of altruism, even when demonstrations are drawn from restricted behaviour profiles.