When Empowerment Disempowers
By: Claire Yang, Maya Cakmak, Max Kleiman-Weiner
Potential Business Impact:
AI helping one person can hurt another.
Empowerment, a measure of an agent's ability to control its environment, has been proposed as a universal goal-agnostic objective for motivating assistive behavior in AI agents. While multi-human settings like homes and hospitals are promising for AI assistance, prior work on empowerment-based assistance assumes that the agent assists one human in isolation. We introduce Disempower-Grid, an open-source multi-human gridworld test suite. Using Disempower-Grid, we empirically show that assistive RL agents optimizing for one human's empowerment can significantly reduce another human's environmental influence and rewards, a phenomenon we formalize as disempowerment. We characterize when disempowerment occurs in these environments and show that joint empowerment mitigates disempowerment at the cost of the user's reward. Our work reveals a broader challenge for the AI alignment community: goal-agnostic objectives that seem aligned in single-agent settings can become misaligned in multi-agent contexts.
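To make the notion of empowerment concrete, here is a minimal sketch under simplifying assumptions. It assumes a deterministic gridworld, where n-step empowerment (the channel capacity from action sequences to resulting states) reduces to log2 of the number of distinct states reachable in n steps. The grid size, action set, and function names are hypothetical and are not taken from Disempower-Grid. The blocked cells stand in for obstructions an assistant might leave in a second human's path.

```python
import math
from itertools import product

# Hypothetical action set: stay plus four grid moves (an assumption for illustration).
ACTIONS = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]

def step(pos, action, size, blocked):
    """Deterministic move; hitting a wall or a blocked cell leaves pos unchanged."""
    nxt = (pos[0] + action[0], pos[1] + action[1])
    in_bounds = 0 <= nxt[0] < size and 0 <= nxt[1] < size
    return nxt if in_bounds and nxt not in blocked else pos

def empowerment(pos, size, blocked, horizon):
    """n-step empowerment in bits for deterministic dynamics:
    log2 of the number of distinct end states over all action sequences."""
    finals = set()
    for seq in product(ACTIONS, repeat=horizon):
        p = pos
        for a in seq:
            p = step(p, a, size, blocked)
        finals.add(p)
    return math.log2(len(finals))

# A bystander in the corner of an open 3x3 grid versus the same bystander
# boxed in by two blocked neighbouring cells.
open_grid = empowerment((0, 0), 3, set(), horizon=2)
boxed_in = empowerment((0, 0), 3, {(0, 1), (1, 0)}, horizon=2)
print(f"open: {open_grid:.3f} bits, boxed in: {boxed_in:.3f} bits")
```

In this toy case the bystander can reach six cells within two steps on the open grid and only its own cell when boxed in, so its empowerment drops to zero bits. That drop is the kind of effect the abstract calls disempowerment. Stochastic dynamics would require a proper channel-capacity estimate rather than this counting shortcut.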
Similar Papers
Training LLM Agents to Empower Humans
Artificial Intelligence
Helps computers let people finish tasks faster.
Training LLM Agents to Empower Humans
Artificial Intelligence
Helps computers help people make better choices.
Information-Theoretic Policy Pre-Training with Empowerment
Artificial Intelligence
Teaches robots to learn faster and better.