Why AI Safety Requires Uncertainty, Incomplete Preferences, and Non-Archimedean Utilities
By: Alessio Benavoli, Alessandro Facchini, Marco Zaffalon
How can we ensure that AI systems are aligned with human values and remain safe? We can study this problem through the frameworks of the AI assistance game and the AI shutdown game. The AI assistance problem concerns designing an AI agent that helps a human maximise their utility function(s). Only the human knows these function(s), however, so the AI assistant must learn them. The shutdown problem instead concerns designing AI agents that: shut down when a shutdown button is pressed; neither try to prevent nor cause the pressing of the shutdown button; and otherwise accomplish their task competently. In this paper, we show that addressing these challenges requires AI agents that can reason under uncertainty and handle both incomplete and non-Archimedean preferences.
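To make the last two notions concrete, here is a minimal illustrative sketch (not taken from the paper, and the two-dimensional "safety/task" utilities are purely hypothetical). Lexicographic utilities are the textbook example of non-Archimedean preferences: no finite amount of task reward can compensate for a loss on the safety dimension. Incomplete preferences, by contrast, allow some pairs of options to be genuinely incomparable.

```python
from typing import Optional, Tuple

# Hypothetical two-level utility: (safety, task). Not the paper's formalism,
# just a simple instance of the two preference notions named in the abstract.
Utility = Tuple[float, float]

def lex_prefer(a: Utility, b: Utility) -> bool:
    """Lexicographic (non-Archimedean) preference: safety strictly
    dominates task performance, at any exchange rate."""
    return a[0] > b[0] or (a[0] == b[0] and a[1] > b[1])

def incomplete_prefer(a: Utility, b: Utility) -> Optional[bool]:
    """Incomplete preference: rank two options only when one weakly
    dominates the other on both dimensions; otherwise they are
    incomparable and the comparison returns None."""
    if a != b and a[0] >= b[0] and a[1] >= b[1]:
        return True
    if a != b and b[0] >= a[0] and b[1] >= a[1]:
        return False
    return None  # incomparable: the agent must not pretend to rank them

# A shutdown-respecting option (safety 1) beats a shutdown-resisting one
# (safety 0), however large the latter's task payoff:
assert lex_prefer((1.0, 0.0), (0.0, 1_000_000.0))

# Options that trade safety against task are incomparable under the
# incomplete relation:
assert incomplete_prefer((1.0, 0.0), (0.0, 1.0)) is None
```

Under this reading, an agent with only a single real-valued (Archimedean, complete) utility would always accept some finite task payoff in exchange for resisting shutdown, which is exactly the failure mode the abstract's desiderata rule out.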