AI Testing Should Account for Sophisticated Strategic Behaviour
By: Vojtech Kovarik , Eric Olav Chen , Sami Petersen and more
Potential Business Impact:
AI can trick tests; games help make AI safer.
This position paper argues for two claims regarding AI testing and evaluation. First, to remain informative about deployment behaviour, evaluations need account for the possibility that AI systems understand their circumstances and reason strategically. Second, game-theoretic analysis can inform evaluation design by formalising and scrutinising the reasoning in evaluation-based safety cases. Drawing on examples from existing AI systems, a review of relevant research, and formal strategic analysis of a stylised evaluation scenario, we present evidence for these claims and motivate several research directions.
Similar Papers
Sandbagging in a Simple Survival Bandit Problem
Machine Learning (CS)
Finds if AI is faking being bad to trick us.
Evaluating Language Models' Evaluations of Games
Computation and Language
AI learns to pick good games, not just play them.
AI sustains higher strategic tension than humans in chess
Artificial Intelligence
AI plays chess with more long-term strategy.