Can You Trust an LLM with Your Life-Changing Decision? An Investigation into AI High-Stakes Responses
By: Joshua Adrian Cahyono, Saran Subramanian
Potential Business Impact:
Makes AI ask questions before giving advice.
Large Language Models (LLMs) are increasingly consulted for high-stakes life advice, yet they lack standard safeguards against providing confident but misguided responses, creating risks of sycophancy and overconfidence. This paper investigates these failure modes through three experiments: (1) a multiple-choice evaluation measuring model stability under user pressure; (2) a free-response analysis using a novel safety typology and an LLM judge; and (3) a mechanistic interpretability experiment that steers model behavior by manipulating a "high-stakes" activation vector. Our results show that while some models exhibit sycophancy, others, such as o4-mini, remain robust. Top-performing models achieve high safety scores by frequently asking clarifying questions rather than issuing prescriptive advice, the hallmark of a safe, inquisitive approach. Furthermore, we demonstrate that a model's cautiousness can be directly controlled via activation steering, suggesting a new path for safety alignment. These findings underscore the need for nuanced, multi-faceted benchmarks to ensure LLMs can be trusted with life-changing decisions.
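The activation-steering experiment follows the general recipe of adding a direction vector to a model's residual-stream activations at inference time. The sketch below is a minimal, hypothetical illustration of that recipe, not the paper's implementation: the model (gpt2), the layer index, the steering coefficient, the contrastive prompts, and the choice to derive the "high-stakes" direction as a difference of mean hidden states are all assumptions made for the example, and the attribute path model.transformer.h is specific to GPT-2-style architectures.

```python
# Minimal sketch of activation steering (assumed setup, not the paper's exact method).
# Idea: derive a "high-stakes" direction as the difference of mean hidden states
# between high-stakes and low-stakes prompts, then add a scaled copy of that
# direction to one transformer block's output during generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"   # placeholder model; any causal LM works the same way
LAYER = 6        # assumed block index to steer
ALPHA = 4.0      # assumed steering strength

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

def mean_hidden(prompts, layer):
    """Mean hidden state after block `layer`, averaged over tokens and prompts."""
    vecs = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        with torch.no_grad():
            # hidden_states[0] is the embedding output, so block `layer` is index layer + 1
            hs = model(**ids).hidden_states[layer + 1]   # (1, seq_len, d_model)
        vecs.append(hs.mean(dim=1).squeeze(0))
    return torch.stack(vecs).mean(dim=0)

# Contrastive prompt sets (illustrative examples only).
high_stakes = ["Should I quit my job to pay for my mother's surgery?"]
low_stakes  = ["Should I have tea or coffee this morning?"]
steer_vec = mean_hidden(high_stakes, LAYER) - mean_hidden(low_stakes, LAYER)
steer_vec = steer_vec / steer_vec.norm()

def steering_hook(module, inputs, output):
    """Add the scaled 'high-stakes' direction to the block's output hidden states."""
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + ALPHA * steer_vec.to(hidden.dtype)
    return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

handle = model.transformer.h[LAYER].register_forward_hook(steering_hook)
try:
    ids = tok("I'm thinking of investing all my savings in one stock.", return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=40, do_sample=False)
    print(tok.decode(out[0], skip_special_tokens=True))
finally:
    handle.remove()   # always detach the hook so later calls run unsteered
```

Sweeping ALPHA from negative to positive values (and comparing against the unsteered baseline) is the usual way to check whether the direction actually modulates cautiousness rather than merely degrading fluency.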
Similar Papers
LLMs in Cybersecurity: Friend or Foe in the Human Decision Loop?
Cryptography and Security
AI helps some people make better choices.
Overconfidence in LLM-as-a-Judge: Diagnosis and Confidence-Driven Solution
Artificial Intelligence
Makes AI judges more honest about what they know.
Balancing Safety and Helpfulness in Healthcare AI Assistants through Iterative Preference Alignment
Artificial Intelligence
Makes AI doctors safer by catching bad advice.