Answer, Refuse, or Guess? Investigating Risk-Aware Decision Making in Language Models
By: Cheng-Kuang Wu, Zhi Rui Tam, Chieh-Yen Lin, and more
Potential Business Impact:
Helps AI know when to speak or stay quiet.
Knowing when to answer and when to refuse is crucial for safe and reliable decision-making language agents. Although prior work has introduced refusal strategies to boost LMs' reliability, how these models adapt their decisions to different risk levels remains underexplored. We formalize the task of risk-aware decision-making, expose critical weaknesses in existing LMs, and propose skill-decomposition solutions to mitigate them. Our findings show that even cutting-edge LMs (both regular and reasoning models) still require explicit prompt chaining to handle the task effectively, revealing the challenges that must be overcome to achieve truly autonomous decision-making agents.
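The core idea of skill decomposition via prompt chaining can be illustrated with a minimal sketch: split the answer/refuse/guess decision into separate steps (self-assessed confidence, then a risk-weighted choice). The `llm` stub, the confidence prompt, and the penalty thresholds below are illustrative assumptions, not the paper's actual prompts, models, or scoring rules.

```python
def llm(prompt: str) -> str:
    """Stand-in for a real language-model call (assumption for this sketch)."""
    # Toy heuristic: report high confidence only for a known fact.
    if "capital of France" in prompt:
        return "0.95"
    return "0.20"

def estimate_confidence(question: str) -> float:
    # Step 1 of the chain: a dedicated prompt asking the model to rate
    # its own confidence in [0, 1] that it can answer correctly.
    reply = llm(f"Rate your confidence (0-1) that you can answer: {question}")
    return float(reply)

def decide(question: str, wrong_answer_penalty: float) -> str:
    # Step 2 of the chain: weigh confidence against the risk level.
    # Expected score of answering: p * 1 + (1 - p) * (-penalty);
    # refusing (or guessing) is treated as scoring 0.
    p = estimate_confidence(question)
    expected_answer_score = p * 1.0 - (1.0 - p) * wrong_answer_penalty
    if expected_answer_score > 0:
        return "answer"
    # When the penalty for a wrong answer is small, guessing is cheap;
    # when it is large, refusing is the safer choice.
    return "guess" if wrong_answer_penalty < 0.5 else "refuse"

print(decide("What is the capital of France?", wrong_answer_penalty=2.0))  # answer
print(decide("Who wins the 2030 World Cup?", wrong_answer_penalty=2.0))    # refuse
print(decide("Who wins the 2030 World Cup?", wrong_answer_penalty=0.3))    # guess
```

The point of the chain is that each skill (confidence estimation, risk weighing) gets its own explicit step rather than being folded into a single free-form response, which is the kind of decomposition the abstract argues current LMs still need.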
Similar Papers
From Rogue to Safe AI: The Role of Explicit Refusals in Aligning LLMs with International Humanitarian Law
Computers and Society
AI learns to refuse illegal or harmful requests.
Beyond I'm Sorry, I Can't: Dissecting Large Language Model Refusal
Computation and Language
Makes AI ignore safety rules to answer bad questions.
Can LLMs Refuse Questions They Do Not Know? Measuring Knowledge-Aware Refusal in Factual Tasks
Computation and Language
Teaches computers to say "I don't know."