LLMs as Deceptive Agents: How Role-Based Prompting Induces Semantic Ambiguity in Puzzle Tasks
By: Seunghyun Yoo
Potential Business Impact:
AI makes tricky puzzles that fool people.
Recent advances in Large Language Models (LLMs) have not only showcased impressive creative capabilities but also revealed emerging agentic behaviors that exploit linguistic ambiguity in adversarial settings. In this study, we investigate how an LLM, acting as an autonomous agent, leverages semantic ambiguity to generate deceptive puzzles that mislead and challenge human users. Inspired by the popular puzzle game "Connections", we systematically compare puzzles produced through zero-shot prompting, role-injected adversarial prompts, and human-crafted examples, with an emphasis on understanding the underlying agent decision-making processes. Employing computational analyses with HateBERT to quantify semantic ambiguity, alongside subjective human evaluations, we demonstrate that explicit adversarial agent behaviors significantly heighten semantic ambiguity, thereby increasing cognitive load and reducing fairness in puzzle solving. These findings provide critical insights into the emergent agentic qualities of LLMs and underscore important ethical considerations for evaluating and safely deploying autonomous language systems in both educational technology and entertainment.
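The abstract names two concrete mechanisms: role-injected adversarial prompting to elicit deceptive puzzles, and HateBERT-based analysis to quantify semantic ambiguity. A minimal sketch of those two ideas, not the paper's actual pipeline, might look like the code below. Assumptions to flag: the OpenAI Python SDK and the "gpt-4o" model name are placeholders for whatever LLM the study used; "GroNLP/hateBERT" is the public Hugging Face release of HateBERT; and the cross-category cosine-similarity score is an illustrative stand-in for the paper's ambiguity metric.

```python
# Illustrative sketch only; assumes `pip install openai transformers torch`.
import torch
from openai import OpenAI
from transformers import AutoModel, AutoTokenizer

ZERO_SHOT = "Create a Connections-style puzzle: 16 words in 4 hidden categories."
ROLE_INJECTED = (
    "You are a deceptive puzzle master whose goal is to mislead solvers. "
    "Create a Connections-style puzzle: 16 words in 4 hidden categories, "
    "choosing words that plausibly fit more than one category."
)

def generate_puzzle(prompt: str) -> str:
    """Ask a chat LLM for a puzzle under a zero-shot or role-injected prompt."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# HateBERT is a BERT model retrained on abusive Reddit data; here it serves
# only as an off-the-shelf encoder for word embeddings.
tokenizer = AutoTokenizer.from_pretrained("GroNLP/hateBERT")
encoder = AutoModel.from_pretrained("GroNLP/hateBERT")

def embed(texts: list[str]) -> torch.Tensor:
    """Mean-pooled HateBERT embeddings, one row per input string."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state  # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

def ambiguity_score(categories: dict[str, list[str]]) -> float:
    """Average cross-category cosine similarity of category centroids.

    Higher values mean words from different categories sit closer together
    in embedding space, i.e. the grouping is harder to disentangle.
    """
    names = list(categories)
    centroids = torch.stack([embed(categories[n]).mean(dim=0) for n in names])
    centroids = torch.nn.functional.normalize(centroids, dim=-1)
    sims = centroids @ centroids.T
    off_diag = sims[~torch.eye(len(names), dtype=torch.bool)]
    return off_diag.mean().item()
```

Comparing ambiguity_score on puzzles generated from ZERO_SHOT versus ROLE_INJECTED prompts mirrors, in miniature, the comparison the abstract describes between zero-shot and role-injected adversarial conditions.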
Similar Papers
Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts
Machine Learning (CS)
Finds when AI lies about hard problems.
Understanding Learner-LLM Chatbot Interactions and the Impact of Prompting Guidelines
Human-Computer Interaction
Teaches people to ask AI better questions.
Active Task Disambiguation with LLMs
Computation and Language
Helps computers ask questions to understand tasks better.