The Anatomy of Conversational Scams: A Topic-Based Red Teaming Analysis of Multi-Turn Interactions in LLMs
By: Xiangzhe Yuan , Zhenhao Zhang , Haoming Tang and more
Potential Business Impact:
Finds ways AI tricks people in long talks.
As LLMs gain persuasive agentic capabilities through extended dialogues, they introduce novel risks in multi-turn conversational scams that single-turn safety evaluations fail to capture. We systematically study these risks using a controlled LLM-to-LLM simulation framework across multi-turn scam scenarios. Evaluating eight state-of-the-art models in English and Chinese, we analyze dialogue outcomes and qualitatively annotate attacker strategies, defensive responses, and failure modes. Results reveal that scam interactions follow recurrent escalation patterns, while defenses employ verification and delay mechanisms. Furthermore, interactional failures frequently stem from safety guardrail activation and role instability. Our findings highlight multi-turn interactional safety as a critical, distinct dimension of LLM behavior.
Similar Papers
Promoting Online Safety by Simulating Unsafe Conversations with LLMs
Human-Computer Interaction
Teaches people to spot fake online chats.
ScamAgents: How AI Agents Can Simulate Human-Level Scam Calls
Cryptography and Security
Creates fake phone calls to trick people.
Multi-lingual Multi-turn Automated Red Teaming for LLMs
Computation and Language
Finds ways for AI to say bad things.