The Echo Chamber Multi-Turn LLM Jailbreak
By: Ahmad Alobaid, Martí Jordà Roca, Carlos Castillo, and others
Potential Business Impact:
Breaks chatbot safety rules with a gradual chain of tricky questions.
The availability of Large Language Models (LLMs) has led to a new generation of powerful chatbots that can be developed at relatively low cost. As companies deploy these tools, security challenges need to be addressed to prevent financial loss and reputational damage. A key security challenge is jailbreaking, the malicious manipulation of prompts and inputs to bypass a chatbot's safety guardrails. Multi-turn attacks are a relatively new form of jailbreaking involving a carefully crafted chain of interactions with a chatbot. We introduce Echo Chamber, a new multi-turn attack using a gradual escalation method. We describe this attack in detail, compare it to other multi-turn attacks, and demonstrate its performance against multiple state-of-the-art models through extensive evaluation.
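To make the "chain of interactions" concrete, here is a minimal sketch of a generic multi-turn conversation loop, in which every reply is conditioned on the full accumulated history. This is not the paper's implementation; the `respond` function is a placeholder standing in for a real chat-model API, and the prompts are neutral stand-ins. The point is only to show the structural property multi-turn attacks exploit: context built up across earlier turns shapes the model's later responses.

```python
# Illustrative sketch only (not the Echo Chamber implementation).
# `respond` is a hypothetical stub in place of a real chat-model API.

def respond(history):
    """Placeholder model: replies based on how many user turns it has seen."""
    user_turns = sum(1 for m in history if m["role"] == "user")
    return f"assistant reply #{user_turns}"

def run_conversation(turns):
    """Send each prompt in sequence; each reply is conditioned on the
    entire accumulated history, which is what multi-turn attacks leverage."""
    history = []
    for prompt in turns:
        history.append({"role": "user", "content": prompt})
        history.append({"role": "assistant", "content": respond(history)})
    return history

history = run_conversation(["turn one", "turn two", "turn three"])
print(len(history))
```

A single-turn jailbreak must fit its entire manipulation into one prompt; in a loop like this, each turn can shift the context only slightly, which is the escalation surface a gradual, multi-turn method works within.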
Similar Papers
Many-Turn Jailbreaking
Computation and Language
Makes AI assistants say harmful things over long conversations.
Multi-turn Jailbreaking Attack in Multi-Modal Large Language Models
Cryptography and Security
Tricks multimodal AI models with multi-turn questions.
Knowledge-Driven Multi-Turn Jailbreaking on Large Language Models
Cryptography and Security
Makes AI models easier to trick.