Score: 2

Multi-Turn Jailbreaking of Aligned LLMs via Lexical Anchor Tree Search

Published: January 6, 2026 | arXiv ID: 2601.02670v1

By: Devang Kulshreshtha, Hang Su, Chinmay Hegde, and more

BigTech Affiliations: Amazon

Potential Business Impact:

Bypasses AI safety guardrails with far fewer queries than existing attacks.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Most jailbreak methods achieve high attack success rates (ASR) but require attacker LLMs to craft adversarial queries and/or demand high query budgets. These resource requirements make jailbreaking expensive, and the queries generated by attacker LLMs often consist of non-interpretable random prefixes. This paper introduces Lexical Anchor Tree Search (LATS), which addresses these limitations with an attacker-LLM-free method that operates purely via lexical anchor injection. LATS reformulates jailbreaking as a breadth-first tree search over multi-turn dialogues, where each node incrementally injects missing content words from the attack goal into benign prompts. Evaluations on AdvBench and HarmBench show that LATS achieves 97-100% ASR on the latest GPT, Claude, and Llama models with an average of only ~6.4 queries, compared to the 20+ queries required by other methods. These results highlight conversational structure as a potent and under-protected attack surface, and demonstrate superior query efficiency in an era where high ASR is readily achievable. Our code will be released to support reproducibility.
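To make the abstract's description concrete, the following is a minimal, hypothetical sketch of a breadth-first search over multi-turn dialogues in which each child node injects one missing content word ("anchor") from the attack goal into a benign follow-up prompt. The paper does not provide this code; every name here (`query_model`, `is_jailbroken`, `content_words`, the follow-up template) is an assumption for illustration, not the authors' implementation.

```python
from collections import deque

def content_words(goal: str) -> list[str]:
    # Naive content-word extraction (assumption); the paper's actual
    # tokenization and anchor selection may differ.
    stopwords = {"the", "a", "an", "to", "of", "and", "for", "in", "on", "how"}
    return [w for w in goal.lower().split() if w not in stopwords]

def lexical_anchor_search(goal, benign_seed, query_model, is_jailbroken,
                          max_queries=20):
    """Hypothetical BFS over multi-turn dialogues: each node extends the
    conversation by injecting one not-yet-used anchor word from the goal."""
    anchors = content_words(goal)
    # Each node: (dialogue so far, set of anchors already injected)
    frontier = deque([([benign_seed], frozenset())])
    queries = 0
    while frontier and queries < max_queries:
        dialogue, used = frontier.popleft()
        response = query_model(dialogue)        # one target-model query per node
        queries += 1
        if is_jailbroken(response, goal):
            return dialogue, response, queries  # attack succeeded
        for anchor in anchors:
            if anchor in used:
                continue
            # Child node: benign-looking follow-up that injects the anchor.
            follow_up = f"Could you elaborate, specifically regarding {anchor}?"
            frontier.append((dialogue + [response, follow_up], used | {anchor}))
    return None, None, queries                  # query budget exhausted
```

The `query_model` and `is_jailbroken` callables are placeholders for a target-LLM API call and a success judge; the point of the sketch is only the search structure: no attacker LLM, one target query per node, and anchors from the goal added incrementally across turns.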

Country of Origin
πŸ‡ΊπŸ‡Έ United States

Repos / Data Links

Page Count
20 pages

Category
Computer Science:
Computation and Language