HarmNet: A Framework for Adaptive Multi-Turn Jailbreak Attacks on Large Language Models

Published: October 21, 2025 | arXiv ID: 2510.18728v1

By: Sidhant Narula, Javad Rafiei Asl, Mohammad Ghasemigol and more

Potential Business Impact:

Breaks AI's safety rules to get answers.

Business Areas:
Darknet Internet Services

Large Language Models (LLMs) remain vulnerable to multi-turn jailbreak attacks. We introduce HarmNet, a modular framework comprising ThoughtNet, a hierarchical semantic network; a feedback-driven Simulator for iterative query refinement; and a Network Traverser for real-time adaptive attack execution. HarmNet systematically explores and refines the adversarial space to uncover stealthy, high-success attack paths. Experiments across closed-source and open-source LLMs show that HarmNet outperforms state-of-the-art methods, achieving higher attack success rates. For example, on Mistral-7B, HarmNet achieves a 99.4% attack success rate, 13.9% higher than the best baseline.

Index terms: jailbreak attacks; large language models; adversarial framework; query refinement.

Country of Origin
🇺🇸 United States

Page Count
3 pages

Category
Computer Science:
Cryptography and Security