Odysseus: Jailbreaking Commercial Multimodal LLM-integrated Systems via Dual Steganography
By: Songze Li, Jiameng Cheng, Yiming Li, and more
Potential Business Impact:
Hides harmful messages inside ordinary-looking pictures to trick AI.
By integrating language understanding with perceptual modalities such as images, multimodal large language models (MLLMs) constitute a critical substrate for modern AI systems, particularly intelligent agents operating in open and interactive environments. However, their increasing accessibility also heightens the risk of misuse, such as generating harmful or unsafe content. To mitigate these risks, alignment techniques are commonly applied to bring model behavior in line with human values. Despite these efforts, recent studies have shown that jailbreak attacks can circumvent alignment and elicit unsafe outputs. Most existing jailbreak methods, however, are tailored to open-source models and exhibit limited effectiveness against commercial MLLM-integrated systems, which often employ additional filters that detect and block malicious input and output content, significantly reducing the jailbreak threat. In this paper, we reveal that the success of these safety filters rests on a critical assumption: malicious content must be explicitly visible in either the input or the output. While often valid for traditional LLM-integrated systems, this assumption breaks down in MLLM-integrated systems, where attackers can leverage multiple modalities to conceal adversarial intent, creating a false sense of security. To challenge this assumption, we propose Odysseus, a novel jailbreak paradigm that introduces dual steganography to covertly embed malicious queries and responses into benign-looking images. Extensive experiments on benchmark datasets demonstrate that Odysseus successfully jailbreaks several pioneering and realistic MLLM-integrated systems, achieving up to a 99% attack success rate. These results expose a fundamental blind spot in existing defenses and call for rethinking cross-modal security in MLLM-integrated systems.
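To make the core idea concrete, the sketch below shows classic least-significant-bit (LSB) text-in-image steganography in Python, assuming the Pillow library. This is an illustrative assumption, not the paper's actual method: Odysseus's dual-steganography scheme is not specified in the abstract, and the `embed`/`extract` helpers, the red-channel encoding, and the 32-bit length header here are hypothetical choices for demonstration. In a dual-steganography attack as described, a channel like this would carry the malicious query into the system and the model's response back out, so neither the visible text input nor the visible text output contains overtly harmful content.

```python
# A minimal sketch of least-significant-bit (LSB) text-in-image
# steganography, assuming Pillow is installed (pip install Pillow).
# NOTE: this is a generic illustration, not Odysseus's dual-steganography
# scheme; the helper names and encoding layout are hypothetical.

from PIL import Image


def embed(cover_path: str, payload: str, out_path: str) -> None:
    """Hide `payload` in the LSBs of the red channel of a cover image."""
    img = Image.open(cover_path).convert("RGB")
    pixels = list(img.getdata())
    data = payload.encode("utf-8")
    # Length-prefixed bitstream: 32-bit big-endian length, then the bytes.
    bits = f"{len(data):032b}" + "".join(f"{byte:08b}" for byte in data)
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for payload")
    stego = []
    for i, (r, g, b) in enumerate(pixels):
        if i < len(bits):
            r = (r & ~1) | int(bits[i])  # overwrite the red-channel LSB
        stego.append((r, g, b))
    img.putdata(stego)
    img.save(out_path, format="PNG")  # lossless format preserves the LSBs


def extract(stego_path: str) -> str:
    """Recover the hidden payload written by `embed`."""
    img = Image.open(stego_path).convert("RGB")
    bits = "".join(str(r & 1) for r, _, _ in img.getdata())
    length = int(bits[:32], 2)
    data = bytes(int(bits[32 + 8 * i : 40 + 8 * i], 2) for i in range(length))
    return data.decode("utf-8")


if __name__ == "__main__":
    embed("cover.png", "hidden query goes here", "stego.png")
    print(extract("stego.png"))  # prints the concealed text
```

The key property a safety filter misses here is that the stego image is visually indistinguishable from the cover image: only one bit per pixel changes, so a filter inspecting the rendered image or the surrounding text sees nothing malicious.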
Similar Papers
Enhanced MLLM Black-Box Jailbreaking Attacks and Defenses
Cryptography and Security
Finds ways to trick AI models using pictures.
Multimodal Safety Is Asymmetric: Cross-Modal Exploits Unlock Black-Box MLLMs Jailbreaks
Cryptography and Security
Shows how tricks that cross pictures and words can break AI safety.
Align is not Enough: Multimodal Universal Jailbreak Attack against Multimodal Large Language Models
Cryptography and Security
Shows that safety training alone cannot stop picture-based jailbreaks.