Multimodal Safety Is Asymmetric: Cross-Modal Exploits Unlock Black-Box MLLMs Jailbreaks

Published: October 20, 2025 | arXiv ID: 2510.17277v1

By: Xinkai Wang, Beibei Li, Zerui Shao and more

Potential Business Impact:

Makes AI models safer from harmful tricks.

Business Areas:

Corrections Facilities, Privacy and Security

Multimodal large language models (MLLMs) have demonstrated significant utility across diverse real-world applications. However, MLLMs remain vulnerable to jailbreaks, in which adversarial inputs collapse their safety constraints and trigger unethical responses. In this work, we investigate jailbreaks in the text-vision multimodal setting and pioneer the observation that visual alignment imposes uneven safety constraints across modalities in MLLMs, thereby giving rise to multimodal safety asymmetry. We then develop PolyJailbreak, a black-box jailbreak method grounded in reinforcement learning. First, we probe the model's attention dynamics and latent representation space, assessing how visual inputs reshape cross-modal information flow and diminish the model's ability to separate harmful from benign inputs, thereby exposing exploitable vulnerabilities. On this basis, we systematize these vulnerabilities into generalizable and reusable operational rules that constitute a structured library of Atomic Strategy Primitives, which translate harmful intents into jailbreak inputs through step-wise transformations. Guided by these primitives, PolyJailbreak employs a multi-agent optimization process that automatically adapts inputs against the target models. We conduct comprehensive evaluations on a variety of open-source and closed-source MLLMs, demonstrating that PolyJailbreak outperforms state-of-the-art baselines.

Align is not Enough: Multimodal Universal Jailbreak Attack against Multimodal Large Language Models

Cryptography and Security

Makes AI models with pictures unsafe.

2 Jun 2025 0

93%

Beyond Text: Multimodal Jailbreaking of Vision-Language and Audio Models through Perceptually Simple Transformations

Cryptography and Security

Tricks AI into showing bad stuff using pictures.

23 Oct 2025 2

93%

Enhanced MLLM Black-Box Jailbreaking Attacks and Defenses

Cryptography and Security

Finds ways to trick smart AI with pictures.

24 Oct 2025 0

Country of Origin

🇨🇳 China

Page Count

16 pages
