Security Risk of Misalignment between Text and Image in Multi-modal Model

Published: October 30, 2025 | arXiv ID: 2510.26105v1

By: Xiaosen Wang, Zhijin Ge, Shaokang Wang

Potential Business Impact:

Lets attackers force image-editing AI to produce harmful (NSFW) images by tampering with the input image alone, even when the text prompt is benign and unchanged.

Business Areas:
Visual Search, Internet Services

Despite the notable advancements and versatility of multi-modal diffusion models, such as text-to-image models, their susceptibility to adversarial inputs remains underexplored. Contrary to expectations, our investigations reveal that the alignment between the textual and image modalities in existing diffusion models is inadequate. This misalignment presents significant risks, especially in the generation of inappropriate or Not-Safe-For-Work (NSFW) content. To this end, we propose a novel attack called Prompt-Restricted Multi-modal Attack (PReMA) that manipulates the generated content by modifying the input image in conjunction with any specified prompt, without altering the prompt itself. PReMA is the first attack that manipulates model outputs by crafting adversarial images alone, distinguishing itself from prior methods that primarily generate adversarial prompts to produce NSFW content. Consequently, PReMA poses a novel threat to the integrity of multi-modal diffusion models, particularly in image-editing applications that operate with fixed prompts. Comprehensive evaluations on image inpainting and style transfer tasks across various models confirm the effectiveness of PReMA.
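The abstract frames PReMA as optimizing only the input image while the prompt stays fixed. A minimal PGD-style sketch of that setting is below; it is an illustration under stated assumptions, not the paper's actual method. The differentiable editing wrapper `edit_model`, the feature extractor `encode`, and the target embedding `target_feat` are hypothetical stand-ins, since the abstract does not specify the loss or models used.

```python
# Hypothetical sketch of a prompt-restricted adversarial-image attack:
# perturb only the image input of a fixed-prompt editing pipeline so the
# output drifts toward a target feature. `edit_model`, `encode`, and
# `target_feat` are assumed placeholders, not PReMA's actual components.
import torch
import torch.nn.functional as F


def prema_sketch(edit_model, encode, image, target_feat,
                 eps=8 / 255, alpha=1 / 255, steps=100):
    """image: float tensor in [0, 1]; the only attacker-controlled input.
    The text prompt is fixed inside `edit_model` and never modified."""
    x_adv = image.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        out = edit_model(x_adv)  # fixed-prompt editing pass (assumed differentiable)
        # Steer the edited output's features toward the target embedding.
        loss = F.cosine_similarity(
            encode(out).flatten(1), target_feat.flatten(1), dim=1).mean()
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv + alpha * x_adv.grad.sign()            # ascend objective
            x_adv = image + (x_adv - image).clamp(-eps, eps)     # L-inf budget
            x_adv = x_adv.clamp(0, 1).detach()                   # valid pixel range
    return x_adv
```

The key design point this sketch mirrors is the threat model: the attacker never touches the prompt, which is exactly the situation in deployed image-editing services that run user images through a fixed internal prompt.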

Page Count
21 pages

Category
Computer Science:
Computer Vision and Pattern Recognition