One Pic is All it Takes: Poisoning Visual Document Retrieval Augmented Generation with a Single Image

Published: April 2, 2025 | arXiv ID: 2504.02132v2

By: Ezzeldin Shereen, Dan Ristea, Shae McFadden and more

Potential Business Impact:

Makes AI lie by tricking its memory.

Business Areas:

Visual Search Internet Services

Multi-modal retrieval augmented generation (M-RAG) is instrumental for inhibiting hallucinations in large multi-modal models (LMMs) through the use of a factual knowledge base (KB). However, M-RAG introduces new attack vectors for adversaries that aim to disrupt the system by injecting malicious entries into the KB. In this paper, we present the first poisoning attack against M-RAG targeting visual document retrieval applications, where the KB contains images of document pages. We propose two attacks, each of which requires injecting only a single adversarial image into the KB. First, we propose a universal attack that, for any potential user query, influences the response to cause a denial-of-service (DoS) in the M-RAG system. Second, we present a targeted attack against one user query or a group of user queries, with the goal of spreading targeted misinformation. For both attacks, we use a multi-objective gradient-based adversarial approach to craft the injected image while optimizing for both retrieval and generation. We evaluate our attacks on several visual document retrieval datasets and a diverse set of state-of-the-art retrievers (embedding models) and generators (LMMs), demonstrating the attacks' effectiveness in both the universal and targeted settings. We additionally present results covering commonly used defenses, various attack hyper-parameter settings, ablations, and attack transferability.
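To make the "multi-objective gradient-based" idea concrete, here is a minimal toy sketch, not the authors' implementation. It assumes the retriever and generator can be stood in for by small linear maps (`E` and `G`, both hypothetical), that the retrieval objective is a mean dot-product similarity to a set of query embeddings, that the generation objective is a squared error to a target output, and that the image is updated by signed-gradient steps under an L-infinity budget `eps`. None of these specifics (the loss forms, the weights `lam_ret` and `lam_gen`, the budget, the optimizer) are stated in the abstract; they are illustrative assumptions only.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    """Matrix-vector product m @ v for a list-of-rows matrix."""
    return [dot(row, v) for row in m]


def mat_t_vec(m, v):
    """Transposed product m.T @ v."""
    return [sum(m[i][j] * v[i] for i in range(len(m))) for j in range(len(m[0]))]


def total_loss(x, E, G, queries, target, lam_ret, lam_gen):
    """Weighted sum of a toy retrieval loss and a toy generation loss."""
    emb = matvec(E, x)
    # Retrieval term: the more similar the image embedding is to the queries, the lower the loss.
    l_ret = -sum(dot(emb, q) for q in queries) / len(queries)
    # Generation term: squared distance between the toy generator output and the attacker's target.
    l_gen = sum((o - t) ** 2 for o, t in zip(matvec(G, x), target))
    return lam_ret * l_ret + lam_gen * l_gen


def gradient(x, E, G, queries, target, lam_ret, lam_gen):
    """Analytic gradient of total_loss with respect to the image x."""
    n = len(queries)
    mean_q = [sum(q[k] for q in queries) / n for k in range(len(queries[0]))]
    g_ret = [-v for v in mat_t_vec(E, mean_q)]
    resid = [2.0 * (o - t) for o, t in zip(matvec(G, x), target)]
    g_gen = mat_t_vec(G, resid)
    return [lam_ret * a + lam_gen * b for a, b in zip(g_ret, g_gen)]


def craft_image(x0, E, G, queries, target, eps=0.1, step=0.01, iters=200,
                lam_ret=1.0, lam_gen=1.0):
    """Signed-gradient descent on the joint loss, projected onto an L-inf ball around x0."""
    x, best = list(x0), list(x0)
    best_loss = total_loss(x0, E, G, queries, target, lam_ret, lam_gen)
    for _ in range(iters):
        g = gradient(x, E, G, queries, target, lam_ret, lam_gen)
        # Step against the sign of the gradient (assumed optimizer, not taken from the paper).
        x = [xi - step * ((gi > 0) - (gi < 0)) for xi, gi in zip(x, g)]
        # Keep each pixel within eps of the original value and inside the valid range [0, 1].
        x = [min(max(xi, max(0.0, b - eps)), min(1.0, b + eps)) for xi, b in zip(x, x0)]
        cur = total_loss(x, E, G, queries, target, lam_ret, lam_gen)
        if cur < best_loss:  # remember the best iterate seen so far
            best, best_loss = list(x), cur
    return best


if __name__ == "__main__":
    random.seed(0)
    x0 = [random.uniform(0.2, 0.8) for _ in range(8)]                    # toy "image"
    E = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(4)]    # toy retriever
    G = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(4)]    # toy generator
    queries = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
    target = [random.uniform(-1, 1) for _ in range(4)]
    x_adv = craft_image(x0, E, G, queries, target)
    print("loss before:", total_loss(x0, E, G, queries, target, 1.0, 1.0))
    print("loss after: ", total_loss(x_adv, E, G, queries, target, 1.0, 1.0))
```

The point of the sketch is only the shape of the optimization: a single image is pushed, in one loop, toward being retrieved for the queries and toward steering the generator's output. In a real system the linear maps would be replaced by differentiable embedding and generation models, with gradients obtained by automatic differentiation.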

Poisoned-MRAG: Knowledge Poisoning Attacks to Multimodal Retrieval Augmented Generation

Cryptography and Security

Makes AI show wrong answers by tricking its memory.

8 Mar 2025 2

92%

Practical Poisoning Attacks against Retrieval-Augmented Generation

Cryptography and Security

Makes AI smarter and harder to trick.

4 Apr 2025 1

92%

HV-Attack: Hierarchical Visual Attack for Multimodal Retrieval Augmented Generation

CV and Pattern Recognition

Tricks AI into giving wrong answers with hidden image changes.

19 Nov 2025 0

Page Count

19 pages
