WebInject: Prompt Injection Attack to Web Agents
By: Xilong Wang, John Bloch, Zedian Shao, and more
Potential Business Impact:
Shows how a manipulated webpage can trick AI web agents into performing actions chosen by an attacker.
Multi-modal large language model (MLLM)-based web agents interact with webpage environments by generating actions based on screenshots of the webpages. In this work, we propose WebInject, a prompt injection attack that manipulates the webpage environment to induce a web agent to perform an attacker-specified action. Our attack adds a perturbation to the raw pixel values of the rendered webpage. After these perturbed pixels are mapped into a screenshot, the perturbation induces the web agent to perform the attacker-specified action. We formulate the task of finding the perturbation as an optimization problem. A key challenge in solving this problem is that the mapping from raw pixel values to the screenshot is non-differentiable, which makes it difficult to backpropagate gradients to the perturbation. To overcome this, we train a neural network to approximate the mapping and apply projected gradient descent to solve the reformulated optimization problem. Extensive evaluation on multiple datasets shows that WebInject is highly effective and significantly outperforms baselines.
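The surrogate-plus-projected-gradient-descent idea in the abstract can be illustrated with a small toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the non-differentiable mapping is replaced by a simple quantizer (`render`), the trained neural network is swapped for a least-squares linear fit (`fit_surrogate`), the web agent is a hypothetical logistic scorer (`agent_logit`), and the perturbation is constrained to an L-infinity ball of radius `eps`.

```python
import math
import random

LEVELS = 7  # hypothetical number of quantization steps in the toy renderer


def render(pixels):
    """Toy stand-in for the non-differentiable pixel-to-screenshot mapping."""
    return [round(p * LEVELS) / LEVELS for p in pixels]


def fit_surrogate(n_samples=200, seed=0):
    """Fit a differentiable surrogate s(p) = a*p + b to render() by least squares.

    Assumption: a linear fit replaces the neural network the abstract describes.
    """
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_samples)]
    ys = render(xs)
    mean_x = sum(xs) / n_samples
    mean_y = sum(ys) / n_samples
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    return a, mean_y - a * mean_x


def agent_logit(screenshot, weights, bias):
    """Toy agent: a positive logit means it picks the attacker's target action."""
    return sum(w * s for w, s in zip(weights, screenshot)) + bias


def sign(x):
    return (x > 0) - (x < 0)


def pgd_attack(pixels, weights, bias, a, b, eps=0.2, step=0.05, iters=20):
    """Minimize -log(sigmoid(logit)) through the surrogate with projected steps."""
    delta = [0.0] * len(pixels)
    for _ in range(iters):
        # Forward pass through the differentiable surrogate, not render().
        approx = [a * (p + d) + b for p, d in zip(pixels, delta)]
        z = agent_logit(approx, weights, bias)
        # d/d(delta_i) of -log(sigmoid(z)) is -(1 - sigmoid(z)) * w_i * a.
        coeff = -(1.0 - 1.0 / (1.0 + math.exp(-z)))
        grad = [coeff * w * a for w in weights]
        # Signed gradient descent step on the loss.
        delta = [d - step * sign(g) for d, g in zip(delta, grad)]
        # Projection 1: keep the perturbation inside the L-infinity ball.
        delta = [max(-eps, min(eps, d)) for d in delta]
        # Projection 2: keep perturbed pixels inside the valid range [0, 1].
        delta = [max(-p, min(1.0 - p, d)) for d, p in zip(delta, pixels)]
    return delta
```

To try it, fit the surrogate once with `a, b = fit_surrogate()`, call `pgd_attack` on a list of pixel values in [0, 1], and then compare `agent_logit(render(...))` on the original and perturbed pixels. The final check deliberately goes through the real `render`, since the surrogate is only used to obtain gradients.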