SmoothGuard: Defending Multimodal Large Language Models with Noise Perturbation and Clustering Aggregation

Published: October 29, 2025 | arXiv ID: 2510.26830v1

By: Guangzhi Su, Shuchang Huang, Yutong Ke, and more

Potential Business Impact:

Helps stop multimodal AI models from being tricked by maliciously altered images.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Multimodal large language models (MLLMs) have achieved impressive performance across diverse tasks by jointly reasoning over textual and visual inputs. Despite their success, these models remain highly vulnerable to adversarial manipulations, raising concerns about their safety and reliability in deployment. In this work, we first generalize an approach for generating adversarial images within the HuggingFace ecosystem and then introduce SmoothGuard, a lightweight and model-agnostic defense framework that enhances the robustness of MLLMs through randomized noise injection and clustering-based prediction aggregation. Our method perturbs continuous modalities (e.g., images and audio) with Gaussian noise, generates multiple candidate outputs, and applies embedding-based clustering to filter out adversarially influenced predictions. The final answer is selected from the majority cluster, ensuring stable responses even under malicious perturbations. Extensive experiments on POPE, LLaVA-Bench (In-the-Wild), and MM-SafetyBench demonstrate that SmoothGuard improves resilience to adversarial attacks while maintaining competitive utility. Ablation studies further identify an optimal noise range (0.1-0.2) that balances robustness and utility.
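
To make the described pipeline concrete, below is a minimal Python sketch of the noise-perturbation-and-clustering idea from the abstract. It is not the authors' implementation: the `mllm_generate` callable, the noise level (chosen inside the reported 0.1-0.2 range), the number of noisy copies, the sentence-embedding model, and the cluster count are all illustrative assumptions.

```python
# Minimal sketch of a SmoothGuard-style defense (illustrative, not the paper's code).
# `mllm_generate(image, prompt)` is a placeholder for any multimodal LLM call.
import numpy as np
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer


def smoothguard_answer(image, prompt, mllm_generate,
                       sigma=0.15, n_copies=8, n_clusters=2):
    """Perturb the image with Gaussian noise, collect candidate answers,
    cluster their embeddings, and return an answer from the majority cluster."""
    image = np.asarray(image, dtype=np.float32)

    # 1) Randomized noise injection: query the model on several noisy copies.
    candidates = []
    for _ in range(n_copies):
        noisy = np.clip(image + np.random.normal(0.0, sigma, image.shape), 0.0, 1.0)
        candidates.append(mllm_generate(noisy, prompt))  # hypothetical MLLM call

    # 2) Embed the candidate answers and cluster them; adversarially influenced
    #    outputs are expected to fall into the minority cluster.
    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = embedder.encode(candidates)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embeddings)

    # 3) Aggregate: pick the candidate closest to the majority-cluster centroid.
    majority = np.bincount(labels).argmax()
    majority_idx = [i for i, lab in enumerate(labels) if lab == majority]
    centroid = embeddings[majority_idx].mean(axis=0)
    best = min(majority_idx, key=lambda i: np.linalg.norm(embeddings[i] - centroid))
    return candidates[best]
```

The design choice mirrored here is that noise injection diversifies the model's outputs, so an adversarial perturbation that flips only some of the noisy runs ends up isolated in a small cluster and is filtered out by the majority vote.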

Page Count
6 pages

Category
Computer Science: Machine Learning (CS)