Score: 1

FiMMIA: scaling semantic perturbation-based membership inference across modalities

Published: December 2, 2025 | arXiv ID: 2512.02786v1

By: Anton Emelyanov, Sergei Kudriashov, Alena Fenogenova

Potential Business Impact:

Detects whether private data was used to train an AI model.

Business Areas:
Machine Learning, Artificial Intelligence, Data and Analytics, Software

Membership Inference Attacks (MIAs) aim to determine whether a specific data point was included in the training set of a target model. Although numerous methods have been developed for detecting data contamination in large language models (LLMs), their performance on multimodal LLMs (MLLMs) falls short due to instabilities introduced by multimodal component adaptation and possible distribution shifts across multiple inputs. In this work, we investigate multimodal membership inference and address two issues: first, we identify distribution shifts in existing datasets; second, we release an extended baseline pipeline to detect them. We also generalize perturbation-based membership inference methods to MLLMs and release FiMMIA, a modular Framework for Multimodal MIA. (The source code and framework are publicly available under the MIT license at https://github.com/ai-forever/data_leakage_detect; a video demonstration is available on YouTube at https://youtu.be/a9L4-H80aSg.) Our approach trains a neural network to analyze the target model's behavior on perturbed inputs, capturing distributional differences between members and non-members. Comprehensive evaluations on various fine-tuned multimodal models demonstrate the effectiveness of our perturbation-based membership inference attacks in multimodal domains.
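To illustrate the general idea, below is a minimal, hypothetical sketch of a perturbation-based membership inference pipeline in Python/PyTorch. The `target_loss` and `perturb` callables, the feature construction, and the classifier architecture are all illustrative assumptions; the abstract does not specify FiMMIA's actual implementation.

```python
import torch
import torch.nn as nn

def perturbation_features(target_loss, perturb, image, text, k=8):
    """Score the original input and k perturbed variants with the target
    model, then summarize how the loss moves under perturbation.
    Members and non-members tend to show different loss profiles.

    target_loss(image, text) -> float  # per-sample loss from the target MLLM (assumed)
    perturb(image, text) -> (image', text')  # semantic perturbation, e.g. paraphrase / image noise (assumed)
    """
    base = target_loss(image, text)
    deltas = [target_loss(*perturb(image, text)) - base for _ in range(k)]
    # Sort the deltas so the feature vector is invariant to perturbation order.
    return torch.tensor([base] + sorted(deltas), dtype=torch.float32)

class MembershipClassifier(nn.Module):
    """Small MLP that maps a perturbation-feature vector to a membership logit."""
    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, x):
        # Logit > 0 means the input is predicted to be a training member.
        return self.net(x)
```

In a setup like this, the classifier would be trained with a standard binary cross-entropy loss (e.g. `nn.BCEWithLogitsLoss`) on feature vectors from inputs with known membership labels, then applied to unlabeled inputs.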

Country of Origin
🇷🇺 Russian Federation

Repos / Data Links
https://github.com/ai-forever/data_leakage_detect

Page Count
14 pages

Category
Computer Science:
Machine Learning (CS)