Revisiting Model Inversion Evaluation: From Misleading Standards to Reliable Privacy Assessment
By: Sy-Tuyen Ho, Koh Jun Hao, Ngoc-Bao Nguyen, and more
Potential Business Impact:
Finds fake privacy leaks in AI.
Model Inversion (MI) attacks aim to reconstruct private training data by exploiting access to a machine learning model T. To evaluate such attacks, the standard framework relies on an evaluation model E, trained under the same task design as T. This framework has become the de facto standard for assessing progress in MI research, used across nearly all recent MI attacks and defenses without question. In this paper, we present the first in-depth study of this MI evaluation framework. In particular, we identify a critical issue with the standard framework: Type-I adversarial examples. These are reconstructions that do not capture the visual features of the private training data, yet are still deemed successful by the target model T and ultimately transfer to E. Such false positives undermine the reliability of the standard MI evaluation framework. To address this issue, we introduce a new MI evaluation framework that replaces the evaluation model E with advanced Multimodal Large Language Models (MLLMs). By leveraging their general-purpose visual understanding, our MLLM-based framework does not depend on training under the same task design as T, thus reducing Type-I transferability and providing more faithful assessments of reconstruction success. Using our MLLM-based evaluation framework, we reevaluate 26 diverse MI attack setups and empirically reveal consistently high false positive rates under the standard evaluation framework. Importantly, we demonstrate that many state-of-the-art (SOTA) MI methods report inflated attack accuracy, indicating that actual privacy leakage is significantly lower than previously believed. By uncovering this critical issue and proposing a robust solution, our work enables a reassessment of progress in MI research and sets a new standard for reliable and robust evaluation.
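To make the contrast between the two protocols concrete, here is a minimal sketch in Python. It is an illustrative assumption, not the paper's actual implementation: the helper names `standard_eval`, `mllm_eval`, and the `query_mllm(images, prompt)` callable are hypothetical placeholders for whatever classifier and MLLM prompting setup an evaluator actually uses.

```python
from typing import Callable, List, Sequence

def standard_eval(recons: Sequence, labels: Sequence[int],
                  eval_model: Callable) -> float:
    """Standard protocol: a reconstruction counts as a success if the
    evaluation model E (trained on the same task design as T) assigns it
    the attacked private class. Type-I adversarial examples that transfer
    from T to E inflate this score."""
    hits = sum(int(eval_model(x) == y) for x, y in zip(recons, labels))
    return hits / max(len(recons), 1)

def mllm_eval(recons: Sequence, reference_images: Sequence,
              query_mllm: Callable) -> float:
    """MLLM-based protocol (sketch): ask a general-purpose multimodal LLM
    whether the reconstruction visually depicts the same identity as a
    reference image of the private class, independent of T's task design."""
    hits = 0
    for recon, ref in zip(recons, reference_images):
        answer = query_mllm(
            images=[recon, ref],  # hypothetical interface: pair of images
            prompt="Do these two images show the same person? Answer yes or no.",
        )
        hits += int(answer.strip().lower().startswith("yes"))
    return hits / max(len(recons), 1)
```

Under this sketch, a Type-I adversarial reconstruction can fool `standard_eval` (it transfers from T to E) but should fail `mllm_eval`, since the MLLM judges visual similarity to the private identity rather than reusing the shared task design.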
Similar Papers
An Automated, Scalable Machine Learning Model Inversion Assessment Pipeline
Cryptography and Security
Protects secret data used to train smart computer programs.
Model Inversion Attacks on Vision-Language Models: Do They Leak What They Learn?
Machine Learning (CS)
Steals private pictures from smart AI.
Membership Inference Attacks on Large-Scale Models: A Survey
Machine Learning (CS)
Finds if your private info trained AI.