LoRA as Oracle
By: Marco Arazzi, Antonino Nocera
Potential Business Impact:
Finds hidden dangers in computer brains.
Backdoored and privacy-leaking deep neural networks pose a serious threat to the deployment of machine learning systems in security-critical settings. Existing defenses for backdoor detection and membership inference typically require access to clean reference models, extensive retraining, or strong assumptions about the attack mechanism. In this work, we introduce a novel LoRA-based oracle framework that leverages low-rank adaptation modules as a lightweight, model-agnostic probe for both backdoor detection and membership inference. Our approach attaches task-specific LoRA adapters to a frozen backbone and analyzes their optimization dynamics and representation shifts when exposed to suspicious samples. We show that poisoned and member samples induce distinctive low-rank updates that differ significantly from those generated by clean or non-member data. These signals can be measured using simple ranking and energy-based statistics, enabling reliable inference without access to the original training data or modification of the deployed model.
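The core idea — fitting a low-rank adapter against a frozen backbone and reading off an energy statistic of the resulting update — can be sketched in a toy setting. The snippet below is illustrative only: it uses a plain linear map as a stand-in for a real backbone, gradient descent on the LoRA factors `A` and `B` for a single sample, and the Frobenius norm of `B @ A` as the "energy" signal; the paper's actual probe statistics, architectures, and thresholds are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "backbone": a fixed linear map W (hypothetical stand-in for a layer).
d_in, d_out, r = 16, 8, 2          # r = LoRA rank
W = rng.normal(size=(d_out, d_in))

def lora_energy(x, y, steps=30, lr=0.005):
    """Fit LoRA factors B @ A on one (x, y) pair with W frozen, then
    return the Frobenius energy of the low-rank update (illustrative
    signal: a sample the backbone already explains needs little update)."""
    A = rng.normal(scale=0.01, size=(r, d_in))
    B = np.zeros((d_out, r))        # standard LoRA init: B = 0
    for _ in range(steps):
        pred = (W + B @ A) @ x
        err = pred - y              # gradient of 0.5 * ||pred - y||^2
        B -= lr * np.outer(err, A @ x)
        A -= lr * np.outer(B.T @ err, x)
    return np.linalg.norm(B @ A)    # Frobenius "energy" statistic

x = rng.normal(size=d_in)
# A target consistent with the frozen backbone induces (almost) no update...
clean_energy = lora_energy(x, W @ x)
# ...while a mismatched target forces a distinctly larger low-rank shift.
suspicious_energy = lora_energy(x, W @ x + 5.0)
assert clean_energy < suspicious_energy
```

In this toy version the clean sample yields zero adapter energy while the mismatched one does not; the paper's claim is the analogous (noisier) separation between clean and poisoned/member samples in real networks.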
Similar Papers
Causal-Guided Detoxify Backdoor Attack of Open-Weight LoRA Models
Cryptography and Security
Makes AI models secretly do bad things.
Why LoRA Fails to Forget: Regularized Low-Rank Adaptation Against Backdoors in Language Models
Computation and Language
Fixes AI that learned bad habits.
LoRA as a Flexible Framework for Securing Large Vision Systems
CV and Pattern Recognition
Fixes self-driving cars fooled by fake signs.