Personal Attribute Leakage in Federated Speech Models
By: Hamdan Al-Ali, Ali Reza Ghavamipour, Tommaso Caselli, et al.
Potential Business Impact:
Shows how shared model updates can let attackers guess your age, gender, or accent.
Federated learning is a common method for privacy-preserving training of machine learning models. In this paper, we analyze the vulnerability of ASR models to attribute inference attacks in the federated setting. We test a non-parametric white-box attack method under a passive threat model on three ASR models: Wav2Vec2, HuBERT, and Whisper. The attack operates solely on weight differentials without access to raw speech from target speakers. We demonstrate attack feasibility on sensitive demographic and clinical attributes: gender, age, accent, emotion, and dysarthria. Our findings indicate that attributes that are underrepresented or absent in the pre-training data are more vulnerable to such inference attacks. In particular, information about accents can be reliably inferred from all models. Our findings expose previously undocumented vulnerabilities in federated ASR models and offer insights towards improved security.
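To make the attack setting concrete, below is a minimal sketch of a passive, non-parametric attribute inference attack on federated weight differentials. This is an illustrative reconstruction under stated assumptions, not the paper's exact method: it assumes the attacker holds "shadow" client updates with known attribute labels and classifies a target client's update by cosine similarity (k-nearest neighbours). All names here (`flatten_update`, `infer_attribute`, the toy data) are hypothetical.

```python
# Sketch: passive white-box attribute inference from federated weight
# differentials. Assumption: the attacker observes per-client updates
# (new weights minus global weights) and has labeled shadow updates.

import numpy as np

def flatten_update(state_delta):
    """Concatenate all per-layer weight differentials into one vector."""
    return np.concatenate([w.ravel() for w in state_delta.values()])

def infer_attribute(target_delta, shadow_deltas, shadow_labels, k=5):
    """Predict the target's attribute by majority vote over the k shadow
    updates whose weight differentials are most cosine-similar."""
    t = flatten_update(target_delta)
    t = t / (np.linalg.norm(t) + 1e-12)
    sims = []
    for delta in shadow_deltas:
        s = flatten_update(delta)
        sims.append(float(t @ (s / (np.linalg.norm(s) + 1e-12))))
    top_k = np.argsort(sims)[-k:]
    votes = [shadow_labels[i] for i in top_k]
    return max(set(votes), key=votes.count)

# Toy example: two-layer "models", two shadow speakers per gender.
rng = np.random.default_rng(0)
def toy_delta(bias):
    return {"enc.w": rng.normal(bias, 1.0, (4, 4)),
            "head.w": rng.normal(bias, 1.0, 8)}

shadows = [toy_delta(0.0), toy_delta(0.0), toy_delta(2.0), toy_delta(2.0)]
labels = ["female", "female", "male", "male"]
print(infer_attribute(toy_delta(2.0), shadows, labels, k=3))  # likely "male"
```

Note the attacker never touches raw speech: only the direction of the weight differential is compared, which matches the passive threat model the abstract describes.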
Similar Papers
An Efficient Gradient-Based Inference Attack for Federated Learning
Machine Learning (CS)
Finds private data hidden in shared learning updates.
Disparate Privacy Vulnerability: Targeted Attribute Inference Attacks and Defenses
Machine Learning (CS)
Protects private data from sneaky computer guesses.
FedAU2: Attribute Unlearning for User-Level Federated Recommender Systems with Adaptive and Robust Adversarial Training
Information Retrieval
Keeps your private info safe in recommendation apps.