LADSG: Label-Anonymized Distillation and Similar Gradient Substitution for Label Privacy in Vertical Federated Learning
By: Zeyu Yan, Yanfei Yao, Xuanbing Wen, and more
Potential Business Impact:
Keeps private data safe when computers learn together.
Vertical Federated Learning (VFL) has emerged as a promising paradigm for collaborative model training across distributed feature spaces, enabling privacy-preserving learning without sharing raw data. However, recent studies have confirmed the feasibility of label inference attacks by internal adversaries. By strategically exploiting gradient vectors and semantic embeddings, attackers, whether through passive, active, or direct attacks, can accurately reconstruct private labels, leading to catastrophic data leakage. Existing defenses, which typically address isolated leakage vectors or are designed for specific types of attacks, remain vulnerable to emerging hybrid attacks that exploit multiple pathways simultaneously. To bridge this gap, we propose Label-Anonymized Defense with Substitution Gradient (LADSG), a unified and lightweight defense framework for VFL. LADSG first anonymizes true labels via soft distillation to reduce semantic exposure, then generates semantically aligned substitute gradients to disrupt gradient-based leakage, and finally filters anomalous updates through gradient norm detection. It is scalable and compatible with standard VFL pipelines. Extensive experiments on six real-world datasets show that LADSG reduces the success rates of all three types of label inference attacks by 30-60% with minimal computational overhead, demonstrating its practical effectiveness.
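The abstract describes a three-stage defense pipeline. Below is a minimal PyTorch sketch of what those stages might look like, assuming temperature-based label softening, cosine-similarity matching against a pool of candidate gradients, and a z-score gradient-norm filter; all function names, thresholds, and the matching/filtering heuristics are illustrative assumptions, not the authors' reference implementation.

    # Illustrative sketch of the three LADSG stages (not the paper's code).
    import torch
    import torch.nn.functional as F

    def anonymize_labels(logits: torch.Tensor, temperature: float = 4.0) -> torch.Tensor:
        """Stage 1 (assumed form): replace hard labels with softened,
        distillation-style label distributions so the raw class identity
        is not exposed directly to other parties."""
        return F.softmax(logits / temperature, dim=-1)

    def substitute_gradient(true_grad: torch.Tensor, pool: torch.Tensor) -> torch.Tensor:
        """Stage 2 (assumed form): return the candidate gradient from `pool`
        most similar to the true gradient (cosine similarity), so the
        passive party receives a semantically aligned substitute rather
        than the label-revealing gradient itself."""
        sims = F.cosine_similarity(pool, true_grad.unsqueeze(0), dim=-1)
        return pool[sims.argmax()]

    def filter_anomalous(grads: torch.Tensor, z_thresh: float = 3.0) -> torch.Tensor:
        """Stage 3 (assumed form): drop per-sample updates whose gradient
        norm deviates strongly from the batch mean, a simple norm-based
        anomaly filter."""
        norms = grads.norm(dim=-1)
        z = (norms - norms.mean()) / (norms.std() + 1e-8)
        return grads[z.abs() <= z_thresh]

    if __name__ == "__main__":
        torch.manual_seed(0)
        soft = anonymize_labels(torch.randn(8, 10))               # softened labels
        g = substitute_gradient(torch.randn(16), torch.randn(32, 16))
        kept = filter_anomalous(torch.randn(64, 16))
        print(soft.shape, g.shape, kept.shape)

In a real VFL pipeline these steps would sit on the active (label-holding) party: labels are softened before the loss is computed, the substitute gradient is what gets transmitted to passive parties, and the norm filter screens incoming updates before aggregation.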
Similar Papers
SVDefense: Effective Defense against Gradient Inversion Attacks via Singular Value Decomposition
Cryptography and Security
Protects private data when computers learn together.
Cooperative Decentralized Backdoor Attacks on Vertical Federated Learning
Machine Learning (CS)
Makes AI models easier to trick with bad data.
Data Privatization in Vertical Federated Learning with Client-wise Missing Problem
Methodology
Keeps private data safe when learning from many sources.