Assessing User Privacy Leakage in Synthetic Packet Traces: An Attack-Grounded Approach
By: Minhao Jin, Hongyu He, Maria Apostolaki
Potential Business Impact:
Finds hidden user info in fake internet traffic.
Current synthetic traffic generators (SynNetGens) promise privacy but lack comprehensive guarantees or empirical validation, even as their fidelity steadily improves. We introduce the first attack-grounded benchmark for assessing the privacy of SynNetGens directly from the traffic they produce. We frame privacy as membership inference at the traffic-source level, a realistic and actionable threat for data holders. To this end, we present TraceBleed, the first attack that exploits behavioral fingerprints across flows using contrastive learning and temporal chunking, outperforming prior membership inference baselines by 172%. Our large-scale study across GAN-, diffusion-, and GPT-based SynNetGens uncovers critical insights: (i) SynNetGens leak user-level information; (ii) differential privacy either fails to stop these attacks or severely degrades fidelity; and (iii) sharing more synthetic data amplifies leakage by 59% on average. Finally, we introduce TracePatch, the first SynNetGen-agnostic defense that combines adversarial ML with SMT constraints to mitigate leakage while preserving fidelity.
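To make the threat model concrete, here is a minimal sketch of membership inference at the traffic-source level. It is not TraceBleed: it swaps the contrastive-learning embeddings and temporal chunking described above for a plain nearest-neighbor cosine-similarity test on hand-written feature vectors. The function names, the feature vectors, and the 0.95 threshold are all illustrative assumptions and do not come from the paper.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two flow feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def membership_score(source_flows: Sequence[Vector],
                     synthetic_flows: Sequence[Vector]) -> float:
    """Average, over one source's flows, of the best match found in the synthetic trace."""
    if not source_flows or not synthetic_flows:
        return 0.0
    best = [max(cosine(f, s) for s in synthetic_flows) for f in source_flows]
    return sum(best) / len(best)


def infer_membership(source_flows: Sequence[Vector],
                     synthetic_flows: Sequence[Vector],
                     threshold: float = 0.95) -> bool:
    """Guess that the source was in the training data if its flows match closely.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    return membership_score(source_flows, synthetic_flows) >= threshold


# Toy data (assumed): two-dimensional per-flow features.
synthetic = [[1.0, 0.0], [0.9, 0.1]]
member_source = [[1.0, 0.05]]      # behaves like the synthetic flows
non_member_source = [[0.0, 1.0]]   # behaves differently

print(infer_membership(member_source, synthetic))
print(infer_membership(non_member_source, synthetic))
```

The decision is made per source rather than per flow: all flows attributed to one source are scored together, which is what distinguishes this framing from record-level membership inference.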
Similar Papers
Quantifying the Privacy Implications of High-Fidelity Synthetic Network Traffic
Artificial Intelligence
Finds hidden private info in fake internet traffic.
Evaluating Privacy-Utility Tradeoffs in Synthetic Smart Grid Data
Machine Learning (CS)
Creates fake electricity use data to protect privacy.
Packet-Level DDoS Data Augmentation Using Dual-Stream Temporal-Field Diffusion
Networking and Internet Architecture
Makes fake internet attacks look real for training.