AlignDP: Hybrid Differential Privacy with Rarity-Aware Protection for LLMs
By: Madhava Gaikwad
Potential Business Impact:
Protects smart computer brains from being copied.
Large language models are exposed to risks of extraction, distillation, and unauthorized fine-tuning. Existing defenses rely on watermarking or monitoring, but these act only after leakage has occurred. We design AlignDP, a hybrid privacy lock that blocks knowledge transfer at the data interface. The key idea is to separate rare fields from non-rare fields. Rare fields are shielded by PAC indistinguishability, giving effectively zero-epsilon local DP. Non-rare fields are privatized with RAPPOR, yielding unbiased frequency estimates under local DP. A global aggregator enforces composition and the overall privacy budget. This two-tier design hides rare events and adds controlled noise to frequent ones. We prove limits on extending PAC guarantees to global aggregation, give error bounds for the RAPPOR estimates, and analyze the privacy-utility trade-off. A toy simulation confirms feasibility: rare categories remain hidden, while frequent categories are recovered with small error.
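As a rough sketch of the two-tier idea described above (not the paper's actual implementation), the Python/NumPy toy below treats rare categories with fully randomized reports, one simple way to realize a zero-information tier, and frequent categories with basic one-time RAPPOR (each one-hot bit is replaced by a fair coin with probability f), then debiases the aggregated counts. All names, thresholds, and parameter values (F, the rarity cutoff, the category counts) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters -- assumptions for this sketch, not from the paper.
F = 0.25                                  # RAPPOR bit-randomization probability
K = 6                                     # number of categories
true_counts = np.array([4000, 3000, 2000, 950, 30, 20])
rare = true_counts < 100                  # assumed rarity cutoff
n = int(true_counts.sum())                # total number of clients

def client_report(value, rare_mask, f, rng):
    """One local report: rare values send fully random bits (zero signal,
    mimicking the zero-epsilon tier); frequent values use basic one-time
    RAPPOR on a one-hot encoding (each bit replaced by a fair coin w.p. f)."""
    one_hot = np.zeros(K, dtype=int)
    one_hot[value] = 1
    eff_f = 1.0 if rare_mask[value] else f   # f = 1 destroys all information
    flip = rng.random(K) < eff_f
    coin = rng.random(K) < 0.5
    return np.where(flip, coin, one_hot)

# Simulate one report per client and aggregate per-bit counts.
values = np.repeat(np.arange(K), true_counts)
bit_counts = np.zeros(K)
for v in values:
    bit_counts += client_report(v, rare, F, rng)

# Aggregator-side debiasing for the RAPPOR tier: for a frequent category k,
# E[count_k] ~= (1 - f) * t_k + n * f / 2, so t_hat = (count - n*f/2) / (1-f).
# The rare clients' fully random reports add a small bias of about n_rare / 2.
est = (bit_counts - n * F / 2) / (1 - F)

for k in range(K):
    tier = "rare" if rare[k] else "freq"
    print(f"cat {k} ({tier})  true={true_counts[k]:5d}  est={est[k]:8.1f}")
```

In this sketch the two rare categories (true counts 30 and 20) collapse to statistically identical estimates of roughly n_rare/2 plus noise, so their individual frequencies cannot be recovered, while the frequent categories come back with small relative error, mirroring the behavior the abstract reports for its toy simulation.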
Similar Papers
Improved Algorithms for Differentially Private Language Model Alignment
Cryptography and Security
Keeps AI helpful and private.
DP-GENG: Differentially Private Dataset Distillation Guided by DP-Generated Data
Cryptography and Security
Protects private data while shrinking big computer learning sets.
SA-ADP: Sensitivity-Aware Adaptive Differential Privacy for Large Language Models
Machine Learning (CS)
Protects private info without hurting computer smarts.