Optimizer Dynamics at the Edge of Stability with Differential Privacy
By: Ayana Hussain, Ricky Fang
Deep learning models can reveal sensitive information about individual training examples, and while differential privacy (DP) provides guarantees restricting such leakage, it also alters optimization dynamics in poorly understood ways. We study the training dynamics of neural networks under DP by comparing Gradient Descent (GD) and Adam to their privacy-preserving variants. Prior work shows that these optimizers exhibit distinct stability dynamics: full-batch methods train at the Edge of Stability (EoS), while mini-batch and adaptive methods exhibit analogous edge-of-stability behavior. In these regimes, the training loss and the sharpness (the maximum eigenvalue of the training loss Hessian) show characteristic behavior: the sharpness rises to and then hovers near a threshold determined by the learning rate, while the loss decreases non-monotonically. In DP training, per-example gradient clipping and Gaussian noise modify the update rule, and it is unclear whether these stability patterns persist. We analyze how clipping and noise change the evolution of sharpness and loss, and show that while DP generally reduces the sharpness and can prevent optimizers from fully reaching the classical stability thresholds, the characteristic patterns of EoS and its adaptive-optimizer analogues persist, with sharpness approaching, and sometimes exceeding, these thresholds at the largest learning rates and largest privacy budgets. These findings highlight the unpredictability introduced by DP in neural network optimization.
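To make the modified update rule concrete, the sketch below (not the paper's code) shows one full-batch DP-GD step with per-example gradient clipping and Gaussian noise, together with a power-iteration estimate of the sharpness that can be compared against the classical GD stability threshold 2/eta. The toy model, data, and hyperparameters (clipping norm, noise multiplier, learning rate) are illustrative assumptions.

```python
# Minimal sketch of a DP-GD step (per-example clipping + Gaussian noise)
# and a power-iteration sharpness estimate. All hyperparameters and the
# toy model/data are assumptions for illustration only.
import torch

torch.manual_seed(0)
X = torch.randn(64, 10)                      # toy inputs
y = torch.randn(64, 1)                       # toy targets
model = torch.nn.Sequential(torch.nn.Linear(10, 16), torch.nn.Tanh(),
                            torch.nn.Linear(16, 1))
loss_fn = torch.nn.MSELoss()
eta, clip_norm, noise_mult = 0.05, 1.0, 0.5  # assumed hyperparameters


def dp_gd_step():
    """One full-batch DP-GD step: clip each per-example gradient,
    sum, add Gaussian noise scaled by noise_mult * clip_norm, average."""
    params = list(model.parameters())
    summed = [torch.zeros_like(p) for p in params]
    n = X.shape[0]
    for i in range(n):                                   # per-example gradients
        loss = loss_fn(model(X[i:i + 1]), y[i:i + 1])
        grads = torch.autograd.grad(loss, params)
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (norm + 1e-12), max=1.0)
        for s, g in zip(summed, grads):
            s += scale * g
    with torch.no_grad():
        for p, s in zip(params, summed):
            noisy = (s + noise_mult * clip_norm * torch.randn_like(s)) / n
            p -= eta * noisy                             # noisy clipped update


def sharpness(iters=50):
    """Estimate the largest eigenvalue of the full-batch loss Hessian
    via power iteration on Hessian-vector products."""
    params = list(model.parameters())
    loss = loss_fn(model(X), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    for _ in range(iters):
        hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        norm = torch.sqrt(sum(h.pow(2).sum() for h in hv))
        v = [h / norm for h in hv]
    hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
    return sum((h * u).sum() for h, u in zip(hv, v)).item()  # Rayleigh quotient


for step in range(200):
    dp_gd_step()
    if step % 50 == 0:
        print(f"step {step:3d}  sharpness {sharpness():.2f}  "
              f"GD threshold 2/eta {2 / eta:.1f}")
```

Setting noise_mult to zero and removing the clipping recovers plain full-batch GD, so the same loop can be used to compare how close the sharpness gets to 2/eta with and without the DP modifications.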