Escaping Saddle Points via Curvature-Calibrated Perturbations: A Complete Analysis with Explicit Constants and Empirical Validation
By: Faruk Alpay, Hamdi Alakkad
We present a comprehensive theoretical analysis of first-order methods for escaping strict saddle points in smooth non-convex optimization. Our main contribution is a Perturbed Saddle-escape Descent (PSD) algorithm with fully explicit constants and a rigorous separation between gradient-descent and saddle-escape phases. For a function $f:\mathbb{R}^d\to\mathbb{R}$ with $\ell$-Lipschitz gradient and $\rho$-Lipschitz Hessian, we prove that PSD finds an $(\epsilon,\sqrt{\rho\epsilon})$-approximate second-order stationary point with high probability using at most $O(\ell\Delta_f/\epsilon^2)$ gradient evaluations for the descent phase plus $O((\ell/\sqrt{\rho\epsilon})\log(d/\delta))$ evaluations per escape episode, with at most $O(\ell\Delta_f/\epsilon^2)$ episodes needed. We validate our theoretical predictions through extensive experiments across both synthetic functions and practical machine learning tasks, confirming the logarithmic dimension dependence and the predicted per-episode function decrease. We also provide complete algorithmic specifications including a finite-difference variant (PSD-Probe) and a stochastic extension (PSGD) with robust mini-batch sizing.
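The abstract describes PSD as alternating a gradient-descent phase (taken while $\|\nabla f(x)\| > \epsilon$) with randomly perturbed escape episodes of length $O((\ell/\sqrt{\rho\epsilon})\log(d/\delta))$, terminating once an episode fails to produce sufficient function decrease. The sketch below is a minimal illustration of that control flow only: the function name `psd`, the $1/\ell$ step size, the perturbation radius, and the per-episode decrease threshold are illustrative assumptions, not the paper's explicit constants.

```python
import numpy as np

def psd(f, grad_f, x0, ell, rho, eps, delta=0.1, max_iter=10_000, rng=None):
    """Sketch of a perturbed saddle-escape descent loop (illustrative constants).

    Alternates gradient descent (while the gradient norm exceeds eps) with
    randomly perturbed escape episodes; stops when an episode fails to
    decrease f by a target amount, returning the anchor point as an
    approximate second-order stationary point.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    d = x.size
    eta = 1.0 / ell                                   # standard 1/ell step size (assumed)
    # Episode length matching the O((ell / sqrt(rho*eps)) * log(d/delta)) scaling.
    episode_len = int(np.ceil((ell / np.sqrt(rho * eps)) * np.log(d / delta)))
    # Target per-episode decrease; sqrt(eps^3 / rho) is the usual scaling, used
    # here as an illustrative placeholder for the paper's explicit constant.
    target_decrease = np.sqrt(eps**3 / rho)

    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) > eps:
            x = x - eta * g                           # descent phase
            continue
        # Escape episode: perturb uniformly in a small ball, then run GD.
        x_anchor, f_anchor = x.copy(), f(x)
        xi = rng.standard_normal(d)
        xi *= (eps / ell) * rng.random() ** (1.0 / d) / np.linalg.norm(xi)
        x = x + xi
        for _ in range(episode_len):
            x = x - eta * grad_f(x)
        if f_anchor - f(x) < target_decrease:
            return x_anchor                           # insufficient progress: report it
    return x
```

A PSD-Probe-style variant would replace `grad_f` inside the episode with finite-difference probes, and a PSGD-style variant would replace it with mini-batch stochastic gradients; both are outside the scope of this sketch.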