arXiv Analytics

arXiv:1705.10412 [math.OC]

Gradient Descent Can Take Exponential Time to Escape Saddle Points

Simon S. Du, Chi Jin, Jason D. Lee, Michael I. Jordan, Barnabas Poczos, Aarti Singh

Published 2017-05-29 (Version 1)

Although gradient descent (GD) almost always escapes saddle points asymptotically [Lee et al., 2016], this paper shows that even with fairly natural random initialization schemes and non-pathological functions, GD can be significantly slowed down by saddle points, taking exponential time to escape. On the other hand, gradient descent with perturbations [Ge et al., 2015, Jin et al., 2017] is not slowed down by saddle points: it can find an approximate local minimizer in polynomial time. This result implies that GD is inherently slower than perturbed GD, and justifies the importance of adding perturbations for efficient non-convex optimization. While our focus is theoretical, we also present experiments that illustrate these findings.
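To make the contrast concrete, here is a minimal sketch (not the paper's construction) comparing plain GD with a perturbed variant that injects an isotropic random kick whenever the gradient is small, in the spirit of Ge et al. [2015] and Jin et al. [2017]. The toy objective, step size, noise radius, and the choice to initialize on the saddle's stable manifold are all illustrative assumptions; the paper's exponential-time lower bound instead uses a carefully constructed smooth function together with natural random initialization.

```python
import numpy as np

# Illustrative toy objective (not from the paper):
# f(x, y) = x^4/4 - x^2/2 + y^2/2 has a strict saddle at the origin
# and local minima at (+-1, 0).
def grad(z):
    x, y = z
    return np.array([x**3 - x, y])

def gd(z0, eta=0.1, steps=500):
    """Plain gradient descent."""
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        z = z - eta * grad(z)
    return z

def perturbed_gd(z0, eta=0.1, steps=500, g_thresh=1e-3, radius=1e-2, seed=0):
    """GD with an isotropic perturbation whenever the gradient is small
    (a sketch of the perturbed-GD idea, not the authors' exact algorithm)."""
    rng = np.random.default_rng(seed)
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        if np.linalg.norm(grad(z)) < g_thresh:       # near a stationary point
            z = z + radius * rng.standard_normal(2)  # random kick to leave the saddle
        z = z - eta * grad(z)
    return z

# Initialize on the saddle's stable manifold (x = 0), where plain GD gets stuck.
print(gd([0.0, 1.0]))            # converges to the saddle (0, 0)
print(perturbed_gd([0.0, 1.0]))  # escapes to a local minimizer near (+-1, 0)
```

In this sketch, plain GD started at (0, 1) only shrinks the y-coordinate and converges to the saddle, while the perturbed variant picks up a small x-component near the origin and is then pushed toward one of the local minima.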

Related articles:
arXiv:2211.00866 [math.OC] (Published 2022-11-02)
Gradient Descent and the Power Method: Exploiting their connection to find the leftmost eigen-pair and escape saddle points
arXiv:2502.03701 [math.OC] (Published 2025-02-06)
First-ish Order Methods: Hessian-aware Scalings of Gradient Descent
arXiv:2102.02396 [math.OC] (Published 2021-02-04)
Exact Linear Convergence Rate Analysis for Low-Rank Symmetric Matrix Completion via Gradient Descent