arXiv:1705.10412 [math.OC]
Gradient Descent Can Take Exponential Time to Escape Saddle Points
Simon S. Du, Chi Jin, Jason D. Lee, Michael I. Jordan, Barnabas Poczos, Aarti Singh
Published 2017-05-29 (Version 1)
Although gradient descent (GD) almost always escapes saddle points asymptotically [Lee et al., 2016], this paper shows that even with fairly natural random initialization schemes and non-pathological functions, GD can be significantly slowed down by saddle points, taking exponential time to escape. On the other hand, gradient descent with perturbations [Ge et al., 2015, Jin et al., 2017] is not slowed down by saddle points: it can find an approximate local minimizer in polynomial time. This result implies that GD is inherently slower than perturbed GD, and justifies the importance of adding perturbations for efficient non-convex optimization. While our focus is theoretical, we also present experiments that illustrate our theoretical findings.
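The contrast between plain GD and perturbed GD can be illustrated on a toy strict-saddle function. The sketch below is not the paper's construction; the objective f(x, y) = x^2 - y^2 + y^4/4, the step size, and the "perturb when the gradient is tiny" rule are illustrative assumptions in the spirit of [Ge et al., 2015, Jin et al., 2017]. Initialized on the saddle's stable manifold (y = 0), plain GD converges to the saddle at the origin, while perturbed GD escapes to a local minimizer at (0, ±√2).

```python
import numpy as np

def grad(p):
    # f(x, y) = x^2 - y^2 + y^4 / 4 has a strict saddle at the origin
    # and local minima at (0, +sqrt(2)) and (0, -sqrt(2)).
    x, y = p
    return np.array([2.0 * x, -2.0 * y + y ** 3])

def gd(p0, eta=0.05, steps=500, perturb=False, radius=1e-3, seed=0):
    """Plain GD, or GD with a small random perturbation near critical points."""
    rng = np.random.default_rng(seed)
    p = np.array(p0, dtype=float)
    for _ in range(steps):
        g = grad(p)
        # Perturbed GD (illustrative rule): if the gradient is tiny, the
        # iterate is near a critical point, so add a small random kick.
        if perturb and np.linalg.norm(g) < 1e-6:
            p = p + radius * rng.standard_normal(2)
            g = grad(p)
        p = p - eta * g
    return p

# Start exactly on the saddle's stable direction (y = 0).
p0 = (1.0, 0.0)
print("plain GD    :", gd(p0))                # stuck near the saddle (0, 0)
print("perturbed GD:", gd(p0, perturb=True))  # escapes to (0, +/- sqrt(2))
```

In this toy setting a single perturbation suffices because the saddle's unstable direction is one-dimensional; the paper's lower-bound construction instead chains many saddles so that unperturbed GD spends exponentially long near them.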