arXiv Analytics

arXiv:2006.11144 [math.OC]

On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems

Panayotis Mertikopoulos, Nadav Hallak, Ali Kavis, Volkan Cevher

Published 2020-06-19 (Version 1)

This paper analyzes the trajectories of stochastic gradient descent (SGD) to help understand the algorithm's convergence properties in non-convex problems. We first show that the sequence of iterates generated by SGD remains bounded and converges with probability $1$ under a very broad range of step-size schedules. Subsequently, going beyond existing positive probability guarantees, we show that SGD avoids strict saddle points/manifolds with probability $1$ for the entire spectrum of step-size policies considered. Finally, we prove that the algorithm's rate of convergence to Hurwicz minimizers is $\mathcal{O}(1/n^{p})$ if the method is employed with a $\Theta(1/n^p)$ step-size schedule. This provides an important guideline for tuning the algorithm's step size, as it suggests that a cool-down phase with a vanishing step-size could lead to faster convergence; we demonstrate this heuristic using ResNet architectures on CIFAR.
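To make the step-size policy concrete, the following is a minimal sketch (not the authors' code) of SGD run with a $\Theta(1/n^p)$ schedule on a toy non-convex objective; the exponent p, the base step gamma0, the noise level sigma, and the test function are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of SGD with a Theta(1/n^p) step-size schedule on a toy
# non-convex objective.  All numerical choices below are illustrative.

import numpy as np

def grad(x):
    # Gradient of the non-convex test function f(x, y) = (x^2 - 1)^2 + y^2,
    # whose minimizers (+/-1, 0) are Hurwicz (positive-definite Hessian there).
    return np.array([4.0 * x[0] * (x[0] ** 2 - 1.0), 2.0 * x[1]])

def sgd(x0, p=0.75, gamma0=0.1, sigma=0.5, n_iters=10_000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for n in range(1, n_iters + 1):
        gamma_n = gamma0 / n ** p                              # Theta(1/n^p) step-size
        noisy_grad = grad(x) + sigma * rng.standard_normal(2)  # unbiased gradient oracle
        x = x - gamma_n * noisy_grad
    return x

if __name__ == "__main__":
    # Starting near the saddle at the origin, the iterates settle near (+/-1, 0).
    print(sgd([0.1, 1.0]))
```

For the illustrative choice p = 0.75, the schedule is not summable but is square-summable, a standard setting in stochastic approximation; the vanishing step-size in later iterations plays the role of the cool-down phase mentioned above.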

Related articles:
arXiv:2406.09241 [math.OC] (Published 2024-06-13)
What is the long-run distribution of stochastic gradient descent? A large deviations analysis
arXiv:2412.06070 [math.OC] (Published 2024-12-08)
Stochastic Gradient Descent Revisited
arXiv:1709.04718 [math.OC] (Published 2017-09-14)
The Impact of Local Geometry and Batch Size on the Convergence and Divergence of Stochastic Gradient Descent