arXiv Analytics


arXiv:2002.05273 [stat.ML]

Exponential Step Sizes for Non-Convex Optimization

Xiaoyu Li, Zhenxun Zhuang, Francesco Orabona

Published 2020-02-12 (Version 1)

Stochastic Gradient Descent (SGD) is a popular tool for large-scale optimization of machine learning objective functions. However, its performance varies greatly depending on the choice of the step sizes. In this paper, we introduce exponential step sizes for the stochastic optimization of smooth non-convex functions that satisfy the Polyak-\L{}ojasiewicz (PL) condition. We show that, without any information on the level of noise in the stochastic gradients, these step sizes guarantee a convergence rate for the last iterate that automatically interpolates between a linear rate (in the noise-free case) and an $O(\frac{1}{T})$ rate (in the noisy case), up to poly-logarithmic factors. Moreover, without the PL condition, the exponential step sizes still guarantee optimal convergence to a critical point, up to logarithmic factors. We also validate our theoretical results with empirical experiments on real-world datasets and deep learning architectures.
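To make the schedule concrete, here is a minimal sketch of SGD with an exponentially decaying step size $\eta_t = \eta_0 \alpha^t$. This is an illustrative assumption about the form of the schedule, not the authors' exact parameterization of $\alpha$; the names `sgd_exponential_steps`, `grad_fn`, `eta0`, and `alpha` are hypothetical.

```python
import numpy as np

def sgd_exponential_steps(grad_fn, x0, eta0=1.0, alpha=0.999, T=1000, rng=None):
    """Plain SGD with an exponentially decaying step size eta_t = eta0 * alpha**t.

    grad_fn(x, rng) should return a stochastic gradient estimate at x.
    eta0 and alpha are tuning knobs; the paper's exact choice of alpha
    (and its dependence on T) may differ from this sketch.
    """
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float).copy()
    for t in range(T):
        eta_t = eta0 * alpha ** t          # exponential step-size schedule
        x -= eta_t * grad_fn(x, rng)       # standard SGD update
    return x

# Toy usage: noisy gradients of f(x) = 0.5 * ||x||^2, which satisfies the PL condition.
noisy_grad = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
x_final = sgd_exponential_steps(noisy_grad, x0=np.ones(10), eta0=0.5, alpha=0.995, T=2000)
print(np.linalg.norm(x_final))
```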

Related articles:
arXiv:1805.08114 [stat.ML] (Published 2018-05-21)
On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes
arXiv:1710.06382 [stat.ML] (Published 2017-10-17)
Convergence diagnostics for stochastic gradient descent with constant step size
arXiv:2408.02839 [stat.ML] (Published 2024-08-05)
Optimizing Cox Models with Stochastic Gradient Descent: Theoretical Foundations and Practical Guidances