arXiv Analytics


arXiv:2002.05273 [stat.ML]

Exponential Step Sizes for Non-Convex Optimization

Xiaoyu Li, Zhenxun Zhuang, Francesco Orabona

Published 2020-02-12 (Version 1)

Stochastic Gradient Descent (SGD) is a popular tool for large-scale optimization of machine learning objective functions. However, its performance varies greatly depending on the choice of the step sizes. In this paper, we introduce exponential step sizes for the stochastic optimization of smooth non-convex functions that satisfy the Polyak-\L{}ojasiewicz (PL) condition. We show that, without any information on the level of noise in the stochastic gradients, these step sizes guarantee a convergence rate for the last iterate that automatically interpolates between a linear rate (in the noise-free case) and an $O(\frac{1}{T})$ rate (in the noisy case), up to poly-logarithmic factors. Moreover, without the PL condition, the exponential step sizes still guarantee optimal convergence to a critical point, up to logarithmic factors. We also validate our theoretical results with empirical experiments on real-world datasets and deep learning architectures.
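To make the schedule concrete, here is a minimal sketch of SGD with an exponentially decaying step size $\eta_t = \eta_0 \alpha^t$. This is an illustrative assumption about the form of the schedule, not the authors' exact parameterization of $\alpha$; the names `sgd_exponential_steps`, `grad_fn`, `eta0`, and `alpha` are hypothetical.

```python
import numpy as np

def sgd_exponential_steps(grad_fn, x0, eta0=1.0, alpha=0.999, T=1000, rng=None):
    """Plain SGD with an exponentially decaying step size eta_t = eta0 * alpha**t.

    grad_fn(x, rng) should return a stochastic gradient estimate at x.
    eta0 and alpha are tuning knobs; the paper's exact choice of alpha
    (and its dependence on T) may differ from this sketch.
    """
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float).copy()
    for t in range(T):
        eta_t = eta0 * alpha ** t          # exponential step-size schedule
        x -= eta_t * grad_fn(x, rng)       # standard SGD update
    return x

# Toy usage: noisy gradients of f(x) = 0.5 * ||x||^2, which satisfies the PL condition.
noisy_grad = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
x_final = sgd_exponential_steps(noisy_grad, x0=np.ones(10), eta0=0.5, alpha=0.995, T=2000)
print(np.linalg.norm(x_final))
```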

Related articles:
arXiv:1805.08114 [stat.ML] (Published 2018-05-21)
On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes
arXiv:1710.06382 [stat.ML] (Published 2017-10-17)
Convergence diagnostics for stochastic gradient descent with constant step size
arXiv:2408.02839 [stat.ML] (Published 2024-08-05)
Optimizing Cox Models with Stochastic Gradient Descent: Theoretical Foundations and Practical Guidances