arXiv Analytics


arXiv:2402.03982 [math.OC]

On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions

Yusu Hong, Junhong Lin

Published 2024-02-06, Version 1

The Adaptive Moment Estimation (Adam) algorithm is highly effective in training a wide range of deep learning tasks. Despite this, theoretical understanding of Adam remains limited, especially for its vanilla form in non-convex smooth settings with potentially unbounded gradients and affine variance noise. In this paper, we study vanilla Adam under these challenging conditions. We introduce a comprehensive noise model that covers affine variance noise, bounded noise, and sub-Gaussian noise. We show that, under this general noise model, Adam finds a stationary point at an $\mathcal{O}(\text{poly}(\log T)/\sqrt{T})$ rate with high probability, where $T$ denotes the total number of iterations, matching the lower bound for stochastic first-order algorithms up to logarithmic factors. More importantly, we show that Adam requires no tuning of step-sizes with respect to any problem-dependent parameters, yielding a better adaptation property than Stochastic Gradient Descent under the same conditions. We also provide a probabilistic convergence result for Adam under a generalized smoothness condition, which allows unbounded smoothness parameters and has been shown empirically to capture the smoothness of many practical objective functions more accurately.
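For reference, below is a minimal sketch of one vanilla Adam update as commonly stated (Kingma & Ba); the default constants lr, beta1, beta2, and eps are conventional choices for illustration, not values prescribed by this paper.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One vanilla Adam update.

    theta : current parameter vector
    grad  : stochastic gradient evaluated at theta
    m, v  : running first- and second-moment estimates (same shape as theta)
    t     : 1-based iteration counter
    """
    m = beta1 * m + (1 - beta1) * grad          # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment (adaptive scaling)
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```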

Related articles:
arXiv:2306.04174 [math.OC] (Published 2023-06-07)
End-to-End Learning for Stochastic Optimization: A Bayesian Perspective
arXiv:1711.05762 [math.OC] (Published 2017-11-15)
Random gradient extrapolation for distributed and stochastic optimization
arXiv:2209.09162 [math.OC] (Published 2022-09-19)
On the Theoretical Properties of Noise Correlation in Stochastic Optimization