arXiv:2104.14840 [math.OC]

On Stochastic Moving-Average Estimators for Non-Convex Optimization

Zhishuai Guo, Yi Xu, Wotao Yin, Rong Jin, Tianbao Yang

Published 2021-04-30Version 1

In this paper, we demonstrate the power of a widely used stochastic estimator based on moving average (SEMA) on a range of stochastic non-convex optimization problems, requiring only {\bf a general unbiased stochastic oracle}. We analyze various stochastic methods (existing or newly proposed) based on the {\bf variance recursion property} of SEMA for three families of non-convex optimization problems, namely standard stochastic non-convex minimization, stochastic non-convex strongly-concave min-max optimization, and stochastic bilevel optimization. Our contributions include: (i) for standard stochastic non-convex minimization, we present a simple and intuitive proof of convergence for a family of Adam-style methods (including Adam) with an increasing or large "momentum" parameter for the first-order moment, which gives an alternative yet more natural way to guarantee that Adam converges; (ii) for stochastic non-convex strongly-concave min-max optimization, we present a single-loop stochastic gradient descent ascent method based on the moving average estimators and establish its oracle complexity of $O(1/\epsilon^4)$ without using a large mini-batch size, addressing a gap in the literature; (iii) for stochastic bilevel optimization, we present a single-loop stochastic method based on the moving average estimators and establish its oracle complexity of $\widetilde O(1/\epsilon^4)$ without computing the inverse or SVD of the Hessian matrix, improving state-of-the-art results. For all these problems, we also establish a variance diminishing result for the stochastic gradient estimators used.
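The SEMA estimator at the heart of the abstract is the standard exponential moving average of stochastic gradients, $z_t = (1-\beta)\, z_{t-1} + \beta\, g_t$, whose variance shrinks as iterates stabilize. Below is a minimal, hedged sketch of plain SGD driven by this estimator on a toy quadratic; the names `sema_sgd` and `noisy_grad` are illustrative only, and the paper's actual algorithms (Adam-style methods, single-loop SGDA, and the bilevel method) add further machinery on top of this basic update.

```python
import numpy as np

def sema_sgd(grad_oracle, x0, beta=0.1, lr=0.05, steps=2000, seed=0):
    """SGD with a moving-average (SEMA) gradient estimator.

    Maintains z_t = (1 - beta) * z_{t-1} + beta * g_t, where g_t is an
    unbiased stochastic gradient; the averaging damps gradient noise.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    z = grad_oracle(x, rng)                 # initialize with one stochastic gradient
    for _ in range(steps):
        x = x - lr * z                      # descend along the averaged direction
        g = grad_oracle(x, rng)             # fresh unbiased stochastic gradient
        z = (1.0 - beta) * z + beta * g     # SEMA update
    return x

# Toy problem: minimize f(x) = 0.5 * ||x||^2 with noisy gradient oracle x + noise.
def noisy_grad(x, rng):
    return x + 0.1 * rng.standard_normal(x.shape)

x_final = sema_sgd(noisy_grad, x0=np.ones(3))
print(np.linalg.norm(x_final))  # small: iterate hovers near the minimizer 0
```

Despite the noisy oracle, the averaged direction `z` has much lower variance than a single stochastic gradient, which is the "variance recursion" intuition the abstract exploits to avoid large mini-batches.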

Related articles:
arXiv:1511.08062 [math.OC] (Published 2015-11-25)
Relaxed Majorization-Minimization for Non-smooth and Non-convex Optimization
arXiv:2001.08356 [math.OC] (Published 2020-01-23)
Replica Exchange for Non-Convex Optimization
arXiv:1606.09070 [math.OC] (Published 2016-06-29)
Local Convergence of the Heavy-ball Method and iPiano for Non-convex Optimization