arXiv:2106.07644 [math.OC]

A Continuized View on Nesterov Acceleration for Stochastic Gradient Descent and Randomized Gossip

Mathieu Even, Raphaël Berthier, Francis Bach, Nicolas Flammarion, Pierre Gaillard, Hadrien Hendrikx, Laurent Massoulié, Adrien Taylor

Published 2021-06-10, Version 1

We introduce the continuized Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. The two variables continuously mix following a linear ordinary differential equation and take gradient steps at random times. This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process, one can use differential calculus to analyze convergence and obtain analytical expressions for the parameters; and a discretization of the continuized process can be computed exactly, with convergence rates similar to those of Nesterov's original acceleration. We show that the discretization has the same structure as Nesterov acceleration, but with random parameters. We provide continuized Nesterov acceleration under deterministic as well as stochastic gradients, with either additive or multiplicative noise. Finally, using our continuized framework and expressing the gossip averaging problem as the stochastic minimization of a certain energy function, we provide the first rigorous acceleration of asynchronous gossip algorithms.
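To make the dynamics concrete, below is a minimal numerical sketch, not taken from the paper, of a continuized-acceleration-style method on a strongly convex quadratic: two variables x and z mix via a linear ODE, integrated in closed form between the jump times of a rate-1 Poisson clock, and take coupled gradient steps at those jumps. The step sizes and mixing rates (gamma, gamma_p, eta, eta_p) are illustrative assumptions in the spirit of accelerated tunings, not the paper's exact parameter choices.

```python
# Sketch of a continuized-acceleration-style method on a strongly convex
# quadratic. Between the jump times of a rate-1 Poisson process, (x, z)
# follow the linear ODE dx/dt = eta*(z - x), dz/dt = eta_p*(x - z),
# which we integrate exactly; at each jump, both take a gradient step.
import numpy as np

rng = np.random.default_rng(0)

# Quadratic objective f(x) = 0.5 * x^T A x, with mu <= eig(A) <= L.
d, mu, L = 10, 0.1, 1.0
A = np.diag(np.linspace(mu, L, d))
grad = lambda x: A @ x

# Illustrative parameters (assumption, not the paper's exact tuning).
gamma = 1.0 / L                  # gradient step size on x
gamma_p = 1.0 / np.sqrt(mu * L)  # gradient step size on z
eta = eta_p = np.sqrt(mu / L)    # mixing rates of the linear ODE

def mix(x, z, tau):
    """Closed-form solution of the mixing ODE over a time interval tau."""
    total = eta + eta_p
    w = (eta_p * x + eta * z) / total    # invariant weighted average
    s = (x - z) * np.exp(-total * tau)   # difference decays exponentially
    return w + (eta / total) * s, w - (eta_p / total) * s

x = z = rng.standard_normal(d)
for _ in range(200):
    tau = rng.exponential(1.0)   # waiting time of the rate-1 Poisson clock
    x, z = mix(x, z, tau)        # continuous mixing between jumps
    g = grad(x)                  # coupled gradient steps at the jump time
    x, z = x - gamma * g, z - gamma_p * g

print(f"f(x) after 200 events: {0.5 * x @ A @ x:.2e}")
```

Because the mixing ODE is linear with constant coefficients, its exact solution is available in closed form, which is what makes the discretization of the continuized process exactly computable rather than requiring a numerical ODE solver.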

Comments: arXiv admin note: substantial text overlap with arXiv:2102.06035
Categories: math.OC, cs.LG, cs.MA, math.PR, stat.ML
Related articles:
arXiv:2406.09241 [math.OC] (Published 2024-06-13)
What is the long-run distribution of stochastic gradient descent? A large deviations analysis
arXiv:2006.11144 [math.OC] (Published 2020-06-19)
On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems
arXiv:2110.11442 [math.OC] (Published 2021-10-21, updated 2022-01-30)
Towards Noise-adaptive, Problem-adaptive (Accelerated) Stochastic Gradient Descent