arXiv:1509.01240 [cs.LG]

Train faster, generalize better: Stability of stochastic gradient descent

Moritz Hardt, Benjamin Recht, Yoram Singer

Published 2015-09-03, Version 1

We show that any model trained by a stochastic gradient method with few iterations has vanishing generalization error. We prove this by showing the method is algorithmically stable in the sense of Bousquet and Elisseeff. Our analysis only employs elementary tools from convex and continuous optimization. Our results apply to both convex and non-convex optimization under standard Lipschitz and smoothness assumptions. Applying our results to the convex case, we provide new explanations for why multiple epochs of stochastic gradient descent generalize well in practice. In the non-convex case, we give a new interpretation of common practices in neural networks and a formal rationale for stability-promoting mechanisms used in training large, deep models. Conceptually, our findings underscore the importance of reducing training time beyond its obvious benefit.
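
For context, the stability notion invoked here is Bousquet and Elisseeff's uniform stability. The following LaTeX sketch states that notion and the flavor of the paper's convex-case bound; the notation ($A$, $S$, $f$, $\alpha_t$, $\varepsilon_{\mathrm{stab}}$) is ours and the statement is paraphrased under the stated Lipschitz and smoothness assumptions, not quoted verbatim from the paper.

% Sketch: uniform stability and the convex-case SGD bound (paraphrased).
A randomized algorithm $A$ is $\varepsilon$-uniformly stable if, for all
datasets $S, S'$ of size $n$ differing in a single example,
\[
  \sup_{z} \; \mathbb{E}\bigl[ f(A(S); z) - f(A(S'); z) \bigr] \le \varepsilon,
\]
and uniform stability bounds the expected generalization gap by $\varepsilon$.
For a convex, $L$-Lipschitz, $\beta$-smooth loss and step sizes
$\alpha_t \le 2/\beta$, running $T$ steps of the stochastic gradient method gives
\[
  \varepsilon_{\mathrm{stab}} \;\le\; \frac{2L^2}{n} \sum_{t=1}^{T} \alpha_t .
\]

Read this way, fewer iterations and smaller step sizes directly tighten the stability bound, which is the formal content behind "train faster, generalize better."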

Related articles:
arXiv:1905.13210 [cs.LG] (Published 2019-05-30)
Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks
arXiv:1212.1824 [cs.LG] (Published 2012-12-08, updated 2012-12-28)
Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes
arXiv:1411.1134 [cs.LG] (Published 2014-11-05)
Global Convergence of Stochastic Gradient Descent for Some Nonconvex Matrix Problems