arXiv:2409.09745 [cs.LG]

The Optimality of (Accelerated) SGD for High-Dimensional Quadratic Optimization

Haihan Zhang, Yuanshi Liu, Qianwen Chen, Cong Fang

Published 2024-09-15 (Version 1)

Stochastic gradient descent (SGD) is a widely used algorithm in machine learning, particularly for neural network training. Recent studies on SGD for canonical quadratic optimization or linear regression show that it attains good generalization under suitable high-dimensional settings. However, a fundamental question has yet to be well studied: for what kinds of high-dimensional learning problems can SGD and its accelerated variants achieve optimality? This paper investigates SGD with two components that are essential in practice: an exponentially decaying step size schedule and momentum. We establish a convergence upper bound for momentum-accelerated SGD (ASGD) and propose concrete classes of learning problems under which SGD or ASGD achieves minimax-optimal convergence rates. The characterization of the target function is based on standard power-law decays in (functional) linear regression. Our results unveil new insights into the learning bias of SGD: (i) SGD is efficient at learning "dense" features whose corresponding weights are subject to an infinity-norm constraint; (ii) SGD is efficient on easy problems and does not suffer from the saturation effect; (iii) momentum can accelerate the convergence rate by an order when the learning problem is relatively hard. To our knowledge, this is the first work to clearly identify the optimality boundary between SGD and ASGD for this problem under mild settings.
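For intuition, the following is a minimal sketch (not the paper's implementation) of momentum-accelerated SGD with an exponentially decaying step size on a synthetic linear regression problem whose covariance spectrum and target coefficients follow power-law decays. The dimension, sample count, exponents, step size, momentum, decay factor, and noise level are all illustrative assumptions, not values from the paper.

```python
# Illustrative sketch only: all parameter values below are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Power-law problem: covariance eigenvalues lambda_j = j^{-a}, target weights w*_j = j^{-b}.
d, n = 500, 2000                      # dimension and number of streaming samples (assumed)
a, b = 2.0, 1.0                       # power-law exponents (assumed)
lam = np.arange(1, d + 1, dtype=float) ** (-a)
w_star = np.arange(1, d + 1, dtype=float) ** (-b)

def sample(m):
    """Draw m samples with x ~ N(0, diag(lam)) and y = <w*, x> + noise."""
    x = rng.standard_normal((m, d)) * np.sqrt(lam)
    y = x @ w_star + 0.1 * rng.standard_normal(m)
    return x, y

# ASGD: heavy-ball momentum plus an exponentially decaying step size schedule.
w = np.zeros(d)
v = np.zeros(d)                       # momentum buffer
eta0, beta, decay = 0.2, 0.9, 0.999   # step size, momentum, decay factor (assumed)

for t in range(n):
    x, y = sample(1)
    grad = x.T @ (x @ w - y)          # stochastic gradient of 0.5 * (y - <w, x>)^2
    v = beta * v + grad
    w = w - eta0 * decay**t * v       # exponentially decaying step size

excess_risk = 0.5 * np.sum(lam * (w - w_star) ** 2)
print(f"excess risk after {n} single-sample steps: {excess_risk:.4f}")
```

Setting beta = 0 in this sketch recovers plain SGD with the same step size schedule, which is the natural baseline when contrasting SGD with ASGD.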
