arXiv Analytics

arXiv:2502.00885 [stat.ML]

Algorithmic Stability of Stochastic Gradient Descent with Momentum under Heavy-Tailed Noise

Thanh Dang, Melih Barsbey, A K M Rokonuzzaman Sonet, Mert Gurbuzbalaban, Umut Simsekli, Lingjiong Zhu

Published 2025-02-02, Version 1

Understanding the generalization properties of optimization algorithms under heavy-tailed noise has attracted growing attention. However, existing theoretical results mainly focus on stochastic gradient descent (SGD), and analyses of heavy-tailed optimizers beyond SGD are still missing. In this work, we establish generalization bounds for SGD with momentum (SGDm) under heavy-tailed gradient noise. We first consider the continuous-time limit of SGDm, i.e., a Lévy-driven stochastic differential equation (SDE), and establish quantitative Wasserstein algorithmic stability bounds for a class of potentially non-convex loss functions. Our bounds reveal a remarkable observation: for quadratic loss functions, we show that SGDm admits a worse generalization bound in the presence of heavy-tailed noise, indicating that the interaction of momentum and heavy tails can be harmful for generalization. We then extend our analysis to the discrete-time setting and develop a uniform-in-time discretization error bound, which, to our knowledge, is the first result of its kind for SDEs with degenerate noise. This result shows that, with appropriately chosen step-sizes, the discrete dynamics retain the generalization properties of the limiting SDE. We illustrate our theory on both synthetic quadratic problems and neural networks.
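
The following is a minimal sketch, not code from the paper, of the setting the abstract describes: SGD with heavy-ball momentum on a synthetic quadratic loss, where the stochastic gradient is corrupted by symmetric alpha-stable noise (stability index alpha < 2) to model heavy tails. The function name, parameter values, and the use of scipy.stats.levy_stable are illustrative assumptions.

import numpy as np
from scipy.stats import levy_stable

def sgdm_heavy_tailed(x0, A, lr=0.01, momentum=0.9, alpha=1.8,
                      noise_scale=0.1, n_steps=1000, seed=0):
    """Illustrative SGDm on the quadratic loss f(x) = 0.5 * x^T A x,
    with the gradient corrupted by symmetric alpha-stable noise.
    All settings here are assumptions for illustration only."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    v = np.zeros_like(x)
    for _ in range(n_steps):
        grad = A @ x  # exact gradient of the quadratic loss
        # symmetric alpha-stable noise; alpha < 2 gives heavy tails (infinite variance)
        noise = levy_stable.rvs(alpha, 0.0, scale=noise_scale,
                                size=x.shape, random_state=rng)
        g = grad + noise
        v = momentum * v + g   # heavy-ball momentum buffer
        x = x - lr * v         # parameter update
    return x

if __name__ == "__main__":
    d = 10
    A = np.diag(np.linspace(0.5, 2.0, d))  # positive-definite quadratic curvature
    x_final = sgdm_heavy_tailed(np.ones(d), A)
    print("final loss:", 0.5 * x_final @ A @ x_final)

In this toy model, alpha = 2 recovers Gaussian noise, while smaller alpha yields heavier tails with infinite variance, which is the regime in which the abstract's quadratic-loss comparison between SGDm and SGD applies.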

Related articles:
arXiv:2403.02051 [stat.ML] (Published 2024-03-04, updated 2025-05-12)
Privacy of SGD under Gaussian or Heavy-Tailed Noise: Guarantees without Gradient Clipping
arXiv:2403.13868 [stat.ML] (Published 2024-03-20)
Analysing heavy-tail properties of Stochastic Gradient Descent by means of Stochastic Recurrence Equations
arXiv:2502.06719 [stat.ML] (Published 2025-02-10)
Gaussian Approximation and Multiplier Bootstrap for Stochastic Gradient Descent