{ "id": "2502.00885", "version": "v1", "published": "2025-02-02T19:25:48.000Z", "updated": "2025-02-02T19:25:48.000Z", "title": "Algorithmic Stability of Stochastic Gradient Descent with Momentum under Heavy-Tailed Noise", "authors": [ "Thanh Dang", "Melih Barsbey", "A K M Rokonuzzaman Sonet", "Mert Gurbuzbalaban", "Umut Simsekli", "Lingjiong Zhu" ], "comment": "64 pages, 2 figures", "categories": [ "stat.ML", "cs.LG", "math.OC", "math.PR" ], "abstract": "Understanding the generalization properties of optimization algorithms under heavy-tailed noise has gained growing attention. However, the existing theoretical results mainly focus on stochastic gradient descent (SGD) and the analysis of heavy-tailed optimizers beyond SGD is still missing. In this work, we establish generalization bounds for SGD with momentum (SGDm) under heavy-tailed gradient noise. We first consider the continuous-time limit of SGDm, i.e., a Levy-driven stochastic differential equation (SDE), and establish quantitative Wasserstein algorithmic stability bounds for a class of potentially non-convex loss functions. Our bounds reveal a remarkable observation: For quadratic loss functions, we show that SGDm admits a worse generalization bound in the presence of heavy-tailed noise, indicating that the interaction of momentum and heavy tails can be harmful for generalization. We then extend our analysis to discrete-time and develop a uniform-in-time discretization error bound, which, to our knowledge, is the first result of its kind for SDEs with degenerate noise. This result shows that, with appropriately chosen step-sizes, the discrete dynamics retain the generalization properties of the limiting SDE. We illustrate our theory on both synthetic quadratic problems and neural networks.", "revisions": [ { "version": "v1", "updated": "2025-02-02T19:25:48.000Z" } ], "analyses": { "keywords": [ "stochastic gradient descent", "heavy-tailed noise", "levy-driven stochastic differential equation", "quantitative wasserstein algorithmic stability bounds", "uniform-in-time discretization error bound" ], "note": { "typesetting": "TeX", "pages": 64, "language": "en", "license": "arXiv", "status": "editable" } } }