arXiv:2302.08783 [cs.LG]
SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance
Published 2023-02-17, Version 1
We study Stochastic Gradient Descent with AdaGrad stepsizes: a popular adaptive (self-tuning) method for first-order stochastic optimization. Despite being well studied, existing analyses of this method suffer from various shortcomings: they either assume some knowledge of the problem parameters, impose strong global Lipschitz conditions, or fail to give bounds that hold with high probability. We provide a comprehensive analysis of this basic method without any of these limitations, in both the convex and non-convex (smooth) cases, that additionally supports a general "affine variance" noise model and provides sharp rates of convergence in both the low-noise and high-noise regimes.
Comments: 25 pages
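For readers unfamiliar with the method, the following is a minimal sketch of SGD with a scalar AdaGrad stepsize (often called "AdaGrad-norm"). The abstract does not state the update rule, so the exact form used here, including whether the current gradient enters the running sum, is an assumption based on the commonly used variant. The names `adagrad_norm_sgd` and `make_quadratic_oracle`, the parameters `eta`, `gamma`, `sigma0`, `sigma1`, and the toy quadratic objective are all hypothetical choices made for illustration. The noise in the toy oracle is scaled so that its expected squared norm is `sigma0**2 + sigma1**2 * ||grad||**2`, which is one simple instance of an affine variance condition.

```python
import math
import random


def adagrad_norm_sgd(grad_oracle, w0, eta=1.0, gamma=1.0, steps=1000):
    """SGD with a scalar AdaGrad ("AdaGrad-norm") stepsize.

    Assumed update (a common form, not quoted from the paper):
        eta_t   = eta / sqrt(gamma^2 + sum_{s<=t} ||g_s||^2)
        w_{t+1} = w_t - eta_t * g_t
    gamma > 0 keeps the denominator away from zero.
    """
    w = list(w0)
    accum = gamma ** 2  # running sum of squared gradient norms
    for _ in range(steps):
        g = grad_oracle(w)
        accum += sum(gi * gi for gi in g)
        step = eta / math.sqrt(accum)
        w = [wi - step * gi for wi, gi in zip(w, g)]
    return w


def make_quadratic_oracle(sigma0=0.0, sigma1=0.0, seed=0):
    """Noisy gradient oracle for the toy objective f(w) = 0.5 * ||w||^2.

    The added Gaussian noise has expected squared norm
    sigma0^2 + sigma1^2 * ||grad f(w)||^2 (illustrative affine variance).
    """
    rng = random.Random(seed)

    def oracle(w):
        grad = list(w)  # exact gradient of 0.5 * ||w||^2
        grad_norm_sq = sum(x * x for x in grad)
        per_coord_std = math.sqrt(
            (sigma0 ** 2 + sigma1 ** 2 * grad_norm_sq) / len(w)
        )
        return [x + rng.gauss(0.0, per_coord_std) for x in grad]

    return oracle


if __name__ == "__main__":
    oracle = make_quadratic_oracle(sigma0=0.1, sigma1=0.5, seed=1)
    w_final = adagrad_norm_sgd(oracle, [5.0, -3.0], steps=2000)
    print("final distance to optimum:", math.sqrt(sum(x * x for x in w_final)))
```

Note that the stepsize is computed only from observed gradients: no smoothness constant or noise level is passed in, which is the sense in which the method is self-tuning.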
Related articles:
arXiv:2106.05061 [cs.LG] (Published 2021-06-09)
Quickest change detection with unknown parameters: Constant complexity and near optimality
arXiv:1301.4917 [cs.LG] (Published 2013-01-21)
Dirichlet draws are sparse with high probability
arXiv:2212.04914 [cs.LG] (Published 2022-12-09)
Information-Theoretic Safe Exploration with Gaussian Processes