{ "id": "2104.14840", "version": "v1", "published": "2021-04-30T08:50:24.000Z", "updated": "2021-04-30T08:50:24.000Z", "title": "On Stochastic Moving-Average Estimators for Non-Convex Optimization", "authors": [ "Zhishuai Guo", "Yi Xu", "Wotao Yin", "Rong Jin", "Tianbao Yang" ], "categories": [ "math.OC", "cs.LG" ], "abstract": "In this paper, we demonstrate the power of a widely used stochastic estimator based on moving average (SEMA) on a range of stochastic non-convex optimization problems, which only requires {\\bf a general unbiased stochastic oracle}. We analyze various stochastic methods (existing or newly proposed) based on the {\\bf variance recursion property} of SEMA for three families of non-convex optimization, namely standard stochastic non-convex minimization, stochastic non-convex strongly-concave min-max optimization, and stochastic bilevel optimization. Our contributions include: (i) for standard stochastic non-convex minimization, we present a simple and intuitive proof of convergence for a family Adam-style methods (including Adam) with an increasing or large \"momentum\" parameter for the first-order moment, which gives an alternative yet more natural way to guarantee Adam converge; (ii) for stochastic non-convex strongly-concave min-max optimization, we present a single-loop stochastic gradient descent ascent method based on the moving average estimators and establish its oracle complexity of $O(1/\\epsilon^4)$ without using a large mini-batch size, addressing a gap in the literature; (iii) for stochastic bilevel optimization, we present a single-loop stochastic method based on the moving average estimators and establish its oracle complexity of $\\widetilde O(1/\\epsilon^4)$ without computing the inverse or SVD of the Hessian matrix, improving state-of-the-art results. For all these problems, we also establish a variance diminishing result for the used stochastic gradient estimators.", "revisions": [ { "version": "v1", "updated": "2021-04-30T08:50:24.000Z" } ], "analyses": { "keywords": [ "non-convex optimization", "stochastic moving-average estimators", "stochastic non-convex strongly-concave min-max optimization", "stochastic gradient descent ascent", "gradient descent ascent method" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }