{ "id": "1902.00340", "version": "v1", "published": "2019-02-01T14:11:20.000Z", "updated": "2019-02-01T14:11:20.000Z", "title": "Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication", "authors": [ "Anastasia Koloskova", "Sebastian U. Stich", "Martin Jaggi" ], "categories": [ "cs.LG", "cs.DC", "cs.DS", "math.OC", "stat.ML" ], "abstract": "We consider decentralized stochastic optimization with the objective function (e.g. data samples for machine learning task) being distributed over $n$ machines that can only communicate to their neighbors on a fixed communication graph. To reduce the communication bottleneck, the nodes compress (e.g. quantize or sparsify) their model updates. We cover both unbiased and biased compression operators with quality denoted by $\\omega \\leq 1$ ($\\omega=1$ meaning no compression). We (i) propose a novel gossip-based stochastic gradient descent algorithm, CHOCO-SGD, that converges at rate $\\mathcal{O}\\left(1/(nT) + 1/(T \\delta^2 \\omega)^2\\right)$ for strongly convex objectives, where $T$ denotes the number of iterations and $\\delta$ the eigengap of the connectivity matrix. Despite compression quality and network connectivity affecting the higher order terms, the first term in the rate, $\\mathcal{O}(1/(nT))$, is the same as for the centralized baseline with exact communication. We (ii) present a novel gossip algorithm, CHOCO-GOSSIP, for the average consensus problem that converges in time $\\mathcal{O}(1/(\\delta^2\\omega) \\log (1/\\epsilon))$ for accuracy $\\epsilon > 0$. This is (up to our knowledge) the first gossip algorithm that supports arbitrary compressed messages for $\\omega > 0$ and still exhibits linear convergence. We (iii) show in experiments that both of our algorithms do outperform the respective state-of-the-art baselines and CHOCO-SGD can reduce communication by at least two orders of magnitudes.", "revisions": [ { "version": "v1", "updated": "2019-02-01T14:11:20.000Z" } ], "analyses": { "subjects": [ "68W10", "68W15", "68W40", "90C06", "90C25", "90C35", "G.1.6", "F.2.1", "E.4" ], "keywords": [ "decentralized stochastic optimization", "gossip algorithm", "compressed communication", "novel gossip-based stochastic gradient descent", "gossip-based stochastic gradient descent algorithm" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }