{ "id": "1902.00340", "version": "v1", "published": "2019-02-01T14:11:20.000Z", "updated": "2019-02-01T14:11:20.000Z", "title": "Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication", "authors": [ "Anastasia Koloskova", "Sebastian U. Stich", "Martin Jaggi" ], "categories": [ "cs.LG", "cs.DC", "cs.DS", "math.OC", "stat.ML" ], "abstract": "We consider decentralized stochastic optimization with the objective function (e.g. data samples for machine learning task) being distributed over $n$ machines that can only communicate to their neighbors on a fixed communication graph. To reduce the communication bottleneck, the nodes compress (e.g. quantize or sparsify) their model updates. We cover both unbiased and biased compression operators with quality denoted by $\\omega \\leq 1$ ($\\omega=1$ meaning no compression). We (i) propose a novel gossip-based stochastic gradient descent algorithm, CHOCO-SGD, that converges at rate $\\mathcal{O}\\left(1/(nT) + 1/(T \\delta^2 \\omega)^2\\right)$ for strongly convex objectives, where $T$ denotes the number of iterations and $\\delta$ the eigengap of the connectivity matrix. Despite compression quality and network connectivity affecting the higher order terms, the first term in the rate, $\\mathcal{O}(1/(nT))$, is the same as for the centralized baseline with exact communication. We (ii) present a novel gossip algorithm, CHOCO-GOSSIP, for the average consensus problem that converges in time $\\mathcal{O}(1/(\\delta^2\\omega) \\log (1/\\epsilon))$ for accuracy $\\epsilon > 0$. This is (up to our knowledge) the first gossip algorithm that supports arbitrary compressed messages for $\\omega > 0$ and still exhibits linear convergence. We (iii) show in experiments that both of our algorithms do outperform the respective state-of-the-art baselines and CHOCO-SGD can reduce communication by at least two orders of magnitudes.", "revisions": [ { "version": "v1", "updated": "2019-02-01T14:11:20.000Z" } ], "analyses": { "subjects": [ "68W10", "68W15", "68W40", "90C06", "90C25", "90C35", "G.1.6", "F.2.1", "E.4" ], "keywords": [ "decentralized stochastic optimization", "gossip algorithm", "compressed communication", "novel gossip-based stochastic gradient descent", "gossip-based stochastic gradient descent algorithm" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }