arXiv Analytics

Sign in

arXiv:2206.00529 [cs.LG]AbstractReferencesReviewsResources

Variance Reduction is an Antidote to Byzantines: Better Rates, Weaker Assumptions and Communication Compression as a Cherry on the Top

Eduard Gorbunov, Samuel Horváth, Peter Richtárik, Gauthier Gidel

Published 2022-06-01Version 1

Byzantine-robustness has been gaining a lot of attention due to the growth of the interest in collaborative and federated learning. However, many fruitful directions, such as the usage of variance reduction for achieving robustness and communication compression for reducing communication costs, remain weakly explored in the field. This work addresses this gap and proposes Byz-VR-MARINA - a new Byzantine-tolerant method with variance reduction and compression. A key message of our paper is that variance reduction is key to fighting Byzantine workers more effectively. At the same time, communication compression is a bonus that makes the process more communication efficient. We derive theoretical convergence guarantees for Byz-VR-MARINA outperforming previous state-of-the-art for general non-convex and Polyak-Lojasiewicz loss functions. Unlike the concurrent Byzantine-robust methods with variance reduction and/or compression, our complexity results are tight and do not rely on restrictive assumptions such as boundedness of the gradients or limited compression. Moreover, we provide the first analysis of a Byzantine-tolerant method supporting non-uniform sampling of stochastic gradients. Numerical experiments corroborate our theoretical findings.

Related articles: Most relevant | Search more
arXiv:1909.05350 [cs.LG] (Published 2019-09-11)
The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication
arXiv:2002.08958 [cs.LG] (Published 2020-02-20)
Uncertainty Principle for Communication Compression in Distributed and Federated Learning and the Search for an Optimal Compressor
arXiv:2206.03665 [cs.LG] (Published 2022-06-08)
Lower Bounds and Nearly Optimal Algorithms in Distributed Learning with Communication Compression