
arXiv:2004.14637 [stat.ML]

Generalization Error for Linear Regression under Distributed Learning

Martin Hellkvist, Ayça Özçelikkale, Anders Ahlén

Published 2020-04-30, Version 1

Distributed learning facilitates the scaling-up of data processing by distributing the computational burden over several nodes. Despite the vast interest in distributed learning, the generalization performance of such approaches is not well understood. We address this gap by focusing on a linear regression setting. We consider the setting in which the unknowns are distributed over a network of nodes. We present an analytical characterization of how the generalization error depends on the partitioning of the unknowns over the nodes. In particular, for the overparameterized case, our results show that while the training error remains in the same range as that of the centralized solution, the generalization error of the distributed solution increases dramatically compared to that of the centralized solution when the number of unknowns estimated at any node is close to the number of observations. We further provide numerical examples to verify our analytical expressions.
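To make the setting concrete, here is a small numerical sketch in Python with NumPy. It assumes details the abstract does not spell out: observations y = A x + w with i.i.d. Gaussian features and noise, the p unknowns split into contiguous column blocks (one block per node), and generalization error measured as mean squared prediction error on fresh test data. The centralized baseline is the minimum-norm least-squares solution. The distributed solver is a hypothetical stand-in (cyclic block-coordinate least squares, where each node refits only its own block to the current residual with a minimum-norm update); it is not necessarily the scheme analyzed in the paper, and all function names are illustrative.

```python
import numpy as np


def min_norm_ls(A, b):
    """Minimum-norm least-squares solution of A x ~ b via the pseudo-inverse."""
    return np.linalg.pinv(A) @ b


def block_coordinate_ls(A, y, partition, sweeps=100):
    """Hypothetical distributed solver (assumption, not the paper's scheme).

    Node k owns the columns listed in partition[k]. In each sweep the nodes
    take turns replacing their block with the min-norm LS fit to the residual
    left by all other nodes.
    """
    x = np.zeros(A.shape[1])
    for _ in range(sweeps):
        for cols in partition:
            # Residual of y after removing every node's contribution except node k's
            r = y - A @ x + A[:, cols] @ x[cols]
            x[cols] = min_norm_ls(A[:, cols], r)
    return x


def experiment(n=50, p=200, num_nodes=4, noise=0.1, n_test=2000, seed=0):
    """Return {solver: (training MSE, test MSE)} for one random problem instance."""
    rng = np.random.default_rng(seed)
    x_true = rng.standard_normal(p) / np.sqrt(p)  # assumed prior on the unknowns
    A = rng.standard_normal((n, p))
    y = A @ x_true + noise * rng.standard_normal(n)
    A_test = rng.standard_normal((n_test, p))
    y_test = A_test @ x_true + noise * rng.standard_normal(n_test)

    # Contiguous column blocks, one per node
    partition = np.array_split(np.arange(p), num_nodes)

    results = {}
    for name, x_hat in [
        ("centralized", min_norm_ls(A, y)),
        ("distributed", block_coordinate_ls(A, y, partition)),
    ]:
        train_mse = float(np.mean((y - A @ x_hat) ** 2))
        test_mse = float(np.mean((y_test - A_test @ x_hat) ** 2))
        results[name] = (train_mse, test_mse)
    return results


if __name__ == "__main__":
    for name, (train, test) in experiment().items():
        print(f"{name:12s} train MSE = {train:.2e}   test MSE = {test:.3f}")
```

With the defaults (n = 50 observations, p = 200 unknowns, 4 nodes), each node holds exactly 50 unknowns, so every local block is square. Both solutions fit the training data essentially exactly, but the distributed one typically shows a far larger test error: inverting a nearly square random block strongly amplifies the part of y that the block cannot explain. Varying `num_nodes` moves the local block size away from n (raise `sweeps` if narrow blocks converge slowly) and gives a rough feel for how the gap depends on the partition.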

Categories: stat.ML, cs.LG, eess.SP

Keywords: generalization error, linear regression, distributed learning, centralized solution
