arXiv:2108.13581 [cs.LG]

DoGR: Disaggregated Gaussian Regression for Reproducible Analysis of Heterogeneous Data

Nazanin Alipourfard, Keith Burghardt, Kristina Lerman

Published 2021-08-31, Version 1

Quantitative analysis of large-scale data is often complicated by the presence of diverse subgroups, which reduce the accuracy of inferences made on held-out data. To address the challenge of heterogeneous data analysis, we introduce DoGR, a method that discovers latent confounders by simultaneously partitioning the data into overlapping clusters (disaggregation) and modeling the behavior within them (regression). When applied to real-world data, our method discovers meaningful clusters and their characteristic behaviors, thus giving insight into group differences and their impact on the outcome of interest. By accounting for latent confounders, our framework facilitates exploratory analysis of noisy, heterogeneous data and can be used to learn predictive models that generalize better to new data. We provide the code to enable others to use DoGR within their data-analytic workflows.
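
To make the idea of joint disaggregation and regression concrete, here is a minimal sketch in Python. It is not the authors' released code, and it may differ from DoGR in its details. It is a generic expectation-maximization fit of a mixture in which each component has a Gaussian over the features and its own linear model for the outcome. The function names (`fit_soft_regressions`, `predict`), the initialization scheme, the regularization constants, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def _log_gauss(diff, cov):
    """Row-wise log density of a zero-mean Gaussian with covariance `cov`."""
    d = diff.shape[1]
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def _normalize(log_p):
    """Turn per-component log scores into soft memberships summing to 1 per row."""
    p = np.exp(log_p - log_p.max(axis=1, keepdims=True))
    return p / p.sum(axis=1, keepdims=True)

def fit_soft_regressions(X, y, k=2, n_iter=30):
    """EM for k overlapping clusters, each with its own linear regression."""
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])  # append an intercept column
    # Crude initialization (an assumption of this sketch): cut the data into
    # k equal-sized blocks along the first principal direction of X.
    Xc = X - X.mean(axis=0)
    direction = np.linalg.svd(Xc, full_matrices=False)[2][0]
    resp = np.zeros((n, k))
    for c, block in enumerate(np.array_split(np.argsort(Xc @ direction), k)):
        resp[block, c] = 1.0
    for _ in range(n_iter):
        # M-step: weighted Gaussian over X and weighted least squares for y.
        params = []
        for c in range(k):
            w = resp[:, c] + 1e-12
            pi = w.sum() / n
            mu = (w[:, None] * X).sum(axis=0) / w.sum()
            diff = X - mu
            cov = (w[:, None] * diff).T @ diff / w.sum() + 1e-6 * np.eye(d)
            A = Xb.T @ (w[:, None] * Xb) + 1e-8 * np.eye(d + 1)
            beta = np.linalg.solve(A, Xb.T @ (w * y))
            var = (w * (y - Xb @ beta) ** 2).sum() / w.sum() + 1e-6
            params.append((pi, mu, cov, beta, var))
        # E-step: soft memberships from the joint likelihood of (x, y).
        log_p = np.column_stack([
            np.log(pi) + _log_gauss(X - mu, cov)
            + _log_gauss((y - Xb @ beta)[:, None], np.array([[var]]))
            for pi, mu, cov, beta, var in params
        ])
        resp = _normalize(log_p)
    return params, resp

def predict(params, X):
    """Blend per-cluster predictions, weighting by membership inferred from X alone."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    gate = _normalize(np.column_stack([
        np.log(pi) + _log_gauss(X - mu, cov) for pi, mu, cov, _, _ in params
    ]))
    preds = np.column_stack([Xb @ beta for _, _, _, beta, _ in params])
    return (gate * preds).sum(axis=1)

# Synthetic example: two subgroups whose trends point in opposite directions.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3, 0.5, 200), rng.normal(3, 0.5, 200)])
y = np.where(x < 0, 2 * x, -2 * x) + rng.normal(0, 0.1, 400)
params, resp = fit_soft_regressions(x[:, None], y, k=2)
for pi, mu, cov, beta, var in params:
    print(f"weight={pi:.2f}  center={mu[0]:.2f}  slope={beta[0]:.2f}")
```

In this toy data a single regression line over all points would report a slope near zero, hiding the two opposing subgroup trends. The soft memberships in `resp` are what allow clusters to overlap rather than forcing each point into exactly one group.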

Related articles:
arXiv:2206.04723 [cs.LG] (Published 2022-06-09)
On the Unreasonable Effectiveness of Federated Averaging with Heterogeneous Data
arXiv:2303.02278 [cs.LG] (Published 2023-03-04, updated 2023-06-05)
Federated Virtual Learning on Heterogeneous Data with Local-global Distillation
arXiv:1906.01736 [cs.LG] (Published 2019-06-04)
Distributed Training with Heterogeneous Data: Bridging Median and Mean Based Algorithms