
arXiv:1709.01716 [stat.ML]

Optimal Sub-sampling with Influence Functions

Published 2017-09-06, Version 1

Sub-sampling is a common and often effective method for dealing with the computational challenges of large datasets. However, for most statistical models, there is no well-motivated approach for drawing a non-uniform subsample. We show that the concept of an asymptotically linear estimator and the associated influence function lead to optimal sampling procedures for a wide class of popular models. Furthermore, for linear regression models, which have well-studied procedures for non-uniform sub-sampling, we show that our optimal influence-function-based method outperforms previous approaches. We demonstrate the improved performance of our method empirically on real datasets.
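To make the idea concrete, here is a minimal sketch of influence-based sub-sampling for ordinary least squares. The abstract does not spell out the procedure, so everything below is an assumption for illustration: the function name `influence_subsample_ols`, the uniform pilot sample, sampling probabilities proportional to the norm of each point's estimated influence, the 10% uniform mixing for numerical stability, and the inverse-probability weighting of the final fit. It should not be read as the authors' exact method.

```python
import numpy as np

def influence_subsample_ols(X, y, m, rng, pilot_size=200, uniform_mix=0.1):
    """Hypothetical sketch: draw m points with probability tied to influence norms."""
    n, d = X.shape

    # Pilot estimate of the coefficients from a small uniform subsample (assumption).
    pilot_idx = rng.choice(n, size=min(pilot_size, n), replace=False)
    beta_pilot, *_ = np.linalg.lstsq(X[pilot_idx], y[pilot_idx], rcond=None)

    # For OLS, the influence of point i is Sigma^{-1} x_i r_i,
    # with Sigma = E[x x^T] and r_i the residual at the pilot estimate.
    resid = y - X @ beta_pilot
    sigma = X.T @ X / n
    influence = np.linalg.solve(sigma, X.T).T * resid[:, None]
    norms = np.linalg.norm(influence, axis=1)

    # Probabilities proportional to the influence norm, mixed with a uniform
    # component so that no point has a vanishing probability (assumption).
    probs = (1.0 - uniform_mix) * norms / norms.sum() + uniform_mix / n

    # Sample with replacement, then fit weighted least squares with
    # inverse-probability weights to correct for the non-uniform sampling.
    idx = rng.choice(n, size=m, replace=True, p=probs)
    sqrt_w = np.sqrt(1.0 / probs[idx])
    beta_hat, *_ = np.linalg.lstsq(X[idx] * sqrt_w[:, None], y[idx] * sqrt_w, rcond=None)
    return beta_hat, probs
```

A call such as `influence_subsample_ols(X, y, m=500, rng=np.random.default_rng(0))` returns the sub-sampled coefficient estimate together with the sampling probabilities. The pilot step costs one small regression, and the influence norms require a single pass over the full data.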

Categories: stat.ML, cs.LG

Keywords: optimal sub-sampling, optimal influence function, linear regression models, real datasets, large datasets

Related articles:

arXiv:2012.01668 [stat.ML] (Published 2020-12-03)

Online Forgetting Process for Linear Regression Models

Yuantong Li, Chi-hua Wang, Guang Cheng

arXiv:1901.00630 [stat.ML] (Published 2019-01-03)

Projecting "better than randomly": How to reduce the dimensionality of very large datasets in a way that outperforms random projections

Michael Wojnowicz, Di Zhang, Glenn Chisholm, Xuan Zhao, Matt Wolff

arXiv:1901.09881 [stat.ML] (Published 2019-01-28)

Scalable Metropolis-Hastings for Exact Bayesian Inference with Large Datasets

Robert Cornish, Paul Vanetti, Alexandre Bouchard-Côté, George Deligiannidis, Arnaud Doucet