arXiv Analytics

arXiv:2104.03158 [stat.ML]

Prediction with Missing Data

Dimitris Bertsimas, Arthur Delarue, Jean Pauphilet

Published 2021-04-07, Version 1

Missing information is inevitable in real-world data sets. While imputation is well-suited and theoretically sound for statistical inference, its relevance and practical implementation for out-of-sample prediction remain unsettled. We provide a theoretical analysis of widely used data imputation methods and highlight their key deficiencies in making accurate predictions. As an alternative, we propose adaptive linear regression, a new class of models that can be trained and evaluated directly on partially observed data, adapting to the set of available features. In particular, we show that certain adaptive regression models are equivalent to impute-then-regress methods in which the imputation and regression models are learned simultaneously instead of sequentially. We validate our theoretical findings and the adaptive regression approach with numerical results on real-world data sets.
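
To make the equivalence claim concrete, here is a minimal sketch of one simple case. It is not the authors' implementation, and the construction is an assumption chosen for illustration: a linear model fit on zero-filled features plus missingness indicators. The coefficient on each indicator can be read as w_j * mu_j, so the fit implicitly learns a constant imputation value mu_j for each feature jointly with the regression weights w_j. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def design(X):
    """Zero-fill missing entries and build 0/1 missingness indicators."""
    M = np.isnan(X).astype(float)
    X0 = np.where(np.isnan(X), 0.0, X)
    return X0, M

def fit_joint(X, y):
    """Least squares on [1, zero-filled X, indicators]; returns (b, w, v)."""
    X0, M = design(X)
    A = np.hstack([np.ones((len(y), 1)), X0, M])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    d = X.shape[1]
    return coef[0], coef[1:1 + d], coef[1 + d:]

def predict_joint(X, b, w, v):
    """Predict directly on partially observed rows."""
    X0, M = design(X)
    return b + X0 @ w + M @ v

def fit_mean_impute_then_regress(X, y):
    """Sequential baseline: impute column means first, then regress."""
    mu = np.nanmean(X, axis=0)
    Xi = np.where(np.isnan(X), mu, X)
    A = np.hstack([np.ones((len(y), 1)), Xi])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return mu, coef[0], coef[1:]

# Synthetic data (an assumption for the demo): entries missing completely at random.
rng = np.random.default_rng(0)
n, d = 200, 3
X_full = rng.normal(size=(n, d))
y = X_full @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)
X = X_full.copy()
X[rng.random((n, d)) < 0.3] = np.nan

b, w, v = fit_joint(X, y)

# Imputation constants implied by the joint fit (requires w_j != 0).
mu_learned = v / w
X_imputed = np.where(np.isnan(X), mu_learned, X)

# Same predictions, computed two ways.
pred_direct = predict_joint(X, b, w, v)
pred_via_imputation = b + X_imputed @ w
```

In this sketch the two prediction vectors agree up to floating-point error. Mean imputation followed by regression is a special case of the joint model (with v_j = w_j times the column mean), so the joint fit can never have a higher training error than that baseline. Out-of-sample behavior is a separate question and is not demonstrated here.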

Related articles:
arXiv:2205.03820 [stat.ML] (Published 2022-05-08)
Some performance considerations when using multi-armed bandit algorithms in the presence of missing data
arXiv:2310.12806 [stat.ML] (Published 2023-10-19)
DCSI -- An improved measure of cluster separability based on separation and connectedness
arXiv:2110.12595 [stat.ML] (Published 2021-10-25, updated 2022-02-18)
Fast Rank-1 NMF for Missing Data with KL Divergence