

arXiv:1511.03688 [stat.ML]

Online Principal Component Analysis in High Dimension: Which Algorithm to Choose?

Hervé Cardot, David Degras

Published 2015-11-11, Version 1

In the current context of data explosion, online techniques that do not require storing all data in memory are indispensable to routinely perform tasks like principal component analysis (PCA). Recursive algorithms that update the PCA with each new observation have been studied in various fields of research and found wide applications in industrial monitoring, computer vision, astronomy, and latent semantic indexing, among others. This work provides guidance for selecting an online PCA algorithm in practice. We present the main approaches to online PCA, namely, perturbation techniques, incremental methods, and stochastic optimization, and compare their statistical accuracy, computation time, and memory requirements using artificial and real data. Extensions to missing data and to functional data are discussed. All studied algorithms are available in the R package onlinePCA on CRAN.
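The sketch below illustrates the "stochastic optimization" family. It implements one classical recursive scheme, the generalized Hebbian algorithm (Sanger's rule). The scheme refines estimates of the top q eigenvectors with each new observation and never stores past data. This is an illustration only. It is not code from the onlinePCA package or the authors' implementation. The function names `online_pca` and `gha_update`, the random initialization, and the step-size schedule `lr0 / (1 + n / 2000)` are assumptions chosen for this example.

```python
import numpy as np


def gha_update(W, x, lr):
    """One step of the generalized Hebbian algorithm (Sanger's rule).

    W  : (q, d) array whose rows estimate the top-q eigenvectors.
    x  : (d,) centered observation.
    lr : step size for this observation.
    """
    y = W @ x  # projections of x onto the current eigenvector estimates
    # Hebbian term minus a lower-triangular deflation term, which keeps
    # row k orthogonal to rows 0..k-1 and drives each row to unit norm
    W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W


def online_pca(stream, d, q, lr0=0.02, seed=0):
    """Estimate the top-q principal directions from a stream of d-vectors.

    Memory use is O(q * d): only W and the running mean are kept.
    The step-size schedule below is a hypothetical choice for illustration.
    """
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((q, d))  # small random start (must be nonzero)
    mean = np.zeros(d)
    for n, x in enumerate(stream, start=1):
        x = np.asarray(x, dtype=float)
        mean += (x - mean) / n             # recursive update of the sample mean
        lr = lr0 / (1.0 + n / 2000.0)      # slowly decreasing step size (assumed)
        W = gha_update(W, x - mean, lr)
    return W, mean
```

The rows of `W` approximate the leading eigenvectors in decreasing order of eigenvalue, up to sign. As with other stochastic-gradient methods, accuracy depends on the step size. The value of `lr0` here was picked for roughly unit-scale data and would likely need tuning elsewhere.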
