arXiv:1810.06825 [cs.LG]

Fast Randomized PCA for Sparse Data

Xu Feng, Yuyang Xie, Mingye Song, Wenjian Yu, Jie Tang

Published 2018-10-16 (Version 1)

Principal component analysis (PCA) is widely used for dimension reduction and embedding of real data in fields such as social network analysis, information retrieval, and natural language processing. In this work we propose a fast randomized PCA algorithm for processing large sparse data. The algorithm has accuracy similar to the basic randomized SVD (rPCA) algorithm (Halko et al., 2011), but is heavily optimized for sparse data. It also offers the flexibility to trade off runtime against accuracy in practical usage. Experiments on real data show that the proposed algorithm is up to 9.1X faster than the basic rPCA algorithm without accuracy loss, and up to 20X faster than the svds function in Matlab with little error. The algorithm computes the first 100 principal components of a large information retrieval dataset with 12,869,521 persons and 323,899 keywords in less than 400 seconds on a 24-core machine, while all conventional methods fail because they run out of memory.
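For reference, the basic randomized SVD baseline (Halko et al., 2011) against which the proposed method is compared can be sketched as below. This is only an illustrative Python/SciPy sketch of that baseline, not the authors' optimized sparse-data algorithm; the function and parameter names (randomized_svd, oversample, power_iters) are hypothetical.

import numpy as np
import scipy.sparse as sp

def randomized_svd(A, k, oversample=10, power_iters=2, seed=0):
    # Approximate rank-k SVD of a (possibly sparse) m x n matrix A.
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # Gaussian test matrix; A is touched only through sparse-dense products.
    Omega = rng.standard_normal((n, k + oversample))
    Y = A @ Omega                      # m x (k + oversample), dense
    # A few power iterations sharpen the capture of the dominant subspace.
    for _ in range(power_iters):
        Y, _ = np.linalg.qr(Y)         # re-orthonormalize for stability
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)             # orthonormal basis approximating range(A)
    B = (A.T @ Q).T                    # small (k + oversample) x n dense matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k, :]

# Example: top-100 components of a random sparse matrix (stand-in for real data).
# For PCA proper, column means would be handled implicitly to keep A sparse;
# that centering step is omitted here for brevity.
X = sp.random(100000, 2000, density=1e-3, format="csr", random_state=0)
U, s, Vt = randomized_svd(X, k=100)

The property such methods exploit for sparse data is that A enters only through sparse-dense matrix products, so the cost is dominated by a few passes over the nonzeros plus small dense factorizations.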
