arXiv:1205.1053 Abstract | arXiv Analytics

arXiv:1205.1053 [cs.LG]

Variable Selection for Latent Dirichlet Allocation

Published 2012-05-04, Version 1

In latent Dirichlet allocation (LDA), topics are multinomial distributions over the entire vocabulary. However, the vocabulary usually contains many words that are not relevant in forming the topics. We adopt a variable selection method widely used in statistical modeling as a dimension reduction tool and combine it with LDA. In this variable selection model for LDA (vsLDA), topics are multinomial distributions over a subset of the vocabulary, and by excluding words that are not informative for finding the latent topic structure of the corpus, vsLDA finds topics that are more robust and discriminative. We compare three models, vsLDA, LDA with symmetric priors, and LDA with asymmetric priors, on held-out likelihood, MCMC chain consistency, and document classification. vsLDA outperforms symmetric LDA on likelihood and classification, outperforms asymmetric LDA on consistency and classification, and performs about the same in the remaining comparisons.
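
As a rough illustration of what "topics are multinomial distributions over a subset of the vocabulary" can mean, the sketch below draws topics that place probability mass only on a selected set of words. The abstract does not specify how words are selected or which priors are used, so the per-word Bernoulli indicator, the parameters `include_prob` and `beta`, and the function name are all assumptions made for this example, not the paper's actual model or inference procedure.

```python
import numpy as np


def sample_topics_over_subset(vocab_size, num_topics, include_prob=0.5, beta=0.5, seed=0):
    """Draw topics that put mass only on a selected subset of the vocabulary.

    Hypothetical sketch: the Bernoulli selection step and the symmetric
    Dirichlet prior are assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # Assumed selection step: one Bernoulli indicator per vocabulary word.
    selected = rng.random(vocab_size) < include_prob
    if not selected.any():
        # Guarantee at least one informative word so topics are well defined.
        selected[rng.integers(vocab_size)] = True

    # Each topic is a Dirichlet draw over the selected words only;
    # excluded words keep exactly zero probability in every topic.
    topics = np.zeros((num_topics, vocab_size))
    topics[:, selected] = rng.dirichlet(np.full(selected.sum(), beta), size=num_topics)
    return selected, topics


if __name__ == "__main__":
    selected, topics = sample_topics_over_subset(vocab_size=20, num_topics=3)
    print("informative words:", int(selected.sum()), "of", selected.size)
    print("mass on excluded words:", topics[:, ~selected].sum())
```

In plain LDA, by contrast, every topic would be a Dirichlet draw over all `vocab_size` words, so uninformative words would receive some probability in every topic.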

Categories: cs.LG, stat.ML

Keywords: latent dirichlet allocation, variable selection, multinomial distributions, vslda finds topics, classification

Related articles:

arXiv:1402.2300 [cs.LG] (Published 2014-02-10)

Feature and Variable Selection in Classification

Aaron Karper

arXiv:1708.08591 [cs.LG] (Published 2017-08-29)

EC3: Combining Clustering and Classification for Ensemble Learning

Tanmoy Chakraborty

arXiv:1703.08816 [cs.LG] (Published 2017-03-26)

Uncertainty Quantification in the Classification of High Dimensional Data

Andrea L. Bertozzi, Xiyang Luo, Andrew M. Stuart, Konstantinos C. Zygalakis
