{ "id": "2204.09086", "version": "v1", "published": "2022-04-19T18:43:00.000Z", "updated": "2022-04-19T18:43:00.000Z", "title": "Choosing the number of factors in factor analysis with incomplete data via a hierarchical Bayesian information criterion", "authors": [ "Jianhua Zhao", "Changchun Shang", "Shulan Li", "Ling Xin", "Philip L. H. Yu" ], "comment": "16 pages, 4 figures", "categories": [ "stat.ML", "cs.LG" ], "abstract": "The Bayesian information criterion (BIC), defined as the observed data log likelihood minus a penalty term based on the sample size $N$, is a popular model selection criterion for factor analysis with complete data. This definition has also been suggested for incomplete data. However, the penalty term based on the `complete' sample size $N$ is the same whether the data are complete or incomplete. For incomplete data, there are often only $N_i