
arXiv:2307.02071 [cs.LG]

A Comparison of Machine Learning Methods for Data with High-Cardinality Categorical Variables

Published 2023-07-05, Version 1

High-cardinality categorical variables are variables for which the number of different levels is large relative to the sample size of a data set, or in other words, there are few data points per level. Machine learning methods can have difficulties with high-cardinality variables. In this article, we empirically compare several versions of two of the most successful machine learning methods, tree-boosting and deep neural networks, and linear mixed effects models using multiple tabular data sets with high-cardinality categorical variables. We find that, first, machine learning models with random effects have higher prediction accuracy than their classical counterparts without random effects, and, second, tree-boosting with random effects outperforms deep neural networks with random effects.
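The two ideas in the abstract, "few data points per level" and random effects, can be illustrated with a minimal sketch. This is not the authors' code or any of the models they benchmark. The function names `cardinality_summary` and `shrunken_level_effects` are hypothetical, the variance components `var_between` and `var_within` are assumed to be known rather than estimated, and the global mean stands in for a fitted fixed intercept. Under those assumptions, a random-intercept estimate for a level is that level's mean deviation shrunk toward zero, with stronger shrinkage when the level has few observations.

```python
from collections import Counter


def cardinality_summary(levels):
    """Report how many distinct levels a categorical column has relative to its size."""
    if not levels:
        raise ValueError("levels must be non-empty")
    counts = Counter(levels)
    return {
        "n_samples": len(levels),
        "n_levels": len(counts),
        # A small value here means few data points per level (high cardinality).
        "samples_per_level": len(levels) / len(counts),
    }


def shrunken_level_effects(levels, y, var_between, var_within):
    """Random-intercept style estimate per level (illustrative simplification).

    Assumes known variance components and uses the global mean as the
    intercept. Each level's deviation from the global mean is multiplied by
    n * var_between / (n * var_between + var_within), so levels with few
    observations are pulled more strongly toward zero.
    """
    global_mean = sum(y) / len(y)
    counts = Counter(levels)
    sums = {}
    for lvl, val in zip(levels, y):
        sums[lvl] = sums.get(lvl, 0.0) + val
    effects = {}
    for lvl, n in counts.items():
        level_mean = sums[lvl] / n
        weight = n * var_between / (n * var_between + var_within)
        effects[lvl] = weight * (level_mean - global_mean)
    return effects


# Toy data: level "a" has four observations, level "b" has only one.
levels = ["a", "a", "a", "a", "b"]
y = [1.0, 1.0, 1.0, 1.0, 6.0]
summary = cardinality_summary(levels)
effects = shrunken_level_effects(levels, y, var_between=1.0, var_within=1.0)
```

In this toy data, the single observation for level "b" receives a shrinkage weight of 0.5, while the four observations for level "a" receive 0.8, which is the sense in which a random effect regularizes sparsely observed levels.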

Categories: cs.LG, cs.AI, stat.ML

Keywords: machine learning methods, high-cardinality categorical variables, random effects, deep neural networks

Related articles:

arXiv:2009.09756 [cs.LG] (Published 2020-09-21)

Demand Prediction Using Machine Learning Methods and Stacked Generalization

Resul Tugay, Sule Gunduz Oguducu

arXiv:2209.04643 [cs.LG] (Published 2022-09-10)

Examining stability of machine learning methods for predicting dementia at early phases of the disease

Sinan Faouri, Mahmood AlBashayreh, Mohammad Azzeh

arXiv:1407.7417 [cs.LG] (Published 2014-07-28)

'Almost Sure' Chaotic Properties of Machine Learning Methods

Nabarun Mondal, Partha P. Ghosh