arXiv:2402.05680 [cs.LG]

Interpretable classifiers for tabular data via discretization and feature selection

Reijo Jaakkola, Tomi Janhunen, Antti Kuusisto, Masood Feyzbakhsh Rankooh, Miikka Vilander

Published 2024-02-08, Version 1

We introduce a method for computing immediately human-interpretable yet accurate classifiers from tabular data. The classifiers obtained are short DNF-formulas, computed by first discretizing the original data to Boolean form and then using feature selection coupled with a very fast algorithm for producing the best possible Boolean classifier for the setting. We demonstrate the approach via 14 experiments, obtaining accuracies mostly similar to those obtained via random forests, XGBoost, and existing results for the same datasets in the literature. In several cases, our approach in fact outperforms the reference results in accuracy, even though the main objective of our study is the immediate interpretability of our classifiers. We also prove a new result on the probability that the classifier we obtain from real-life data corresponds to the ideally best classifier with respect to the background distribution the data comes from.
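The pipeline described in the abstract (discretize to Boolean form, select a few features, read off a short DNF-formula) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes median thresholds for discretization, takes the selected feature indices as given rather than searching for them, and picks the majority label for each Boolean pattern, which is the most accurate Boolean classifier on the training data for a fixed feature set. All function names (`binarize`, `best_dnf`, `predict`, `show`) are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import median


def binarize(rows):
    """Discretize each numeric column to Boolean: value > column median.

    Median cut points are an assumption made for this sketch only.
    """
    cols = list(zip(*rows))
    cuts = [median(c) for c in cols]
    return [tuple(v > t for v, t in zip(r, cuts)) for r in rows], cuts


def best_dnf(bool_rows, labels, features):
    """Return the DNF terms of the majority-vote classifier over `features`.

    Each distinct Boolean pattern on the selected features votes with the
    labels of the rows that match it. Patterns with a strict positive
    majority become terms of the DNF; ties default to the negative class.
    """
    votes = defaultdict(Counter)
    for row, y in zip(bool_rows, labels):
        votes[tuple(row[i] for i in features)][y] += 1
    return sorted(p for p, c in votes.items() if c[True] > c[False])


def predict(terms, bool_row, features):
    """A row is classified positive iff its pattern is one of the terms."""
    return tuple(bool_row[i] for i in features) in terms


def show(terms, features, names):
    """Render the terms as a human-readable DNF-formula."""
    def literal(i, v):
        return names[i] if v else "NOT " + names[i]
    return " OR ".join(
        "(" + " AND ".join(literal(i, v) for i, v in zip(features, p)) + ")"
        for p in terms
    )


# Toy data (made up for illustration): two numeric columns, Boolean labels.
rows = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0)]
labels = [False, False, True, True]

bool_rows, cuts = binarize(rows)
features = [0]  # assumed to come from a separate feature-selection step
terms = best_dnf(bool_rows, labels, features)
print(show(terms, features, ["x0", "x1"]))
```

With few selected features the number of possible patterns stays small, so the resulting formula remains short enough to read directly.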

Categories: cs.LG, cs.AI, cs.LO

Subjects: I.2.6, F.4.1, I.2.4, E.2

Keywords: tabular data, feature selection, interpretable classifiers, discretization

Related articles:

arXiv:2001.09654 [cs.LG] (Published 2020-01-27)

Feature selection in machine learning: Rényi min-entropy vs Shannon entropy

Catuscia Palamidessi, Marco Romanelli

arXiv:1905.02845 [cs.LG] (Published 2019-05-07)

Feature Selection and Feature Extraction in Pattern Analysis: A Literature Review

Benyamin Ghojogh, Maria N. Samad, Sayema Asif Mashhadi, Tania Kapoor, Wahab Ali, Fakhri Karray, Mark Crowley

arXiv:2101.05950 [cs.LG] (Published 2021-01-15)

Robusta: Robust AutoML for Feature Selection via Reinforcement Learning