arXiv:2209.12309 [cs.LG]

Feature Encodings for Gradient Boosting with Automunge

Nicholas J. Teague

Published 2022-09-25 (Version 1)

Selecting a default feature encoding strategy for gradient boosted learning may be informed by metrics of training duration and achieved predictive performance associated with the feature representations. The Automunge library for dataframe preprocessing defaults to binarization for categoric features and z-score normalization for numeric features. The study presented here sought to validate those defaults by benchmarking encoding variations with tuned gradient boosted learning across a series of diverse data sets. We found that, on average, our chosen defaults were top performers from both a tuning duration and a model performance standpoint. Another key finding was that one hot encoding did not perform in a manner consistent with suitability to serve as a categoric default in comparison to categoric binarization. We present these and further benchmarks here.
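For readers unfamiliar with the encodings being compared, the following is a minimal sketch of categoric binarization and z-score normalization using pandas and numpy. The function names and example dataframe are hypothetical illustrations of the encoding types, not the Automunge interface or its implementation.

```python
import numpy as np
import pandas as pd

def binarize_categoric(col: pd.Series) -> pd.DataFrame:
    # Categoric binarization: each of n distinct categories receives a
    # distinct ceil(log2(n))-bit code, returned as one column per bit.
    # A sketch of the encoding type (assumes no missing values), not
    # Automunge's implementation.
    categories = sorted(col.unique())
    n_bits = max(1, int(np.ceil(np.log2(len(categories)))))
    mapping = {cat: i for i, cat in enumerate(categories)}
    codes = col.map(mapping).to_numpy()
    bits = (codes[:, None] >> np.arange(n_bits)) & 1
    return pd.DataFrame(bits, index=col.index,
                        columns=[f"{col.name}_bit{i}" for i in range(n_bits)])

def zscore_normalize(col: pd.Series) -> pd.Series:
    # Z-score normalization: subtract the mean, divide by the standard deviation.
    return (col - col.mean()) / col.std()

df = pd.DataFrame({"color": ["red", "green", "blue", "red"],
                   "height": [1.2, 3.4, 2.2, 0.9]})
encoded = pd.concat([binarize_categoric(df["color"]),
                     zscore_normalize(df["height"]).rename("height_z")],
                    axis=1)
print(encoded)
```

One plausible intuition for the tuning duration result noted above is column count: binarization produces on the order of log2(n) columns for n categories where one hot encoding produces n, which can translate into fewer candidate splits and faster tuning at higher cardinalities.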

Related articles:
arXiv:2204.06895 [cs.LG] (Published 2022-04-14)
Gradient boosting for convex cone predict and optimize problems
arXiv:1909.12098 [cs.LG] (Published 2019-09-26)
Sequential Training of Neural Networks with Gradient Boosting
arXiv:1802.05640 [cs.LG] (Published 2018-02-15)
Gradient Boosting With Piece-Wise Linear Regression Trees