arXiv:2209.12309 [cs.LG]

Feature Encodings for Gradient Boosting with Automunge

Nicholas J. Teague

Published 2022-09-25 (Version 1)

Selecting a default feature encoding strategy for gradient boosted learning may be informed by metrics of training duration and achieved predictive performance associated with the feature representations. The Automunge library for dataframe preprocessing defaults to binarization for categoric features and z-score normalization for numeric features. The study presented here sought to validate those defaults by benchmarking encoding variations with tuned gradient boosted learning across a series of diverse data sets. We found that, on average, our chosen defaults were top performers from both a tuning duration and a model performance standpoint. Another key finding was that one hot encoding did not perform in a manner consistent with suitability to serve as a categoric default in comparison to categoric binarization. We present these and further benchmarks here.
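For readers unfamiliar with the encodings being compared, the following is a minimal sketch of categoric binarization and z-score normalization using pandas and numpy. The function names and example dataframe are hypothetical illustrations of the encoding types, not the Automunge interface or its implementation.

```python
import numpy as np
import pandas as pd

def binarize_categoric(col: pd.Series) -> pd.DataFrame:
    # Categoric binarization: each of n distinct categories receives a
    # distinct ceil(log2(n))-bit code, returned as one column per bit.
    # A sketch of the encoding type (assumes no missing values), not
    # Automunge's implementation.
    categories = sorted(col.unique())
    n_bits = max(1, int(np.ceil(np.log2(len(categories)))))
    mapping = {cat: i for i, cat in enumerate(categories)}
    codes = col.map(mapping).to_numpy()
    bits = (codes[:, None] >> np.arange(n_bits)) & 1
    return pd.DataFrame(bits, index=col.index,
                        columns=[f"{col.name}_bit{i}" for i in range(n_bits)])

def zscore_normalize(col: pd.Series) -> pd.Series:
    # Z-score normalization: subtract the mean, divide by the standard deviation.
    return (col - col.mean()) / col.std()

df = pd.DataFrame({"color": ["red", "green", "blue", "red"],
                   "height": [1.2, 3.4, 2.2, 0.9]})
encoded = pd.concat([binarize_categoric(df["color"]),
                     zscore_normalize(df["height"]).rename("height_z")],
                    axis=1)
print(encoded)
```

One plausible intuition for the tuning duration result noted above is column count: binarization produces on the order of log2(n) columns for n categories where one hot encoding produces n, which can translate into fewer candidate splits and faster tuning at higher cardinalities.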

Related articles:
arXiv:2204.06895 [cs.LG] (Published 2022-04-14)
Gradient boosting for convex cone predict and optimize problems
arXiv:1909.12098 [cs.LG] (Published 2019-09-26)
Sequential Training of Neural Networks with Gradient Boosting
arXiv:1802.05640 [cs.LG] (Published 2018-02-15)
Gradient Boosting With Piece-Wise Linear Regression Trees