

arXiv:1707.00727 [stat.ML]

Regression Phalanxes

Hongyang Zhang, William J. Welch, Ruben H. Zamar

Published 2017-07-03, Version 1

Tomal et al. (2015) introduced the notion of "phalanxes" in the context of rare-class detection in two-class classification problems. A phalanx is a subset of features that work well for classification tasks. In this paper, we propose a different class of phalanxes for application in regression settings. We define a "Regression Phalanx" as a subset of features that work well together for prediction. We propose a novel algorithm which automatically chooses Regression Phalanxes from high-dimensional data sets using hierarchical clustering and builds a prediction model for each phalanx for further ensembling. Through extensive simulation studies and several real-life applications in various areas (including drug discovery, chemical analysis of spectra data, microarray analysis and climate projections) we show that an ensemble of Regression Phalanxes improves prediction accuracy when combined with effective prediction methods like Lasso or Random Forests.
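
The general pattern the abstract describes (group features by hierarchical clustering, fit one model per group, combine the predictions) can be illustrated with a small sketch. This is not the authors' algorithm: it assumes correlation-based average-linkage clustering as the grouping criterion, ridge regression as the base learner, and plain averaging as the combination rule, and every function name and parameter below is hypothetical.

```python
import numpy as np


def cluster_features(X, n_groups):
    """Average-linkage agglomerative clustering of columns.

    Assumption: distance between features is 1 - |correlation|.
    The paper's own merging criterion is not reproduced here.
    """
    dist = 1.0 - np.abs(np.corrcoef(X, rowvar=False))
    groups = [[j] for j in range(X.shape[1])]
    while len(groups) > n_groups:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                # Mean pairwise distance between the two candidate groups.
                d = dist[np.ix_(groups[a], groups[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        groups[a] = groups[a] + groups[b]  # merge the closest pair
        del groups[b]
    return groups


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    coef = np.linalg.solve(
        Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean)
    )
    return coef, y_mean - x_mean @ coef


def fit_ensemble(X, y, n_groups=3, alpha=1.0):
    """Fit one ridge model per feature group."""
    groups = cluster_features(X, n_groups)
    models = [fit_ridge(X[:, g], y, alpha) for g in groups]
    return groups, models


def predict_ensemble(X, groups, models):
    """Average the per-group predictions (assumed combination rule)."""
    preds = [X[:, g] @ coef + b for g, (coef, b) in zip(groups, models)]
    return np.mean(preds, axis=0)


# Synthetic data: 3 latent signals, each observed through 4 noisy features.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
X = np.repeat(latent, 4, axis=1) + 0.1 * rng.normal(size=(200, 12))
y = latent.sum(axis=1) + 0.1 * rng.normal(size=200)

groups, models = fit_ensemble(X, y, n_groups=3)
y_hat = predict_ensemble(X, groups, models)
```

On this toy data the clustering step recovers the three blocks of correlated features. Swapping `fit_ridge` for Lasso or a Random Forest, the base learners named in the abstract, would only change the per-group fitting step.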

Related articles:
arXiv:2411.19908 [stat.ML] (Published 2024-11-29)
Another look at inference after prediction
arXiv:1702.03244 [stat.ML] (Published 2017-02-10)
$L_2$Boosting for Economic Applications
arXiv:2505.00310 [stat.ML] (Published 2025-05-01, updated 2025-06-18)
Statistical Learning for Heterogeneous Treatment Effects: Pretraining, Prognosis, and Prediction