arXiv:2401.15610 [cs.LG]

Prevalidated ridge regression is a highly-efficient drop-in replacement for logistic regression for high-dimensional data

Angus Dempster, Geoffrey I. Webb, Daniel F. Schmidt

Published 2024-01-28, Version 1

Logistic regression is a ubiquitous method for probabilistic classification. However, the effectiveness of logistic regression depends upon careful and relatively computationally expensive tuning, especially for the regularisation hyperparameter, and especially in the context of high-dimensional data. We present a prevalidated ridge regression model that closely matches logistic regression in terms of classification error and log-loss, particularly for high-dimensional data, while being significantly more computationally efficient and having effectively no hyperparameters beyond regularisation. We scale the coefficients of the model so as to minimise log-loss for a set of prevalidated predictions derived from the estimated leave-one-out cross-validation error. This exploits quantities already computed in the course of fitting the ridge regression model in order to find the scaling parameter with nominal additional computational expense.
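The procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything below is an assumption: a binary problem with labels in {0, 1} regressed as -1/+1 targets, centred features with no intercept, the regularisation strength chosen by leave-one-out squared error over a fixed grid, and the scaling parameter found by a plain grid search. The function names `loo_ridge_predictions`, `log_loss`, and `fit_prevalidated_ridge` are hypothetical and are not taken from the paper.

```python
import numpy as np

def loo_ridge_predictions(U, s, t, lam):
    """Leave-one-out predictions for no-intercept ridge regression,
    computed from the thin SVD X = U diag(s) Vt without refitting."""
    d = s**2 / (s**2 + lam)              # per-component shrinkage factors
    fitted = U @ (d * (U.T @ t))         # in-sample fitted values H t
    h = (U**2) @ d                       # diagonal of the hat matrix H
    return t - (t - fitted) / (1.0 - h)  # standard LOO residual shortcut

def log_loss(y, logits):
    """Mean binary log-loss for labels y in {0, 1} and real-valued logits."""
    return np.mean(np.logaddexp(0.0, logits) - y * logits)

def fit_prevalidated_ridge(X, y, lambdas=None, scales=None):
    """Sketch only: returns (scaled coefficients, chosen lambda, chosen scale)."""
    # Both grids are illustrative assumptions, not values from the paper.
    lambdas = np.logspace(-3, 3, 13) if lambdas is None else np.asarray(lambdas)
    scales = np.logspace(-2, 2, 81) if scales is None else np.asarray(scales)

    t = 2.0 * y - 1.0                    # regress on -1/+1 targets
    U, s, Vt = np.linalg.svd(X, full_matrices=False)

    # One SVD serves every lambda: the LOO predictions are the
    # "prevalidated" predictions, obtained as a by-product of the fit.
    loo_by_lam = [loo_ridge_predictions(U, s, t, lam) for lam in lambdas]
    errors = [np.mean((t - loo) ** 2) for loo in loo_by_lam]
    k = int(np.argmin(errors))
    lam, loo = lambdas[k], loo_by_lam[k]

    # Pick the scale c that minimises log-loss of sigmoid(c * loo).
    losses = [log_loss(y, c * loo) for c in scales]
    c = scales[int(np.argmin(losses))]

    # Ridge coefficients (X^T X + lam I)^{-1} X^T t, then scaled by c.
    beta = Vt.T @ ((s / (s**2 + lam)) * (U.T @ t))
    return c * beta, lam, c
```

Under these assumptions, class probabilities for new rows `X_new` would be `1 / (1 + np.exp(-(X_new @ coef)))`. Because the scale is fitted on leave-one-out rather than in-sample predictions, it is not biased towards the overconfident in-sample fit, and the only extra work beyond the ridge fit is a one-dimensional search.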

Comments: 13 pages, 11 figures

Categories: cs.LG, stat.ML

Keywords: high-dimensional data, highly-efficient drop-in replacement, nominal additional computational expense, estimated leave-one-out cross-validation error, closely matches logistic regression
