
arXiv:2501.17889 [stat.ML]

Knoop: Practical Enhancement of Knockoff with Over-Parameterization for Variable Selection

Xiaochen Zhang, Yunfeng Cai, Haoyi Xiong

Published 2025-01-28, Version 1

Variable selection plays a crucial role in enhancing modeling effectiveness across diverse fields, addressing the challenges posed by high-dimensional datasets of correlated variables. This work introduces a novel approach, Knockoff with over-parameterization (Knoop), to enhance Knockoff filters for variable selection. Specifically, Knoop first generates multiple knockoff variables for each original variable and integrates them with the original variables into an over-parameterized Ridgeless regression model. For each original variable, Knoop evaluates the coefficient distribution of its knockoffs and compares it with the original variable's coefficient to conduct an anomaly-based significance test, ensuring robust variable selection. Extensive experiments demonstrate superior performance compared to existing methods on both simulated and real-world datasets. In controlled simulations, Knoop achieves a notably higher Area under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve for identifying relevant variables against the ground truth, while also showing enhanced predictive accuracy across diverse regression and classification tasks. The analytical results further back up our observations.
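
The pipeline described in the abstract (multiple knockoffs per variable, an over-parameterized ridgeless fit, then a comparison of each original coefficient against its knockoffs' coefficients) can be illustrated with a rough sketch. This is not the authors' implementation. The function name `knoop_sketch` is hypothetical, independent column permutations stand in for a proper knockoff construction, and a simple z-score stands in for the paper's anomaly-based significance test. The ridgeless fit is taken here to be the minimum-norm least-squares solution, which `numpy.linalg.lstsq` returns for an underdetermined system.

```python
import numpy as np


def knoop_sketch(X, y, n_knockoffs=10, seed=0):
    """Score each column of X against the coefficients of its stand-in knockoffs.

    Assumptions (not taken from the paper): permutation-based knockoffs and a
    z-score as the anomaly measure.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Standardize the columns and center the response.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    y = y - y.mean()

    # Stand-in knockoffs: shuffle every column independently, n_knockoffs times.
    # Resulting shape is (n, p, n_knockoffs).
    knockoffs = np.stack(
        [rng.permuted(X, axis=0) for _ in range(n_knockoffs)], axis=2
    )

    # Over-parameterized design: p originals followed by p * n_knockoffs copies.
    design = np.hstack([X, knockoffs.reshape(n, p * n_knockoffs)])

    # Minimum-norm least-squares solution (no ridge penalty).
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)

    beta = coef[:p]                           # coefficients of original variables
    beta_ko = coef[p:].reshape(p, n_knockoffs)  # row j: knockoffs of variable j

    # Z-score of each original coefficient within its knockoffs' distribution.
    mu = beta_ko.mean(axis=1)
    sd = beta_ko.std(axis=1, ddof=1) + 1e-12
    return np.abs(beta - mu) / sd
```

A larger score suggests that a variable's coefficient is unusual relative to its knockoffs. Ranking variables by this score and comparing the ranking with a known ground truth is one way such an ROC AUC could be computed in a simulation.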

Comments: An earlier version of our paper at Machine Learning

Journal: Machine Learning, Volume 114, article number 26 (2025)

Categories: stat.ML, cs.AI, cs.LG

Keywords: variable selection, practical enhancement, over-parameterization

Tags: journal article

Related articles:

arXiv:1006.5060 [stat.ML] (Published 2010-06-25, updated 2010-07-01)

Learning sparse gradients for variable selection and dimension reduction

Gui-Bo Ye, Xiaohui Xie

arXiv:1811.04646 [stat.ML] (Published 2018-11-12)

Global sensitivity analysis for optimization with variable selection

Adrien Spagnol, Rodolphe Le Riche, Sebastien Da Veiga