arXiv Analytics

arXiv:1303.1152 [cs.LG]

An Equivalence between the Lasso and Support Vector Machines

Martin Jaggi

Published 2013-03-05, updated 2014-04-25 (Version 2)

We investigate the relation between two fundamental tools in machine learning and signal processing: the support vector machine (SVM) for classification, and the Lasso for regression. We show that the resulting optimization problems are equivalent, in the following sense. Given any instance of an $\ell_2$-loss soft-margin (or hard-margin) SVM, we construct a Lasso instance having the same optimal solutions, and vice versa. As a consequence, many existing optimization algorithms for SVMs and for the Lasso can also be applied to instances of the respective other problem. The equivalence also allows many known theoretical insights for the SVM and the Lasso to be translated between the two settings. One such implication is a simple kernelized version of the Lasso, analogous to the kernels used in the SVM setting. Another consequence is that the sparsity of a Lasso solution equals the number of support vectors of the corresponding SVM instance, and that screening rules can be used to prune the set of support vectors. Furthermore, we relate sublinear-time algorithms for the two problems, and give a new such algorithm variant for the Lasso. We also study the regularization paths of both methods.
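The direction from the Lasso to the SVM can be checked numerically. The sketch below illustrates the standard splitting construction (my reading of the reduction, not necessarily the paper's exact sign conventions): a constrained Lasso instance $\min_{\|x\|_1 \le 1} \|Ax - b\|^2$ is rewritten as minimizing $\|Zw\|^2$ over the unit simplex, which is a polytope-distance / hard-margin-SVM-type problem, by splitting $x$ into its positive and negative parts.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 8
A = rng.standard_normal((m, n))   # Lasso design matrix
b = rng.standard_normal(m)        # Lasso target vector

# SVM-style data matrix: two columns per Lasso feature,
# z_i = a_i - b and z_{n+i} = -a_i - b.
Z = np.hstack([A - b[:, None], -A - b[:, None]])

# Any w on the unit simplex over 2n coordinates ...
w = rng.random(2 * n)
w /= w.sum()

# ... corresponds to x = w_plus - w_minus with ||x||_1 <= 1,
x = w[:n] - w[n:]

# and the residuals (hence objectives) coincide:
# Z w = A(w_plus - w_minus) - b * sum(w) = A x - b.
assert np.allclose(Z @ w, A @ x - b)
```

The identity holds because the simplex constraint $\mathbf{1}^\top w = 1$ turns the $-b$ offset shared by all columns of $Z$ into exactly one copy of $-b$; an optimal $w^*$ for the simplex problem thus yields an optimal $x^*$ for the Lasso and vice versa.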

Comments: Book chapter in Regularization, Optimization, Kernels, and Support Vector Machines, Johan A.K. Suykens, Marco Signoretto, Andreas Argyriou (Editors), 2014
Categories: cs.LG, stat.ML
Subjects: 65C60, 90C25, 68T05, F.2.2, I.5.1
Related articles:
arXiv:2105.14084 [cs.LG] (Published 2021-05-28)
Support vector machines and linear regression coincide with very high-dimensional features
arXiv:1902.04622 [cs.LG] (Published 2019-02-12)
Learning Theory and Support Vector Machines - a primer
arXiv:1203.4523 [cs.LG] (Published 2012-03-20, updated 2012-09-11)
On the Equivalence between Herding and Conditional Gradient Algorithms