arXiv:2002.06469 [cs.LG]

On Coresets for Support Vector Machines

Murad Tukan, Cenk Baykal, Dan Feldman, Daniela Rus

Published 2020-02-15, Version 1

We present an efficient coreset construction algorithm for large-scale Support Vector Machine (SVM) training in Big Data and streaming applications. A coreset is a small, representative subset of the original data points such that models trained on the coreset are provably competitive with those trained on the original data set. Since the coreset is generally much smaller than the original set, our preprocess-then-train scheme has the potential to lead to significant speedups when training SVM models. We prove lower and upper bounds on the size of the coreset required to obtain small data summaries for the SVM problem. As a corollary, we show that our algorithm can be used to extend the applicability of any off-the-shelf SVM solver to streaming, distributed, and dynamic data settings. We evaluate the performance of our algorithm on real-world and synthetic data sets. Our experimental results reaffirm the favorable theoretical properties of our algorithm and demonstrate its practical effectiveness in accelerating SVM training.
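To make the notion of a weighted data summary concrete, the following is a minimal sketch and not the construction from the paper, which the abstract does not spell out. It assumes a soft-margin SVM objective with hinge loss and uses plain uniform sampling as a stand-in for the coreset step. The names `hinge_objective` and `uniform_weighted_subset`, the regularization parameter `lam`, and the synthetic data are all hypothetical and chosen for illustration only.

```python
import random


def hinge_objective(points, weights, w, b, lam=1.0):
    """Weighted soft-margin SVM objective: 0.5*||w||^2 + lam * sum_i u_i * hinge_i."""
    reg = 0.5 * sum(wj * wj for wj in w)
    loss = 0.0
    for (x, y), u in zip(points, weights):
        margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
        loss += u * max(0.0, 1.0 - margin)
    return reg + lam * loss


def uniform_weighted_subset(points, m, rng):
    """Stand-in for a coreset (assumption): draw m points uniformly with
    replacement and give each weight n/m so the total weight equals n."""
    n = len(points)
    sample = [points[rng.randrange(n)] for _ in range(m)]
    weights = [n / m] * m
    return sample, weights


rng = random.Random(0)

# Synthetic 2-D data (hypothetical): the label is the sign of the first coordinate.
data = []
for _ in range(2000):
    x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    y = 1 if x[0] >= 0 else -1
    data.append((x, y))

subset, subset_weights = uniform_weighted_subset(data, 200, rng)

# Evaluate one fixed candidate separator on the full data and on the summary.
w, b = (2.0, 0.0), 0.0
full_value = hinge_objective(data, [1.0] * len(data), w, b)
subset_value = hinge_objective(subset, subset_weights, w, b)
relative_error = abs(full_value - subset_value) / full_value
print(f"full={full_value:.2f} subset={subset_value:.2f} rel_err={relative_error:.3f}")
```

A coreset in the sense described above would keep this relative error small for every candidate separator at once, not just the single `(w, b)` checked here. Uniform sampling offers no such worst-case guarantee in general, which is the motivation for a dedicated construction.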

Categories: cs.LG, stat.ML

Keywords: efficient coreset construction algorithm, large-scale support vector machine, synthetic data sets, dynamic data settings, experimental results reaffirm

Related articles:

arXiv:2102.07835 [cs.LG] (Published 2021-02-15)

Topological Graph Neural Networks

Max Horn, Edward De Brouwer, Michael Moor, Yves Moreau, Bastian Rieck, Karsten Borgwardt

arXiv:2311.15887 [cs.LG] (Published 2023-11-27)

FLASC: A Flare-Sensitive Clustering Algorithm: Extending HDBSCAN* for Detecting Branches in Clusters

D. M. Bot, J. Peeters, J. Liesenborgs, J. Aerts

arXiv:2011.03904 [cs.LG] (Published 2020-11-08)

Locally Adaptive Nearest Neighbors

Jan Philip Göpfert, Heiko Wersing, Barbara Hammer