arXiv Analytics

Sign in

arXiv:1411.2374 [cs.LG]AbstractReferencesReviewsResources

Similarity Learning for High-Dimensional Sparse Data

Kuan Liu, Aurélien Bellet, Fei Sha

Published 2014-11-10Version 1

A good measure of similarity between data points is crucial to the performance of many machine learning, data mining and information retrieval tasks. Metric and similarity learning methods have emerged as a powerful way to automatically learn them from data, but they do not scale well with the feature dimensionality. In this paper, we propose a similarity learning method that can efficiently deal with high-dimensional sparse data. This is done by assuming that the similarity parameters are decomposable as a convex combination of rank-one matrices with a specific sparsity structure, together with the use of an approximate Frank-Wolfe procedure to learn these parameters based on relative similarity constraints on the training data. Our algorithm greedily incorporates one pair of features at a time into the similarity, providing an efficient way to control the number of active features and thus reduce overfitting. It enjoys strong convergence guarantees and its time and memory complexity depends on the sparsity of the data instead of the dimension of the feature space. Our experiments on real-world high-dimensional datasets demonstrate its potential for classification, dimensionality reduction and data exploration.

Related articles:
arXiv:1604.05753 [cs.LG] (Published 2016-04-19)
Sketching and Neural Networks