arXiv:2201.02448 [cs.LG]

k-Center Clustering with Outliers in Sliding Windows

Paolo Pellizzoni, Andrea Pietracaprina, Geppino Pucci

Published 2022-01-07 (Version 1)

Metric $k$-center clustering is a fundamental unsupervised learning primitive. Although widely used, this primitive is heavily affected by noise in the data, so a more robust variant seeks the best solution that disregards a given number $z$ of points of the dataset, called outliers. We provide efficient algorithms for this important variant in the streaming model under the sliding window setting, where, at each time step, the dataset to be clustered is the window $W$ of the most recent data items. Our algorithms achieve $O(1)$ approximation and, remarkably, require working memory linear in $k+z$ and only logarithmic in $|W|$. As a by-product, we show how to estimate the effective diameter of the window $W$, which is a measure of the spread of the window points, disregarding a given fraction of noisy distances. We also provide experimental evidence of the practical viability of our theoretical results.
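To make the objective concrete, the following is a minimal sketch (not the paper's streaming algorithm) of how the $k$-center-with-outliers cost is evaluated offline: given a set of centers, the $z$ points farthest from their nearest center are treated as outliers, and the radius is the largest remaining distance. The function name and toy dataset are illustrative assumptions.

```python
import math

def kcenter_outliers_cost(points, centers, z):
    """Radius of the clustering after the z points farthest from
    their nearest center are discarded as outliers.
    (Illustrative helper; not from the paper.)"""
    # Distance from each point to its nearest center, in sorted order.
    dists = sorted(
        min(math.dist(p, c) for c in centers) for p in points
    )
    # Drop the z largest distances; the cost is the largest survivor.
    return dists[-(z + 1)] if z < len(points) else 0.0

# Toy dataset: two tight clusters plus one far-away noise point.
pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (100, 100)]
centers = [(0, 0), (10, 10)]
print(kcenter_outliers_cost(pts, centers, z=0))  # noise point dominates
print(kcenter_outliers_cost(pts, centers, z=1))  # noise point discarded
```

With $z=0$ the single noise point forces a huge radius; allowing $z=1$ outlier lets the cost reflect the true two-cluster structure, which is exactly the robustness the variant above is designed to provide.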
