arXiv:1907.01660 [stat.ML]

A flexible EM-like clustering algorithm for noisy data

Violeta Roizman, Matthieu Jonckheere, Frédéric Pascal

Published 2019-07-02, Version 1

We design a new robust clustering algorithm that deals efficiently with noise and outliers in diverse datasets. As an EM-like algorithm, it relies not only on estimates of the cluster centers and covariances but also on a scale parameter per data point. This allows the algorithm to accommodate distributions with heavier or lighter tails than the Gaussian, as well as outliers, without significantly losing efficiency in classical scenarios. Convergence and accuracy of the algorithm are first analyzed on synthetic data. We then show that the proposed algorithm outperforms classical unsupervised methods from the literature, such as k-means, the EM algorithm and HDBSCAN, when applied to real datasets such as MNIST, NORB and 20newsgroups.
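The abstract does not give the update equations, so the following is only a hedged illustration of the general idea of pairing cluster centers and covariances with a per-point scale. It implements the standard EM algorithm for a mixture of multivariate Student-t distributions with fixed degrees of freedom `nu`, a different and better-known model in which each point carries a latent scale; it is not the authors' algorithm. The function name `t_mixture_em` and its parameters (`nu`, `n_iter`, `init_mu`, `seed`, `ridge`) are hypothetical, and Python with NumPy is assumed.

```python
import numpy as np


def t_mixture_em(X, K, nu=4.0, n_iter=100, init_mu=None, seed=0, ridge=1e-6):
    """EM for a mixture of K multivariate Student-t distributions with fixed nu.

    Illustrative sketch, not the paper's algorithm. For each point i and cluster k,
    u_ik is the expected precision weight and tau_ik = 1 / u_ik acts as a per-point
    scale: points far from a center get a large tau and little influence.
    Returns (hard labels, centers, covariance matrices, tau of shape (n, K)).
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    if init_mu is None:
        # Default initialization (an assumption): K distinct random data points.
        mu = X[rng.choice(n, size=K, replace=False)].astype(float)
    else:
        mu = np.array(init_mu, dtype=float)
    # Start every cluster from the (ridge-regularized) global covariance.
    sigma = np.stack([np.cov(X.T) + ridge * np.eye(m) for _ in range(K)])
    pi = np.full(K, 1.0 / K)

    for _ in range(n_iter):
        # q[i, k] = (x_i - mu_k)^T Sigma_k^{-1} (x_i - mu_k), squared Mahalanobis distance.
        q = np.empty((n, K))
        logdet = np.empty(K)
        for k in range(K):
            diff = X - mu[k]
            q[:, k] = np.einsum("ij,jl,il->i", diff, np.linalg.inv(sigma[k]), diff)
            logdet[k] = np.linalg.slogdet(sigma[k])[1]
        # E-step: responsibilities from the t log-density, dropping terms shared by all k.
        log_r = np.log(pi) - 0.5 * logdet - 0.5 * (nu + m) * np.log1p(q / nu)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # Expected precision weights: a large q (e.g. an outlier) gives a small u.
        u = (nu + m) / (nu + q)
        # M-step: closed-form updates of weights, centers and covariances for fixed nu.
        pi = r.mean(axis=0)
        for k in range(K):
            w = r[:, k] * u[:, k]
            mu[k] = w @ X / w.sum()
            diff = X - mu[k]
            sigma[k] = (w[:, None] * diff).T @ diff / r[:, k].sum() + ridge * np.eye(m)
    return r.argmax(axis=1), mu, sigma, 1.0 / u
```

Because the weight `u_ik = (nu + m) / (nu + q_ik)` shrinks as the Mahalanobis distance `q_ik` grows, far-away points barely move the centers and covariances, and their returned scales `1 / u_ik` are correspondingly large. As with any EM-type method the result depends on initialization; passing rough `init_mu` guesses avoids relying on the random default.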
