arXiv:2310.12806 [stat.ML]

DCSI -- An improved measure of cluster separability based on separation and connectedness

Jana Gauss, Fabian Scheipl, Moritz Herrmann

Published 2023-10-19, Version 1

Whether class labels in a given data set correspond to meaningful clusters is crucial for the evaluation of clustering algorithms using real-world data sets. This property can be quantified by separability measures. A review of the existing literature shows that neither classification-based complexity measures nor cluster validity indices (CVIs) adequately incorporate the central aspects of separability for density-based clustering: between-class separation and within-class connectedness. A newly developed measure, the density cluster separability index (DCSI), aims to quantify these two characteristics and can also be used as a CVI. Extensive experiments on synthetic data indicate that DCSI correlates strongly with the performance of DBSCAN, measured via the adjusted Rand index (ARI), but lacks robustness on multi-class data sets with overlapping classes, which are ill-suited for density-based hard clustering. Detailed evaluation on frequently used real-world data sets shows that DCSI can correctly identify touching or overlapping classes that do not form meaningful clusters.
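
The evaluation setup mentioned in the abstract, scoring DBSCAN against known class labels with the ARI, can be illustrated with a minimal sketch. This is not the authors' code and it does not compute DCSI. It assumes scikit-learn is available, and the data sets (two moons, two overlapping Gaussian blobs), the helper name `dbscan_ari`, and the parameters `eps=0.2` and `min_samples=5` are illustrative choices only.

```python
# Illustrative sketch only: DBSCAN performance measured via ARI on two toy data sets.
# The data sets and all parameter values are assumptions, not taken from the paper.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs, make_moons
from sklearn.metrics import adjusted_rand_score


def dbscan_ari(X, y, eps=0.2, min_samples=5):
    """Cluster X with DBSCAN and compare the result to the class labels y via ARI."""
    predicted = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    # DBSCAN marks noise points with the label -1; ARI treats them as one more group.
    return adjusted_rand_score(y, predicted)


# Well-separated, connected classes: two interleaving half circles.
X_moons, y_moons = make_moons(n_samples=300, noise=0.05, random_state=0)
ari_moons = dbscan_ari(X_moons, y_moons)

# Heavily overlapping classes: two Gaussian blobs whose centers are one
# standard deviation apart.
X_overlap, y_overlap = make_blobs(
    n_samples=300,
    centers=np.array([[0.0, 0.0], [1.0, 0.0]]),
    cluster_std=1.0,
    random_state=0,
)
ari_overlap = dbscan_ari(X_overlap, y_overlap)

print(f"ARI, separated moons:   {ari_moons:.3f}")
print(f"ARI, overlapping blobs: {ari_overlap:.3f}")
```

Under these assumptions the ARI should be high for the separated moons and low for the overlapping blobs, which is the kind of contrast a separability measure is meant to anticipate before any clustering is run.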
