arXiv:1208.4138 [cs.LG]

Semi-supervised Clustering Ensemble by Voting

Ashraf Mohammed Iqbal, Abidalrahman Moh'd, Zahoor Khan

Published 2012-08-20, Version 1

Clustering ensemble is one of the most recent advances in unsupervised learning. It aims to combine the clustering results obtained from different algorithms, or from different runs of the same clustering algorithm, on the same data set. This is accomplished using a consensus function, and the efficiency and accuracy of this approach have been demonstrated in many works in the literature. In the first part of this paper, we compare current approaches to clustering ensemble in the literature. All of these approaches consist of two main steps: ensemble generation and the consensus function. In the second part of the paper, we propose incorporating supervision into the clustering ensemble procedure to further improve the clustering results. Supervision can be applied in two places: either by using semi-supervised algorithms in the ensemble generation step, or in the form of feedback used in the consensus function stage. We also introduce a flexible two-parameter weighting mechanism: the first parameter describes the compatibility between the data sets under study and the semi-supervised clustering algorithms used to generate the base partitions, and the second parameter is used to provide user feedback on these partitions. The two parameters are used in a "relabeling and voting" based consensus function to produce the final clustering.
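The abstract does not specify how relabeling is performed or how the two weighting parameters are combined, so the following is only a minimal sketch of a generic "relabeling and voting" consensus function under stated assumptions. It assumes every base partition uses labels 0..k-1, aligns each partition's labels to the first partition by brute-force search over label permutations (the Hungarian algorithm is the usual scalable alternative), and then lets each partition vote with weight `compat[i] * feedback[i]`. The names `relabel`, `weighted_vote`, `compat`, and `feedback`, and the choice of a product to combine the two parameters, are hypothetical illustrations, not the authors' method.

```python
from collections import defaultdict
from itertools import permutations


def relabel(reference, partition, k):
    """Permute the labels of `partition` so it agrees best with `reference`.

    Assumes labels are integers 0..k-1. Brute force over all k! permutations,
    which is only practical for small k.
    """
    best_perm, best_agree = None, -1
    for perm in permutations(range(k)):
        # Count objects whose permuted label matches the reference label.
        agree = sum(1 for r, p in zip(reference, partition) if perm[p] == r)
        if agree > best_agree:
            best_perm, best_agree = perm, agree
    return [best_perm[p] for p in partition]


def weighted_vote(partitions, compat, feedback, k):
    """Relabel all base partitions against the first one, then vote per object.

    Hypothetical weighting: partition i contributes compat[i] * feedback[i]
    to the label it assigns (product chosen for illustration only).
    """
    reference = partitions[0]
    aligned = [relabel(reference, p, k) for p in partitions]
    final = []
    for j in range(len(reference)):
        scores = defaultdict(float)
        for i, labels in enumerate(aligned):
            scores[labels[j]] += compat[i] * feedback[i]
        # Pick the label with the largest accumulated weight for object j.
        final.append(max(scores, key=scores.get))
    return final


if __name__ == "__main__":
    parts = [
        [0, 0, 1, 1, 2, 2],
        [1, 1, 0, 0, 2, 2],  # same grouping, labels 0 and 1 swapped
        [2, 2, 1, 1, 0, 0],  # same grouping, labels 0 and 2 swapped
        [0, 0, 1, 2, 2, 2],  # disagrees on object 3
    ]
    print(weighted_vote(parts, [1, 1, 1, 1], [1, 1, 1, 1], 3))
    print(weighted_vote(parts, [0.2, 0.2, 0.2, 1.0], [1, 1, 1, 1], 3))
```

With equal weights, the majority grouping wins for the disputed object; giving the fourth partition a much higher compatibility weight lets it override the other three, and setting a partition's feedback to 0 removes its vote entirely. This shows how per-partition weights can steer a voting consensus, but the actual weighting rule in the paper may differ.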

Comments: The International Conference on Information and Communication Systems (ICICS 2009), Amman, Jordan
Categories: cs.LG, stat.ML