arXiv:1705.07592 Abstract | arXiv Analytics

arXiv:1705.07592 [stat.ML]

Improved Clustering with Augmented k-means

Published 2017-05-22, Version 1

Identifying a set of homogeneous clusters in a heterogeneous dataset is one of the most important classes of problems in statistical modeling. In unsupervised partitional clustering, k-means is one of the most widely used algorithms for this task. In this technical report, we develop a new k-means variant called Augmented k-means, a hybrid of k-means and logistic regression. During each iteration, logistic regression is used to predict the current cluster labels, and the resulting cluster membership probabilities control the subsequent re-estimation of the cluster means. Observations that cannot be confidently assigned to a cluster are excluded from the re-estimation step. This can be valuable when the data exhibit characteristics common in real datasets, such as heterogeneity, non-sphericity, substantial overlap, and high scatter. Augmented k-means frequently outperforms k-means by classifying observations into known clusters more accurately, converging in fewer iterations, or both. We demonstrate this on both simulated and real datasets. Our algorithm is implemented in Python and will be made available with this report.
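The iteration described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the confidence cutoff `threshold=0.9`, the use of softmax regression fitted by plain gradient descent on standardized features, the optional `init` argument, and the fallback to all members when no observation in a cluster is confident are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_softmax_regression(X, labels, k, steps=200, lr=0.5):
    """Fit multinomial logistic regression by gradient descent and
    return the fitted class probabilities for the training points.
    (Assumed stand-in for the logistic regression step.)"""
    n = X.shape[0]
    # Standardize features so a fixed learning rate behaves reasonably.
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    Xb = np.hstack([Xs, np.ones((n, 1))])   # append intercept column
    Y = np.eye(k)[labels]                   # one-hot current labels
    W = np.zeros((Xb.shape[1], k))
    for _ in range(steps):
        P = softmax(Xb @ W)
        W -= lr * Xb.T @ (P - Y) / n        # cross-entropy gradient step
    return softmax(Xb @ W)

def augmented_kmeans(X, k, threshold=0.9, max_iter=50, init=None, seed=0):
    """Sketch of a k-means loop whose mean update uses only observations
    that the logistic model assigns to their cluster with probability
    >= threshold (hypothetical cutoff)."""
    X = np.asarray(X, dtype=float)
    if init is None:
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    else:
        centers = np.array(init, dtype=float)
    labels = None
    for iteration in range(1, max_iter + 1):
        # Standard k-means assignment: nearest center by squared distance.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                           # assignments stopped changing
        labels = new_labels
        # Logistic regression predicts the current cluster labels.
        proba = fit_softmax_regression(X, labels, k)
        confident = proba[np.arange(len(X)), labels] >= threshold
        for j in range(k):
            members = confident & (labels == j)
            if not members.any():
                members = labels == j       # assumed fallback: use all members
            if members.any():
                centers[j] = X[members].mean(axis=0)
    return labels, centers, iteration
```

With `threshold=0.0` every observation passes the confidence check, so the update reduces to the ordinary k-means mean step; raising the threshold excludes more borderline observations from the re-estimation.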

Categories: stat.ML

Subjects: 62H30, I.5.3, G.3, G.4

Keywords: logistic regression, real datasets, augmented k-means frequently outperforms k-means, current cluster labels, important classes

Related articles:

arXiv:1611.08618 [stat.ML] (Published 2016-11-25)

A Benchmark and Comparison of Active Learning for Logistic Regression

Yazhou Yang, Marco Loog

arXiv:1708.07826 [stat.ML] (Published 2017-08-24)

Logistic Regression as Soft Perceptron Learning

Raul Rojas

arXiv:2104.13026 [stat.ML] (Published 2021-04-27)

The Hessian Screening Rule

Johan Larsson, Jonas Wallin
