
arXiv:1811.02722 [cs.LG]

Scalable Bottom-up Subspace Clustering using FP-Trees for High Dimensional Data

Minh Tuan Doan, Jianzhong Qi, Sutharshan Rajasegarar, Christopher Leckie

Published 2018-11-07, Version 1

Subspace clustering aims to find groups of similar objects (clusters) that exist in lower-dimensional subspaces of a high-dimensional dataset. It has a wide range of applications, such as analysing high-dimensional sensor data or DNA sequences. However, existing algorithms have limitations in finding clusters in non-disjoint subspaces and in scaling to large data, which limit their applicability in areas such as bioinformatics and the Internet of Things. We aim to address these limitations by proposing a subspace clustering algorithm that uses a bottom-up strategy. Our algorithm first searches for base clusters in low-dimensional subspaces. It then forms clusters in higher-dimensional subspaces from these base clusters, a step we formulate as a frequent pattern mining problem. This formulation enables an efficient search for clusters in higher-dimensional subspaces, which is carried out using FP-trees. The proposed algorithm is evaluated against traditional bottom-up clustering algorithms and state-of-the-art subspace clustering algorithms. The experimental results show that the proposed algorithm produces clusters with high accuracy and scales well to large volumes of data. We also demonstrate the algorithm's performance on real-life data, including ten genomic datasets and a car parking occupancy dataset.
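The bottom-up idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes that base clusters are dense equal-width bins in single dimensions (a grid-style choice made only for illustration), and it replaces the FP-tree search with brute-force itemset counting to stay short. All names here (`base_cluster_items`, `frequent_itemsets`, `n_bins`, `min_support`, `max_size`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def base_cluster_items(points, n_bins, min_support):
    """Map each point to the set of dense 1-D cells (dim, bin) it falls in.

    Assumption: a "base cluster" is an equal-width bin in one dimension
    that holds at least min_support points.
    """
    n_dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(n_dims)]
    highs = [max(p[d] for p in points) for d in range(n_dims)]

    def cell(p, d):
        # Fall back to width 1.0 when a dimension is constant.
        width = (highs[d] - lows[d]) / n_bins or 1.0
        return (d, min(int((p[d] - lows[d]) / width), n_bins - 1))

    cells = [[cell(p, d) for d in range(n_dims)] for p in points]
    counts = Counter(c for row in cells for c in row)
    return [frozenset(c for c in row if counts[c] >= min_support) for row in cells]


def frequent_itemsets(transactions, min_support, max_size=3):
    """Count co-occurring base clusters by brute force.

    Assumption: this stands in for the FP-tree search; it enumerates every
    combination of 2..max_size items per transaction, so it does not scale.
    """
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for k in range(2, min(max_size, len(items)) + 1):
            counts.update(combinations(items, k))
    return {s: n for s, n in counts.items() if n >= min_support}


# Three points agree in dimensions 0 and 1 but are spread out in dimension 2.
points = [
    [0.10, 0.10, 0.0],
    [0.15, 0.12, 5.0],
    [0.12, 0.14, 9.9],
    [9.00, 5.00, 2.0],
    [5.00, 9.00, 7.0],
    [0.00, 9.90, 4.0],
]
transactions = base_cluster_items(points, n_bins=5, min_support=3)
clusters = frequent_itemsets(transactions, min_support=3)
print(clusters)  # {((0, 0), (1, 0)): 3}
```

Each point acts as a transaction whose items are the base clusters it belongs to, so a frequent itemset such as `((0, 0), (1, 0))` corresponds to a candidate cluster in the subspace spanned by dimensions 0 and 1. An FP-tree would find the same itemsets without enumerating every combination.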

Comments: Accepted to IEEE International Conference on Big Data 2018

Categories: cs.LG, stat.ML

Keywords: scalable bottom-up subspace clustering, high dimensional data, high dimensional sensor data, subspace clustering algorithm, dimensional subspaces

Tags: conference paper

Related articles:

arXiv:1909.03681 [cs.LG] (Published 2019-09-09)

Outlier Detection in High Dimensional Data

Firuz Kamalov, Ho Hon Leung

arXiv:2206.03977 [cs.LG] (Published 2022-06-08)

Diffusion Curvature for Estimating Local Curvature in High Dimensional Data

Dhananjay Bhaskar, Kincaid MacDonald, Oluwadamilola Fasina, Dawson Thomas, Bastian Rieck, Ian Adelstein, Smita Krishnaswamy

arXiv:2006.07575 [cs.LG] (Published 2020-06-13)