
arXiv:1909.03681 [cs.LG]

Outlier Detection in High Dimensional Data

Published 2019-09-09, Version 1

High-dimensional data poses unique challenges for outlier detection. Most existing algorithms fail to properly address the issues stemming from a large number of features. In particular, outlier detection algorithms perform poorly on small data sets with a large number of features. In this paper, we propose a novel outlier detection algorithm based on principal component analysis and kernel density estimation. The proposed method addresses the challenges of high-dimensional data by projecting the original data onto a lower-dimensional space and using the innate structure of the data to calculate an anomaly score for each data point. Numerical experiments on synthetic and real-life data show that our method performs well on high-dimensional data. In particular, the proposed method outperforms the benchmark methods as measured by the $F_1$-score. Our method also achieves better-than-average execution times compared to the benchmark methods.
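The abstract names the two building blocks (principal component analysis and kernel density estimation) but gives no algorithmic detail, so the following is only a minimal sketch of how the two could be combined, not the authors' method. The function names `pca_project` and `kde_anomaly_scores`, the choice of two components, the Gaussian kernel, the leave-one-out density, and the Scott-style bandwidth are all assumptions made for illustration.

```python
import numpy as np


def pca_project(X, n_components):
    """Center X and project it onto its top principal components (via SVD)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


def kde_anomaly_scores(Z, bandwidth=None):
    """Negative log leave-one-out Gaussian KDE; higher means more anomalous.

    Assumes every column of Z has non-zero variance.
    """
    n, d = Z.shape
    Z = Z / Z.std(axis=0)  # standardise each projected coordinate
    if bandwidth is None:
        bandwidth = n ** (-1.0 / (d + 4))  # Scott-style rule of thumb (assumed)
    # Pairwise squared Euclidean distances between projected points.
    sq_dists = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-0.5 * sq_dists / bandwidth**2)
    np.fill_diagonal(kernel, 0.0)  # leave-one-out: a point does not support itself
    norm = (n - 1) * (2.0 * np.pi * bandwidth**2) ** (d / 2.0)
    density = kernel.sum(axis=1) / norm
    return -np.log(density + 1e-300)  # small constant avoids log(0)


# Synthetic example: few samples, many features, three planted outliers.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 200))
X[0] += 6.0
X[1] -= 6.0
X[2] += 12.0

scores = kde_anomaly_scores(pca_project(X, n_components=2))
flagged = np.argsort(scores)[-3:]  # indices of the three highest scores
```

In this toy setting the planted points dominate the first principal direction, so they end up isolated in the projected space and receive the highest scores. On real data the number of components, the bandwidth, and the score threshold would all need to be tuned.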

Categories: cs.LG, cs.AI, stat.ML

Keywords: high-dimensional data, outlier detection algorithm

Related articles:

arXiv:2206.03977 [cs.LG] (Published 2022-06-08)

Diffusion Curvature for Estimating Local Curvature in High Dimensional Data

Dhananjay Bhaskar, Kincaid MacDonald, Oluwadamilola Fasina, Dawson Thomas, Bastian Rieck, Ian Adelstein, Smita Krishnaswamy

arXiv:2211.08414 [cs.LG] (Published 2022-11-15)

Model free Shapley values for high dimensional data

Naofumi Hama, Masayoshi Mase, Art B. Owen

arXiv:1811.02722 [cs.LG] (Published 2018-11-07)

Scalable Bottom-up Subspace Clustering using FP-Trees for High Dimensional Data

Minh Tuan Doan, Jianzhong Qi, Sutharshan Rajasegarar, Christopher Leckie