arXiv Analytics


arXiv:2311.03087 [cs.LG]

Persistent homology for high-dimensional data based on spectral methods

Sebastian Damrich, Philipp Berens, Dmitry Kobak

Published 2023-11-06, Version 1

Persistent homology is a popular computational tool for detecting non-trivial topology of point clouds, such as the presence of loops or voids. However, many real-world datasets with low intrinsic dimensionality reside in an ambient space of much higher dimensionality. We show that in this case vanilla persistent homology becomes very sensitive to noise and fails to detect the correct topology. The same holds true for most existing refinements of persistent homology. As a remedy, we find that spectral distances on the $k$-nearest-neighbor graph of the data, such as diffusion distance and effective resistance, allow persistent homology to detect the correct topology even in the presence of high-dimensional noise. Furthermore, we derive a novel closed-form expression for effective resistance in terms of the eigendecomposition of the graph Laplacian, and describe its relation to diffusion distances. Finally, we apply these methods to several high-dimensional single-cell RNA-sequencing datasets and show that spectral distances on the $k$-nearest-neighbor graph allow robust detection of cell cycle loops.
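
To make the idea of a spectral distance on the $k$-nearest-neighbor graph concrete, here is a minimal sketch of effective resistance computed from the eigendecomposition of the unnormalized graph Laplacian $L = D - A$. It uses the standard textbook identity $R_{ij} = \sum_{k \ge 2} (u_{ki} - u_{kj})^2 / \lambda_k$, which is not necessarily the closed-form expression derived in the paper. The function names `knn_adjacency` and `effective_resistance`, the unweighted symmetric graph, the dense NumPy implementation, and the toy noisy circle are all assumptions made for illustration.

```python
import numpy as np

def knn_adjacency(X, k):
    """Symmetric, unweighted kNN adjacency matrix (dense; small n only)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq, np.inf)  # a point is not its own neighbor
    idx = np.argsort(sq, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(A, A.T)  # keep an edge if either endpoint selects it

def effective_resistance(A):
    """Effective resistance via the eigendecomposition of L = D - A.

    Assumes a connected graph, so L has exactly one zero eigenvalue.
    """
    L = np.diag(A.sum(axis=1)) - A
    lam, U = np.linalg.eigh(L)  # eigenvalues in ascending order
    if lam[1] <= 1e-10:
        raise ValueError("graph is disconnected; increase k")
    lam, U = lam[1:], U[:, 1:]  # drop the constant eigenvector
    # Embedding node i as U[i, :] / sqrt(lam) turns R_ij into a squared
    # Euclidean distance; G below is the pseudoinverse of L.
    Y = U / np.sqrt(lam)
    G = Y @ Y.T
    d = np.diag(G)
    return d[:, None] + d[None, :] - 2.0 * G

# Toy data (assumption): a circle embedded in 10 dimensions with small noise.
rng = np.random.default_rng(0)
theta = np.linspace(0.0, 2.0 * np.pi, 60, endpoint=False)
X = np.zeros((60, 10))
X[:, 0], X[:, 1] = np.cos(theta), np.sin(theta)
X += 0.01 * rng.standard_normal(X.shape)

A = knn_adjacency(X, k=6)
R = effective_resistance(A)  # (60, 60) matrix of pairwise resistances
```

The resulting matrix `R` is symmetric with a zero diagonal, so it could in principle replace the Euclidean distance matrix as input to a persistent homology package that accepts precomputed distances.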

Related articles:
arXiv:1710.03113 [cs.LG] (Published 2017-10-09)
From Subspaces to Metrics and Beyond: Toward Multi-Diversified Ensemble Clustering of High-Dimensional Data
arXiv:2401.15610 [cs.LG] (Published 2024-01-28)
Prevalidated ridge regression is a highly-efficient drop-in replacement for logistic regression for high-dimensional data
arXiv:1505.06907 [cs.LG] (Published 2015-05-26)
Using Dimension Reduction to Improve the Classification of High-dimensional Data