arXiv:1808.08765 [stat.ML]

Identifiability of Low-Rank Sparse Component Analysis

Jérémy E. Cohen, Nicolas Gillis

Published 2018-08-27 (Version 1)

Sparse component analysis (SCA) is the following problem: given an input matrix $M$ and an integer $r$, find a dictionary $D$ with $r$ columns and a sparse matrix $B$ with $r$ rows such that $M \approx DB$. A key issue in SCA is identifiability, that is, characterizing the conditions under which $D$ and $B$ are essentially unique (that is, unique up to permutation and scaling of the columns of $D$ and rows of $B$). Although SCA has been extensively investigated over the last two decades, only a few works have tackled this issue in the deterministic scenario, and no work provides reasonable bounds on the minimum number of data points (that is, columns of $M$) that leads to identifiability. In this work, we provide new results in the deterministic scenario when the data has a low-rank structure, that is, when $D$ has rank $r$, drastically improving on previous results. In particular, we show that if each column of $B$ contains at least $s$ zeros, then $\mathcal{O}(r^3/s^2)$ data points are sufficient to obtain an essentially unique decomposition, as long as these data points are well spread among the subspaces spanned by $r-1$ columns of $D$. This implies, for example, that for a fixed proportion of zeros (constant and independent of $r$, e.g., 10\% of zero entries in $B$), only $\mathcal{O}(r)$ data points are required to guarantee identifiability: taking $s = cr$ in the bound gives $\mathcal{O}(r^3/(cr)^2) = \mathcal{O}(r/c^2)$ data points.
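To make the model and the permutation-and-scaling ambiguity concrete, here is a minimal NumPy sketch (the dimensions, sparsity level, and random seed are illustrative assumptions, not values from the paper): it builds a rank-$r$ dictionary $D$, a coefficient matrix $B$ with at least $s$ zeros per column, and checks that permuting and rescaling the factors leaves $M = DB$ unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the paper):
# M is m x n, D is m x r, B is r x n, with at least s zeros per column of B.
m, n, r, s = 20, 50, 5, 2

# A generic random D has full column rank r (the low-rank structure).
D = rng.standard_normal((m, r))

# Sparse B: plant at least s zero entries in every column.
B = rng.standard_normal((r, n))
for j in range(n):
    B[rng.choice(r, size=s, replace=False), j] = 0.0

M = D @ B  # noiseless SCA model

# Essential uniqueness means (D, B) can only be recovered up to a
# permutation P and an invertible diagonal scaling S, since
# (D P S)(S^{-1} P^T B) = D B for any such P and S.
P = np.eye(r)[:, rng.permutation(r)]        # permutation matrix
S = np.diag(rng.uniform(0.5, 2.0, size=r))  # invertible diagonal scaling

D_alt = D @ P @ S
B_alt = np.linalg.inv(S) @ P.T @ B  # same zero pattern, rows permuted and rescaled

print(np.allclose(M, D_alt @ B_alt))  # True: same M, different factor pair
```

Running this prints True: the two factor pairs are essentially the same decomposition, and the identifiability conditions in the paper characterize when such trivial ambiguities are the only alternatives.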
