{ "id": "1011.5395", "version": "v1", "published": "2010-11-24T15:18:42.000Z", "updated": "2010-11-24T15:18:42.000Z", "title": "The Sample Complexity of Dictionary Learning", "authors": [ "Daniel Vainsencher", "Shie Mannor", "Alfred M. Bruckstein" ], "doi": "10.1016/j.specom.2013.01.005", "categories": [ "stat.ML", "cs.LG" ], "abstract": "A large set of signals can sometimes be described sparsely using a dictionary, that is, every element can be represented as a linear combination of few elements from the dictionary. Algorithms for various signal processing applications, including classification, denoising and signal separation, learn a dictionary from a set of signals to be represented. Can we expect that the representation found by such a dictionary for a previously unseen example from the same source will have L_2 error of the same magnitude as those for the given examples? We assume signals are generated from a fixed distribution, and study this questions from a statistical learning theory perspective. We develop generalization bounds on the quality of the learned dictionary for two types of constraints on the coefficient selection, as measured by the expected L_2 error in representation when the dictionary is used. For the case of l_1 regularized coefficient selection we provide a generalization bound of the order of O(sqrt(np log(m lambda)/m)), where n is the dimension, p is the number of elements in the dictionary, lambda is a bound on the l_1 norm of the coefficient vector and m is the number of samples, which complements existing results. For the case of representing a new signal as a combination of at most k dictionary elements, we provide a bound of the order O(sqrt(np log(m k)/m)) under an assumption on the level of orthogonality of the dictionary (low Babel function). We further show that this assumption holds for most dictionaries in high dimensions in a strong probabilistic sense. Our results further yield fast rates of order 1/m as opposed to 1/sqrt(m) using localized Rademacher complexity. We provide similar results in a general setting using kernels with weak smoothness requirements.", "revisions": [ { "version": "v1", "updated": "2010-11-24T15:18:42.000Z" } ], "analyses": { "keywords": [ "sample complexity", "dictionary learning", "np log", "generalization bound", "coefficient selection" ], "tags": [ "journal article" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable", "adsabs": "2010arXiv1011.5395V" } } }