{ "id": "2103.01988", "version": "v1", "published": "2021-03-02T19:12:29.000Z", "updated": "2021-03-02T19:12:29.000Z", "title": "Self-supervised Pretraining of Visual Features in the Wild", "authors": [ "Priya Goyal", "Mathilde Caron", "Benjamin Lefaudeux", "Min Xu", "Pengchao Wang", "Vivek Pai", "Mannat Singh", "Vitaliy Liptchinsky", "Ishan Misra", "Armand Joulin", "Piotr Bojanowski" ], "categories": [ "cs.CV", "cs.AI" ], "abstract": "Recently, self-supervised learning methods like MoCo, SimCLR, BYOL and SwAV have reduced the gap with supervised methods. These results have been achieved in a controlled environment, namely the highly curated ImageNet dataset. However, the premise of self-supervised learning is that it can learn from any random image and from any unbounded dataset. In this work, we explore whether self-supervision lives up to this expectation by training large models on random, uncurated images with no supervision. Our final SElf-supERvised (SEER) model, a RegNetY with 1.3B parameters trained on 1B random images with 512 GPUs, achieves 84.2% top-1 accuracy, surpassing the best self-supervised pretrained model by 1% and confirming that self-supervised learning works in a real-world setting. Interestingly, we also observe that self-supervised models are good few-shot learners, achieving 77.9% top-1 with access to only 10% of ImageNet. Code: https://github.com/facebookresearch/vissl", "revisions": [ { "version": "v1", "updated": "2021-03-02T19:12:29.000Z" } ], "analyses": { "keywords": [ "visual features", "self-supervised pretraining", "self-supervised learning", "1b random images", "controlled environment" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }