{ "id": "2103.01988", "version": "v1", "published": "2021-03-02T19:12:29.000Z", "updated": "2021-03-02T19:12:29.000Z", "title": "Self-supervised Pretraining of Visual Features in the Wild", "authors": [ "Priya Goyal", "Mathilde Caron", "Benjamin Lefaudeux", "Min Xu", "Pengchao Wang", "Vivek Pai", "Mannat Singh", "Vitaliy Liptchinsky", "Ishan Misra", "Armand Joulin", "Piotr Bojanowski" ], "categories": [ "cs.CV", "cs.AI" ], "abstract": "Recently, self-supervised learning methods like MoCo, SimCLR, BYOL and SwAV have reduced the gap with supervised methods. These results have been achieved in a controlled environment, namely the highly curated ImageNet dataset. However, the premise of self-supervised learning is that it can learn from any random image and from any unbounded dataset. In this work, we explore whether self-supervision lives up to this expectation by training large models on random, uncurated images with no supervision. Our final SElf-supERvised (SEER) model, a RegNetY with 1.3B parameters trained on 1B random images with 512 GPUs, achieves 84.2% top-1 accuracy, surpassing the best self-supervised pretrained model by 1% and confirming that self-supervised learning works in a real-world setting. Interestingly, we also observe that self-supervised models are good few-shot learners, achieving 77.9% top-1 with access to only 10% of ImageNet. Code: https://github.com/facebookresearch/vissl", "revisions": [ { "version": "v1", "updated": "2021-03-02T19:12:29.000Z" } ], "analyses": { "keywords": [ "visual features", "self-supervised pretraining", "self-supervised learning", "1b random images", "controlled environment" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }