{
  "id": "2112.00725",
  "version": "v2",
  "published": "2021-12-01T18:59:54.000Z",
  "updated": "2022-01-19T12:54:40.000Z",
  "title": "Extrapolating from a Single Image to a Thousand Classes using Distillation",
  "authors": [ "Yuki M. Asano", "Aaqib Saeed" ],
  "comment": "Webpage/code: https://single-image-distill.github.io/",
  "categories": [ "cs.CV" ],
  "abstract": "What can neural networks learn about the visual world from a single image? While a single image obviously cannot contain the multitudes of possible objects, scenes and lighting conditions that exist within the space of all 256^(3x224x224) possible 224x224 RGB images, it might still provide a strong prior for natural images. To analyze this hypothesis, we develop a framework for training neural networks from scratch using a single image, by means of knowledge distillation from a supervised, pretrained teacher. With this, we find that the answer to the above question is: 'surprisingly, a lot'. In quantitative terms, we find top-1 accuracies of 94%/74% on CIFAR-10/100, 59% on ImageNet, and, by extending this method to video and audio, 77% on UCF-101 and 84% on SpeechCommands. In extensive analyses, we disentangle the effects of augmentations, choice of source image, and network architecture, and also discover \"panda neurons\" in networks that have never seen a panda. This work shows that one image can be used to extrapolate to thousands of object classes and motivates a renewed research agenda on the fundamental interplay of augmentations and image content.",
  "revisions": [
    { "version": "v2", "updated": "2022-01-19T12:54:40.000Z" }
  ],
  "analyses": {
    "keywords": [ "single image", "thousand classes", "neural networks learn", "natural images", "visual world" ],
    "tags": [ "github project" ],
    "note": {
      "typesetting": "TeX",
      "pages": 0,
      "language": "en",
      "license": "arXiv",
      "status": "editable"
    }
  }
}