{
  "id": "2112.00725",
  "version": "v2",
  "published": "2021-12-01T18:59:54.000Z",
  "updated": "2022-01-19T12:54:40.000Z",
  "title": "Extrapolating from a Single Image to a Thousand Classes using Distillation",
  "authors": [ "Yuki M. Asano", "Aaqib Saeed" ],
  "comment": "Webpage/code: https://single-image-distill.github.io/",
  "categories": [ "cs.CV" ],
  "abstract": "What can neural networks learn about the visual world from a single image? While a single image obviously cannot contain the multitudes of possible objects, scenes and lighting conditions that exist within the space of all 256^(3x224x224) possible 224x224 RGB images, it might still provide a strong prior for natural images. To analyze this hypothesis, we develop a framework for training neural networks from scratch using a single image, by means of knowledge distillation from a supervised, pretrained teacher. With this, we find that the answer to the above question is: 'surprisingly, a lot'. In quantitative terms, we find top-1 accuracies of 94%/74% on CIFAR-10/100, 59% on ImageNet, and, by extending this method to video and audio, 77% on UCF-101 and 84% on SpeechCommands. In extensive analyses, we disentangle the effects of augmentations, choice of source image, and network architecture, and also discover \"panda neurons\" in networks that have never seen a panda. This work shows that one image can be used to extrapolate to thousands of object classes and motivates a renewed research agenda on the fundamental interplay of augmentations and image content.",
  "revisions": [
    { "version": "v2", "updated": "2022-01-19T12:54:40.000Z" }
  ],
  "analyses": {
    "keywords": [ "single image", "thousand classes", "neural networks learn", "natural images", "visual world" ],
    "tags": [ "github project" ],
    "note": {
      "typesetting": "TeX",
      "pages": 0,
      "language": "en",
      "license": "arXiv",
      "status": "editable"
    }
  }
}