{
  "id": "1509.03150",
  "version": "v1",
  "published": "2015-09-10T13:45:01.000Z",
  "updated": "2015-09-10T13:45:01.000Z",
  "title": "STC: A Simple to Complex Framework for Weakly-supervised Semantic Segmentation",
  "authors": [
    "Yunchao Wei",
    "Xiaodan Liang",
    "Yunpeng Chen",
    "Xiaohui Shen",
    "Ming-Ming Cheng",
    "Yao Zhao",
    "Shuicheng Yan"
  ],
  "comment": "4 figures",
  "categories": [
    "cs.CV"
  ],
  "abstract": "Recently, significant improvement has been made on semantic object segmentation due to the development of deep convolutional neural networks (DCNNs). Training such a DCNN usually relies on a large number of images with pixel-level segmentation masks, and annotating these images is very costly in terms of both finance and human effort. In this paper, we propose a simple to complex (STC) framework in which only image-level annotations are utilized to learn DCNNs for semantic segmentation. Specifically, we first train an initial segmentation network called Initial-DCNN with the saliency maps of simple images (i.e., those with a single category of major object(s) and clean background). These saliency maps can be automatically obtained by existing bottom-up salient object detection techniques, where no supervision information is needed. Then, a better network called Enhanced-DCNN is learned with supervision from the predicted segmentation masks of simple images based on the Initial-DCNN as well as the image-level annotations. Finally, more pixel-level segmentation masks of complex images (two or more categories of objects with cluttered background), which are inferred by using Enhanced-DCNN and image-level annotations, are utilized as the supervision information to learn the Powerful-DCNN for semantic segmentation. Our method utilizes 40K simple images from Flickr.com and 10K complex images from PASCAL VOC to progressively boost the segmentation network. Extensive experimental results on the PASCAL VOC 2012 segmentation benchmark demonstrate that the proposed STC framework outperforms the state-of-the-art algorithms for weakly-supervised semantic segmentation by a large margin (e.g., 10.6% over MIL-ILP-seg [1]).",
  "revisions": [
    {
      "version": "v1",
      "updated": "2015-09-10T13:45:01.000Z"
    }
  ],
  "analyses": {
    "keywords": [
      "weakly-supervised semantic segmentation",
      "complex framework",
      "image-level annotations",
      "pixel-level segmentation masks",
      "simple images"
    ],
    "note": {
      "typesetting": "TeX",
      "pages": 0,
      "language": "en",
      "license": "arXiv",
      "status": "editable",
      "adsabs": "2015arXiv150903150W"
    }
  }
}