arXiv:1607.05836 [cs.CV]

Improved Deep Learning of Object Category using Pose Information

Published 2016-07-20, Version 1

Despite significant recent progress, the best available computer vision algorithms still lag far behind human capabilities, even for recognizing individual discrete objects under various poses, illuminations, and backgrounds. Here we present a new approach to using object pose information to improve deep network learning. While existing large-scale datasets, e.g. ImageNet, do not have pose information, we leverage the newly published turntable dataset, iLab-20M, which has ~22M images of 704 object instances shot under different lightings, camera viewpoints and turntable rotations, to do more controlled object recognition experiments. We introduce a new convolutional neural network architecture, what/where CNN (2W-CNN), built on a linear-chain feedforward CNN (e.g., AlexNet), augmented by hierarchical layers regularized by object poses. Pose information is used only as a feedback signal during training, in addition to category information; at test time, the feedforward network predicts only the category. To validate the approach, we train both 2W-CNN and AlexNet using a fraction of the dataset, and 2W-CNN achieves a 6% performance improvement in category prediction. We show mathematically that 2W-CNN has inherent advantages over AlexNet under the stochastic gradient descent (SGD) optimization procedure. Furthermore, we fine-tune object recognition on ImageNet using the 2W-CNN and AlexNet features pretrained on iLab-20M; the results show significant improvements compared with training AlexNet from scratch. Moreover, fine-tuning 2W-CNN features performs even better than fine-tuning the pretrained AlexNet features. These results show that features pretrained on iLab-20M generalize well to natural image datasets, and that 2W-CNN learns even better features for object recognition than AlexNet.
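The training/test asymmetry described in the abstract (pose supervises training, but only category is predicted at test time) can be illustrated with a minimal sketch. This is not the authors' 2W-CNN: it assumes a single shared feature vector feeding two hypothetical linear heads, whereas the abstract describes pose regularization applied at hierarchical layers. The names `training_loss`, `predict_category`, and the `pose_weight` balancing factor are illustrative assumptions, not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-likelihood of the target class index."""
    return -math.log(softmax(logits)[target])


def linear(weights, features):
    """Apply a bias-free linear head: one weight row per output class."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def training_loss(features, cat_w, pose_w, category, pose, pose_weight=1.0):
    """Joint loss used only during training (assumed form).

    Both heads read the same shared features, so the pose term also
    shapes the features that the category head relies on.
    """
    cat_loss = cross_entropy(linear(cat_w, features), category)
    pose_loss = cross_entropy(linear(pose_w, features), pose)
    return cat_loss + pose_weight * pose_loss


def predict_category(features, cat_w):
    """Test-time path: the pose head is ignored entirely."""
    logits = linear(cat_w, features)
    return max(range(len(logits)), key=logits.__getitem__)
```

Setting `pose_weight=0.0` in this sketch reduces the objective to plain category classification, which is the role the AlexNet baseline plays in the comparison above.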

Categories: cs.CV

Keywords: pose information, object category, deep learning, convolutional neural network architecture, object pose

Related articles:

arXiv:1602.05531 [cs.CV] (Published 2016-02-17)

On the Use of Deep Learning for Blind Image Quality Assessment

Simone Bianco, Luigi Celona, Paolo Napoletano, Raimondo Schettini

arXiv:1611.09726 [cs.CV] (Published 2016-11-29)

Gossip training for deep learning

Michael Blot, David Picard, Matthieu Cord, Nicolas Thome

arXiv:1609.06782 [cs.CV] (Published 2016-09-22)

Deep Learning for Video Classification and Captioning

Zuxuan Wu, Ting Yao, Yanwei Fu, Yu-Gang Jiang