arXiv:1510.08160 [cs.CV]

Scale-aware Fast R-CNN for Pedestrian Detection

Jianan Li, Xiaodan Liang, ShengMei Shen, Tingfa Xu, Shuicheng Yan

Published 2015-10-28, Version 1

While convolutional neural network (CNN) architectures have achieved great success in various vision tasks, the critical problem of scale remains under-explored, especially for pedestrian detection. Current approaches mainly either train on large numbers of images at different scales to improve network capability, or fuse results from multi-scale image crops at test time. Designing a CNN architecture that intrinsically captures the characteristics of both large-scale and small-scale objects while retaining scale invariance remains very challenging. In this paper, we propose a novel scale-aware Fast R-CNN to handle the detection of small object instances, which are very common in pedestrian detection. Our architecture incorporates a large-scale sub-network and a small-scale sub-network into a unified architecture by leveraging scale-aware weighting during training. The heights of object proposals are used to specify different scale-aware weights for the two sub-networks. Extensive evaluations on the challenging Caltech dataset \cite{dollar2012pedestrian} demonstrate the superiority of the proposed architecture over state-of-the-art methods \cite{compact,ta_cnn}. In particular, our method reduces the miss rate on the Caltech dataset to 9.68%, significantly lower than the 11.75% of CompACT-Deep \cite{compact} and the 20.86% of TA-CNN \cite{ta_cnn}.
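
The abstract says that proposal heights set the weights of the two sub-networks but does not give the functional form. The sketch below illustrates one plausible reading, assuming a sigmoid gate on proposal height whose two weights sum to one. The function names, the parameters `alpha` and `beta`, and the reference height `mean_height` are hypothetical and are not taken from the abstract.

```python
import math


def scale_aware_weights(height, mean_height, alpha=1.0, beta=10.0):
    """Return (w_large, w_small) for a proposal of the given pixel height.

    Assumption: a sigmoid gate centred on mean_height. Taller proposals
    lean on the large-scale sub-network, shorter ones on the small-scale
    sub-network. alpha and beta are illustrative shape parameters.
    """
    z = (height - mean_height) / beta
    # Clamp the exponent so extreme heights cannot overflow math.exp.
    z = max(-60.0, min(60.0, z))
    w_large = 1.0 / (1.0 + alpha * math.exp(-z))
    return w_large, 1.0 - w_large


def fuse_scores(score_large, score_small, height, mean_height,
                alpha=1.0, beta=10.0):
    """Combine the two sub-network scores with the height-dependent weights."""
    w_large, w_small = scale_aware_weights(height, mean_height, alpha, beta)
    return w_large * score_large + w_small * score_small


if __name__ == "__main__":
    # Hypothetical heights in pixels; mean_height is an assumed reference.
    for h in (30, 60, 120):
        w_l, w_s = scale_aware_weights(h, mean_height=60)
        print(f"height={h:4d}  w_large={w_l:.3f}  w_small={w_s:.3f}")
```

With this gate, a proposal at the reference height weights both sub-networks equally when `alpha` is 1, and the balance shifts smoothly as the height moves away from it, so no hard threshold between "large" and "small" pedestrians is needed.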
