arXiv:1511.05045 [cs.CV]
Handcrafted Local Features are Convolutional Neural Networks
Zhenzhong Lan, Shoou-I Yu, Ming Lin, Bhiksha Raj, Alexander G. Hauptmann
Published 2015-11-16, Version 1
In image and video classification research, handcrafted local features and learning-based features have been the chief reasons for the considerable progress of the past decades. These two architectures were proposed at roughly the same time and have flourished at overlapping stages of history, but they are typically viewed as distinct approaches. In this paper, we emphasize their structural similarities and show how such a unified view helps us design features that balance efficiency and effectiveness. As an example, we study the problem of developing an efficient motion feature for action recognition. We approach this problem by first showing that traditional handcrafted local features are Convolutional Neural Networks (CNNs) that can be efficiently trained but have limited modeling capacity. We then propose a two-stream Convolutional PCA-ISA model that enhances the modeling capacity of local feature pipelines while keeping computational complexity low. Through custom-designed network structures for pixels and optical flow, our method reflects the distinctive characteristics of these two data sources. We evaluate the proposed method on the standard action recognition benchmarks UCF101 and HMDB51, where it outperforms state-of-the-art CNN approaches in both training time and accuracy.
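The structural claim, that a handcrafted local feature has the shape of a small CNN, can be illustrated with a gradient-orientation histogram written as three layers: convolution with fixed filters, a pointwise nonlinearity, and sum pooling. The sketch below is an illustration only and is not the paper's two-stream Convolutional PCA-ISA model. The function names, the 3x3 derivative filters, the cell size, and the number of orientation bins are all assumptions chosen for brevity.

```python
import math

# Layer 1 filters: fixed (not learned) derivative kernels. Assumed for illustration.
GX = [[0, 0, 0], [-1, 0, 1], [0, 0, 0]]
GY = [[0, -1, 0], [0, 0, 0], [0, 1, 0]]


def conv2d_valid(image, kernel):
    """Cross-correlate a 2D list with a small 2D kernel, without padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[y + i][x + j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]


def orientation_channels(gx, gy, bins):
    """Layer 2: pointwise nonlinearity mapping two gradient channels to `bins`
    channels. Each pixel writes its gradient magnitude into one orientation bin."""
    h, w = len(gx), len(gx[0])
    chans = [[[0.0] * w for _ in range(h)] for _ in range(bins)]
    for y in range(h):
        for x in range(w):
            mag = math.hypot(gx[y][x], gy[y][x])
            # Unsigned orientation in [0, pi).
            angle = math.atan2(gy[y][x], gx[y][x]) % math.pi
            b = int(angle / math.pi * bins) % bins
            chans[b][y][x] = mag
    return chans


def sum_pool(channel, cell):
    """Layer 3: non-overlapping sum pooling over cell x cell windows."""
    h, w = len(channel), len(channel[0])
    return [[sum(channel[y + i][x + j]
                 for i in range(cell) for j in range(cell))
             for x in range(0, w - cell + 1, cell)]
            for y in range(0, h - cell + 1, cell)]


def hog_like_features(image, cell=2, bins=4):
    """Compose the three layers: convolution, nonlinearity, pooling."""
    gx = conv2d_valid(image, GX)
    gy = conv2d_valid(image, GY)
    chans = orientation_channels(gx, gy, bins)
    return [sum_pool(c, cell) for c in chans]


if __name__ == "__main__":
    # A horizontal intensity ramp: every pixel has the same horizontal gradient.
    ramp = [[float(x) for x in range(6)] for _ in range(6)]
    for b, pooled in enumerate(hog_like_features(ramp)):
        print(b, pooled)
```

Because the filters are fixed, nothing in this pipeline needs gradient-based training, which is consistent with the abstract's point that such features are cheap to train but limited in modeling capacity. Replacing the fixed filters with learned ones would keep the same layer structure.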