arXiv Analytics

arXiv:1606.00850 [cs.CV]

Face Detection with End-to-End Integration of a ConvNet and a 3D Model

Yunzhu Li, Benyuan Sun, Tianfu Wu, Yizhou Wang, Wen Gao

Published 2016-06-02, Version 1

This paper presents a method that integrates a ConvNet and a 3D model in an end-to-end, multi-task discriminative learning framework for face detection in the wild. In training, we assume that a 3D mean face model and the facial key-point annotations of each face image are available. Our ConvNet learns to predict (i) face bounding box proposals, by estimating the 3D transformation (rotation and translation) of the mean face model, and (ii) the facial key-points of each face instance. In adapting state-of-the-art generic object detection ConvNets (e.g., Faster R-CNN \cite{FasterRCNN}) for face detection, the method addresses two of their issues: (i) it eliminates the heuristic design of predefined anchor boxes in the region proposal network (RPN) by exploiting a 3D mean face model; (ii) it replaces the generic RoI (Region-of-Interest) pooling layer with a "configuration pooling" layer, which respects the underlying object configuration given by the predicted facial key-points and is therefore more semantics-driven. The multi-task loss consists of three terms: the Softmax classification loss and the smooth $l_1$ localization losses \cite{FastRCNN} for both the facial key-points and the face bounding boxes. In experiments, our ConvNet is trained only on the AFLW dataset \cite{AFLW} and tested on the FDDB \cite{FDDB} and AFW \cite{AFW} benchmarks. The results show that the proposed method achieves performance competitive with the state of the art on both benchmarks.
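The abstract names the ingredients but not their exact form, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It shows (a) how a rotated and translated 3D mean face could be turned into a bounding box proposal, assuming a simple pinhole camera and a yaw-only rotation, and (b) a multi-task loss made of a Softmax classification term plus smooth $l_1$ terms for key-points and boxes, using the Fast R-CNN definition of smooth $l_1$. The five-point MEAN_FACE coordinates, the focal length, the weights lam_kp and lam_box, the rule that regression terms apply only to face samples, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

# Hypothetical 5-point mean face (eye centres, nose tip, mouth corners), in mm.
MEAN_FACE: List[Point3] = [
    (-30.0, -30.0, 0.0), (30.0, -30.0, 0.0), (0.0, 0.0, 20.0),
    (-25.0, 35.0, 0.0), (25.0, 35.0, 0.0),
]


def rotation_yaw(theta: float) -> List[List[float]]:
    """3x3 rotation about the vertical (y) axis; a full model would also use pitch and roll."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def project_mean_face(points: Sequence[Point3], rot: List[List[float]],
                      trans: Point3, focal: float) -> List[Point2]:
    """Apply rotation + translation, then an assumed pinhole projection (needs z > 0)."""
    projected = []
    for p in points:
        x, y, z = (sum(rot[i][j] * p[j] for j in range(3)) + trans[i] for i in range(3))
        projected.append((focal * x / z, focal * y / z))
    return projected


def box_from_points(points: Sequence[Point2]) -> Box:
    """Tightest axis-aligned box around the projected key-points (a face proposal)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def smooth_l1(pred: Sequence[float], target: Sequence[float]) -> float:
    """Fast R-CNN smooth L1: 0.5*d^2 if |d| < 1, else |d| - 0.5, summed over coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax classification loss, computed with the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]


def multi_task_loss(cls_logits, cls_label, kp_pred, kp_gt, box_pred, box_gt,
                    lam_kp: float = 1.0, lam_box: float = 1.0) -> float:
    """Classification loss plus smooth L1 terms for key-points and boxes.

    Assumption: as in Fast R-CNN, regression terms count only for face samples
    (label 1); lam_kp and lam_box are hypothetical weights.
    """
    loss = softmax_cross_entropy(cls_logits, cls_label)
    if cls_label == 1:
        loss += lam_kp * smooth_l1(kp_pred, kp_gt) + lam_box * smooth_l1(box_pred, box_gt)
    return loss


if __name__ == "__main__":
    # Frontal pose (yaw 0), face 500 mm in front of a camera with focal length 500 px.
    pts = project_mean_face(MEAN_FACE, rotation_yaw(0.0), (0.0, 0.0, 500.0), 500.0)
    print(box_from_points(pts))  # (-30.0, -30.0, 30.0, 35.0)
```

Deriving the proposal from a predicted pose, rather than from a fixed set of anchor shapes, is what lets the box adapt to head rotation; in this sketch, a nonzero yaw simply narrows or shifts the projected key-points and hence the box.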

Comments: 17 pages; Y. Li and B. Sun contributed equally to this work.
Categories: cs.CV
Related articles:
arXiv:1805.07566 [cs.CV] (Published 2018-05-19)
Wildest Faces: Face Detection and Recognition in Violent Settings
arXiv:1811.12296 [cs.CV] (Published 2018-11-29)
Face Detection in the Operating Room: Comparison of State-of-the-art Methods and a Self-supervised Approach
arXiv:2207.10482 [cs.CV] (Published 2022-07-21)
LPYOLO: Low Precision YOLO for Face Detection on FPGA