Face Detection with End-to-End Integration of a ConvNet and a 3D Model


1 Nat’l Engineering Laboratory for Video Technology, Key Laboratory of Machine Perception (MoE), Cooperative Medianet Innovation Center, Shanghai, Sch’l of EECS, Peking University, Beijing 100871, China
{leo.liyunzhu,sunbenyuan,Yizhou.Wang}@pku.edu.cn
2 Department of ECE and the Visual Narrative Cluster, North Carolina State University, Raleigh, USA
tianfu [email protected]

Abstract. This paper presents a method for face detection in the wild, which integrates a ConvNet and a 3D mean face model in an end-to-end multi-task discriminative learning framework. The 3D mean face model is predefined and fixed (e.g., we used the one provided in the AFLW dataset). The ConvNet consists of two components: (i) The face proposal component computes face bounding box proposals by estimating facial key-points and the 3D transformation (rotation and translation) parameters for each predicted key-point w.r.t. the 3D mean face model. (ii) The face verification component computes detection results by pruning and refining proposals using a configuration pooling layer based on the facial key-points. The proposed method addresses two issues in adapting state-of-the-art generic object detection ConvNets (e.g., Faster R-CNN) for face detection: (i) One is to eliminate the heuristic design of predefined anchor boxes in the region proposal network (RPN) by exploiting a 3D mean face model. (ii) The other is to replace the generic RoI (Region-of-Interest) pooling layer with a configuration pooling layer that respects the underlying object structure. The multi-task loss consists of three terms: the classification Softmax loss and the location smooth-L1 losses of both the facial key-points and the face bounding boxes. In experiments, our ConvNet is trained on the AFLW dataset only and tested on the FDDB benchmark with fine-tuning and on the AFW benchmark without fine-tuning. The proposed method obtains very competitive state-of-the-art performance on the two benchmarks.

Keywords: Face detection · Face 3D model · ConvNet · Deep learning · Multi-task learning
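To make the two ideas summarized in the abstract concrete, the following is a minimal sketch and not the authors' code: all function names, the weak-perspective projection, and the toy key-point coordinates are illustrative assumptions. It shows (i) how a fixed 3D mean face, under estimated rotation and translation parameters, projects to 2D key-points whose tight bounding box serves as a face proposal, and (ii) the general form of a multi-task loss combining a classification Softmax term with smooth-L1 terms for key-point and box locations.

```python
# Illustrative sketch (not the paper's implementation) of:
# (i) projecting a fixed 3D mean face with estimated rotation/translation to
#     obtain 2D key-points and a face bounding-box proposal, and
# (ii) a multi-task loss: Softmax classification + smooth-L1 regression terms.
import numpy as np

def project_mean_face(mean_face_3d, rotation, translation, scale=1.0):
    """Project N x 3 mean-face key-points to 2D with a weak-perspective model.

    rotation:    3x3 rotation matrix (estimated per proposal)
    translation: length-2 image-plane offset in pixels
    """
    rotated = mean_face_3d @ rotation.T               # N x 3
    return scale * rotated[:, :2] + translation       # drop depth, shift

def proposal_box_from_keypoints(keypoints_2d):
    """Face proposal = tight bounding box of the projected key-points."""
    x_min, y_min = keypoints_2d.min(axis=0)
    x_max, y_max = keypoints_2d.max(axis=0)
    return np.array([x_min, y_min, x_max, y_max])

def smooth_l1(pred, target):
    """Smooth-L1 (Huber) loss used for box / key-point regression."""
    diff = np.abs(pred - target)
    return np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).sum()

def softmax_cross_entropy(logits, label):
    """Softmax classification loss for the face / non-face decision."""
    z = logits - logits.max()
    log_prob = z - np.log(np.exp(z).sum())
    return -log_prob[label]

# Toy example: 5 key-points of a hypothetical mean face (eyes, nose, mouth corners).
mean_face = np.array([[-1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 0.5],
                      [-0.8, -1.0, 0.0], [0.8, -1.0, 0.0]])
R = np.eye(3)                       # identity rotation, for illustration only
t = np.array([50.0, 60.0])          # image-plane translation (pixels)
kps = project_mean_face(mean_face, R, t, scale=20.0)
box = proposal_box_from_keypoints(kps)

# Multi-task loss = classification + key-point regression + box regression.
loss = (softmax_cross_entropy(np.array([2.0, -1.0]), label=0)
        + smooth_l1(kps, kps + 0.1)          # key-point location term
        + smooth_l1(box, box + 0.5))         # bounding-box location term
print(box, loss)
```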

Y. Li and B. Sun contributed equally to this work and are joint first authors.

© Springer International Publishing AG 2016
B. Leibe et al. (Eds.): ECCV 2016, Part III, LNCS 9907, pp. 420–436, 2016. DOI: 10.1007/978-3-319-46487-9_26


1 Introduction

1.1 Motivation and Objective

Face detection has been used as a core module in a wide spectrum of applications such as surveillance, mobile communication, and human-computer interaction. It is arguably one of the most successful applications of computer vision. Face detection in the wild continues to play an important role in the era of visual big data (e.g., images and videos on the web and in social media). However, it remains a challenging problem in computer vision due to the large appearance variations caused by nuisance factors including viewpoint, occlusion, facial expression, resolution, illumination, and cosmetics. Computer vision researchers have long studied how to learn a better representa