Supervised Transformer Network for Efficient Face Detection



Abstract. Large pose variations remain a challenge that confronts real-world face detection. We propose a new cascaded Convolutional Neural Network, dubbed the Supervised Transformer Network, to address this challenge. The first stage is a multi-task Region Proposal Network (RPN), which simultaneously predicts candidate face regions along with associated facial landmarks. The candidate regions are then warped by mapping the detected facial landmarks to their canonical positions to better normalize the face patterns. The second stage, which is an RCNN, then verifies whether the warped candidate regions are valid faces. We conduct end-to-end learning of the cascaded network, including optimizing the canonical positions of the facial landmarks. This supervised learning of the transformations automatically selects the best scale to differentiate face/non-face patterns. By combining feature maps from both stages of the network, we achieve state-of-the-art detection accuracy on several public benchmarks. For real-time performance, we run the cascaded network only on regions of interest produced by a boosting cascade face detector. Our detector runs at 30 FPS on a single CPU core for a VGA-resolution image.
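The warp described above maps each candidate's detected landmarks onto canonical positions. In the paper those canonical positions are learned end-to-end; as a rough sketch of the underlying geometric step only, the snippet below computes a closed-form least-squares similarity transform (Umeyama's method) from detected landmarks to a fixed set of canonical points. The five-point layout and the function name are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def similarity_transform(src, dst):
    """Least-squares similarity transform (scale, rotation, translation)
    mapping src landmarks (N x 2) onto dst landmarks (N x 2).
    Returns a 2x3 affine matrix, following Umeyama's method."""
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - src_mean, dst - dst_mean
    # Cross-covariance between centered point sets.
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)
    # Reflection guard: force a proper rotation (det R = +1).
    d = np.sign(np.linalg.det(U @ Vt))
    D = np.diag([1.0, d])
    R = U @ D @ Vt
    scale = np.trace(np.diag(S) @ D) / src_c.var(axis=0).sum()
    t = dst_mean - scale * R @ src_mean
    return np.hstack([scale * R, t[:, None]])

# Hypothetical five-point canonical layout (eyes, nose tip, mouth corners)
# in a normalized 100x100 patch -- illustrative values only.
canonical = np.array([[30.0, 50.0], [70.0, 50.0], [50.0, 70.0],
                      [35.0, 90.0], [65.0, 90.0]])
```

The resulting 2x3 matrix could then be applied to the candidate region with an image-warping routine such as OpenCV's `warpAffine` to produce the normalized patch fed to the second-stage RCNN.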

1 Introduction

Among the various factors that confront real-world face detection, large pose variations remain a big challenge. For example, the seminal Viola-Jones [1] detector works well for near-frontal faces, but becomes much less effective for faces in poses far from frontal views, due to the weakness of the Haar features on non-frontal faces. Abundant works have attempted to tackle large pose variations under the boosting-cascade regime advocated by Viola and Jones [1]. Most of them adopt a divide-and-conquer strategy to build a multi-view face detector. Some works [2–4] proposed to train a detector cascade for each view and combine the results of all detectors at test time. Some other works [5–7] proposed to first estimate the face pose and then run the cascade of the corresponding face pose to verify the detection. The complexity of the former approach increases with the number of pose categories, while the accuracy of the latter is prone to mistakes in pose estimation. Part-based models offer an alternative solution [8–10]. These detectors are flexible and robust to both pose variation and partial occlusion, since they can reliably detect faces based on a few confident part detections. However, these methods typically require the target face to be large and clear, which is essential to reliably model the parts. Other works approach this issue by using more sophisticated

Electronic supplementary material: The online version of this chapter (doi:10.1007/978-3-319-46454-1_8) contains supplementary material, which is available to authorized users.

© Springer International Publishing AG 2016
B. Leibe et al. (Eds.): ECCV 2016, Part V, LNCS 9909, pp. 122–138, 2016. DOI: 10.1007/978-3-319-46454-1_8