Joint Face Detection and Alignment with a Deformable Hough Transform Model


Abstract. We propose a method for joint face detection and alignment in unconstrained images and videos. Historically, these problems have been addressed separately in the literature, and the overall performance of the whole pipeline has rarely been assessed. We show that a pipeline built by combining state-of-the-art methods for both tasks produces unsatisfactory overall performance. To address this limitation, we propose an approach that handles both tasks, which we call the Deformable Hough Transform Model (DHTM). In particular, we make the following contributions: (a) Rather than scanning the image with discriminatively trained filters, we propose to employ cascaded regression in a sliding-window fashion to fit a facial deformable model over the whole image/video. (b) We propose to capitalize on the large basin of attraction of cascaded regression to set up a Hough Transform voting scheme for detecting faces and filtering out irrelevant background. (c) We report state-of-the-art performance on the most challenging and widely used data sets for face detection, alignment and tracking.

Keywords: Face detection · Alignment · Tracking · Cascaded regression · Hough Transform

1 Introduction

J. McDonagh and G. Tzimiropoulos. © Springer International Publishing Switzerland 2016. G. Hua and H. Jégou (Eds.): ECCV 2016 Workshops, Part II, LNCS 9914, pp. 569–580, 2016. DOI: 10.1007/978-3-319-48881-3_39

From Viola and Jones [1] to Deformable Part Models [2–4] and from Active Appearance Models [5] to Cascaded Regression [6–9], face detection, alignment and tracking have all witnessed tremendous progress over the last years. Besides new methodologies, another notable development in the field has been the collection and annotation of large facial data sets captured in-the-wild [3,10–13], for which a number of newly developed methods have been shown to produce remarkable results. Despite this progress, the majority of prior work has considered the two problems separately: there is a large number of papers on face detection and perhaps an even larger number on face alignment and tracking, but to the best of our knowledge there are only two papers [3,14] that study the combined problem of detection and alignment, and no method that addresses and evaluates all three tasks jointly. However, for many subsequent, higher-level tasks, like face recognition, facial expression and attribute analysis, what matters is the overall performance in terms of accuracy in landmark localization. Notably, recent state-of-the-art methods for such tasks rely heavily on the accurate detection of landmarks (see for example [15,16]). As we show hereafter, the overall landmark localization accuracy can be unsatisfactory even when two recently proposed state-of-the-art methods are combined (we used [4] for face detection and [9] for landmark localization). The reason for this is that face detection follows object detection in terms of measuring performance and, in particular, it uses the PASCAL VOC prec