A Recurrent Encoder-Decoder Network for Sequential Face Alignment
1 Rutgers University, Piscataway, USA
  {xipeng.cs,dnm}@rutgers.edu
2 IBM T. J. Watson Research Center, Yorktown Heights, USA
  [email protected]
3 Snapchat Research, Venice, CA, USA
  [email protected]
Abstract. We propose a novel recurrent encoder-decoder network model for real-time video-based face alignment. Our proposed model predicts 2D facial point maps regularized by a regression loss, while uniquely exploiting recurrent learning at both spatial and temporal dimensions. At the spatial level, we add a feedback loop connection between the combined output response map and the input, in order to enable iterative coarse-to-fine face alignment using a single network model. At the temporal level, we first decouple the features in the bottleneck of the network into temporal-variant factors, such as pose and expression, and temporal-invariant factors, such as identity information. Temporal recurrent learning is then applied to the decoupled temporal-variant features, yielding better generalization and significantly more accurate results at test time. We perform a comprehensive experimental analysis, showing the importance of each component of our proposed model, as well as superior results over the state-of-the-art in standard datasets.

Keywords: Recurrent learning · Encoder-decoder · Face alignment
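The abstract describes two recurrent paths: a spatial feedback loop that feeds the combined landmark response map back into the encoder input, and temporal recurrence applied only to the temporal-variant part of the bottleneck code. The following is a minimal sketch of how these two paths could be wired together; the layer sizes, the small convolutional encoder/decoder, the GRU cell, and the channel split are illustrative assumptions and not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class RecurrentEncoderDecoder(nn.Module):
    """Sketch of a recurrent encoder-decoder with spatial feedback and
    temporal recurrence on decoupled bottleneck features (hypothetical sizes)."""

    def __init__(self, n_landmarks=68, code_ch=256, variant_ch=128):
        super().__init__()
        self.n_landmarks = n_landmarks
        self.variant_ch = variant_ch
        # Encoder sees the RGB frame concatenated with the previous response map
        # (the spatial feedback loop).
        self.encoder = nn.Sequential(
            nn.Conv2d(3 + n_landmarks, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, code_ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Temporal recurrence over the pooled temporal-variant channels only.
        self.rnn = nn.GRUCell(variant_ch, variant_ch)
        # Decoder upsamples the updated code back to one response map per landmark.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(code_ch, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, n_landmarks, 4, stride=2, padding=1),
        )

    def forward(self, frames, spatial_steps=2):
        # frames: (T, B, 3, H, W) video clip, with H and W divisible by 4.
        # Returns (T, B, n_landmarks, H, W) landmark response maps.
        T, B, _, H, W = frames.shape
        prev_map = frames.new_zeros(B, self.n_landmarks, H, W)
        hidden = frames.new_zeros(B, self.variant_ch)
        outputs = []
        for t in range(T):
            for _ in range(spatial_steps):  # spatial recurrence: coarse-to-fine refinement
                code = self.encoder(torch.cat([frames[t], prev_map], dim=1))
                variant, invariant = code.split(
                    [self.variant_ch, code.size(1) - self.variant_ch], dim=1)
                # Temporal recurrence on the pooled temporal-variant factors
                # (pose/expression); the temporal-invariant identity factors bypass it.
                hidden = self.rnn(variant.mean(dim=(2, 3)), hidden)
                tiled = hidden[:, :, None, None].expand_as(variant)
                prev_map = self.decoder(torch.cat([tiled, invariant], dim=1))
            outputs.append(prev_map)
        return torch.stack(outputs)
```

In this sketch, the predicted response maps would be supervised per landmark, with the regression loss mentioned in the abstract acting as an additional regularizer; the training objective itself is not shown here.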
1 Introduction
Face landmark detection plays a fundamental role in many computer vision tasks, such as face recognition, expression analysis, and 3D face modeling. In the past few years, many methods have been proposed to address this problem, with significant progress being made towards systems that work in real-world conditions ("in the wild"). Regression-based approaches [6,50] have achieved impressive results by cascading discriminative regression functions that directly map facial appearance to landmark coordinates (a generic form of this update is sketched below). In this framework, deep convolutional neural networks have proven effective for feature extraction and non-linear regression modeling [21,54,55]. Although these methods achieve very reliable results on standard benchmark datasets, they still suffer from limited performance in challenging scenarios, e.g., those involving large face pose variations and heavy occlusions.
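To make the cascaded regression framework referenced above concrete, the following is a minimal sketch of the generic update x_{k+1} = x_k + R_k(phi(I, x_k)): each stage regresses a coordinate increment from appearance features indexed by the current shape estimate. The function and regressor names are placeholders, not an implementation of any particular method in [6,50].

```python
import numpy as np

def cascaded_alignment(image, x0, stage_regressors, extract_features):
    """Generic cascaded regression for face alignment (sketch).

    image:            face crop as an array
    x0:               (L, 2) initial landmark estimate, e.g. the mean shape
    stage_regressors: list of per-stage functions mapping a feature vector
                      to a flat (2*L,) coordinate increment
    extract_features: callable (image, landmarks) -> feature vector sampled
                      around the current landmark positions
    """
    x = np.array(x0, dtype=float)
    for regress in stage_regressors:
        phi = extract_features(image, x)           # shape-indexed appearance features
        x = x + np.reshape(regress(phi), x.shape)  # additive landmark-coordinate update
    return x
```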
A promising direction for addressing these challenges is video-based face alignment (i.e., sequential face landmark detection) [39], which leverages temporal information as an additional constraint [47]. Despite the long history of research in rigid and non-rigid face tracking [5,10,32,33], current efforts have mostly focused on face alignment in still images [37,45,54,57]. In fact, most methods perform video-based landmark detection by independently applying models trained on still images to each frame, in a tracking-by-detection manner [48], with notable exceptions such as [1,36], which explore incremental learning.