Joint Face Representation Adaptation and Clustering in Videos
1 Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong, China
[email protected], [email protected]
2 Shenzhen Key Lab of Comp. Vis. & Pat. Rec., Shenzhen Institutes of Advanced Technology, CAS, Shenzhen, China
Abstract. Clustering faces in movies or videos is extremely challenging since characters' appearance can vary drastically under different scenes. In addition, the various cinematic styles make it difficult to learn a universal face representation for all videos. Unlike previous methods that assume fixed handcrafted features for face clustering, in this work we formulate a joint face representation adaptation and clustering approach in a deep learning framework. The proposed method allows the face representation to gradually adapt from an external source domain to a target video domain. The adaptation of the deep representation is achieved without any strong supervision, but through iteratively discovered weak pairwise identity constraints derived from potentially noisy face clustering results. Experiments on three benchmark video datasets demonstrate that our approach generates character clusters with higher purity than existing video face clustering methods, which are either based on deep face representations (without adaptation) or carefully engineered features.

Keywords: Convolutional network · Face clustering · Face recognition · Transfer learning
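The following is a minimal sketch of the alternating procedure summarized above: cluster faces with the current representation, derive weak pairwise identity constraints from the (possibly noisy) clusters, and use those constraints to adapt the representation before re-clustering. All names here (embed_faces, adapt_representation, n_characters, etc.) are hypothetical illustrations rather than the authors' implementation, and the contrastive-style fine-tuning step is only indicated as a placeholder.

    # Sketch of joint representation adaptation and clustering (assumptions noted above).
    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    def embed_faces(model, faces):
        """Hypothetical: apply the current deep model to obtain one feature per face."""
        return np.stack([model(f) for f in faces])

    def derive_pairwise_constraints(labels):
        """Turn a (possibly noisy) clustering into weak pairwise identity constraints:
        same cluster -> positive pair, different clusters -> negative pair."""
        positives, negatives = [], []
        n = len(labels)
        for i in range(n):
            for j in range(i + 1, n):
                (positives if labels[i] == labels[j] else negatives).append((i, j))
        return positives, negatives

    def adapt_representation(model, faces, positives, negatives):
        """Placeholder: fine-tune the model so positive pairs move closer and
        negative pairs move apart (e.g. a Siamese/contrastive objective)."""
        return model

    def joint_adaptation_and_clustering(model, faces, n_characters, n_iters=5):
        labels = None
        for _ in range(n_iters):
            feats = embed_faces(model, faces)                    # current features
            labels = AgglomerativeClustering(
                n_clusters=n_characters).fit_predict(feats)      # noisy clusters
            pos, neg = derive_pairwise_constraints(labels)       # weak constraints
            model = adapt_representation(model, faces, pos, neg) # adapt representation
        return labels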
1 Introduction
Face clustering in videos aims at grouping detected faces into different subsets according to the characters they belong to. It is a popular research topic [1-5] due to its wide spectrum of applications, e.g. video summarization, content-based retrieval, story segmentation, and character interaction analysis. It can even be exploited as a tool for collecting large-scale datasets for face recognition [4].

Clustering faces in videos is challenging. As shown in Fig. 1, the appearance of a character can vary drastically across different scenes as the story progresses. The viewing angles and lighting also vary widely due to the rich cinematic techniques employed, such as different shot types (e.g. deep focus, follow shot), a variety of lighting setups, and aesthetic choices. In many cases, a face is blurred due to fast motion or occluded due to interactions between characters. Blurring and occlusion are more severe in fantasy and action movies, e.g. the Harry Potter series.
Fig. 1. Faces at different times in the movie Harry Potter. Face clustering in videos is challenging due to the large appearance changes as the story progresses.
Conventional techniques that assume fixed handcrafted features [2,4] may fail in cases such as those shown in Fig. 1. Specifica