Joint Face Representation Adaptation and Clustering in Videos
1 Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong, China
[email protected], [email protected]
2 Shenzhen Key Lab of Comp. Vis. & Pat. Rec., Shenzhen Institutes of Advanced Technology, CAS, Shenzhen, China
Abstract. Clustering faces in movies or videos is extremely challenging since characters' appearance can vary drastically under different scenes. In addition, the various cinematic styles make it difficult to learn a universal face representation for all videos. Unlike previous methods that assume fixed handcrafted features for face clustering, in this work we formulate a joint face representation adaptation and clustering approach in a deep learning framework. The proposed method allows the face representation to gradually adapt from an external source domain to a target video domain. The adaptation of the deep representation is achieved without any strong supervision, but through iteratively discovered weak pairwise identity constraints derived from potentially noisy face clustering results. Experiments on three benchmark video datasets demonstrate that our approach generates character clusters with higher purity than existing video face clustering methods, which are either based on deep face representations (without adaptation) or carefully engineered features.

Keywords: Convolutional network · Face clustering · Face recognition · Transfer learning
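The following is a minimal sketch of the alternating procedure summarized above: cluster faces with the current representation, derive weak pairwise identity constraints from the (possibly noisy) clusters, and use those constraints to adapt the representation before re-clustering. All names here (embed_faces, adapt_representation, n_characters, etc.) are hypothetical illustrations rather than the authors' implementation, and the contrastive-style fine-tuning step is only indicated as a placeholder.

    # Sketch of joint representation adaptation and clustering (assumptions noted above).
    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    def embed_faces(model, faces):
        """Hypothetical: apply the current deep model to obtain one feature per face."""
        return np.stack([model(f) for f in faces])

    def derive_pairwise_constraints(labels):
        """Turn a (possibly noisy) clustering into weak pairwise identity constraints:
        same cluster -> positive pair, different clusters -> negative pair."""
        positives, negatives = [], []
        n = len(labels)
        for i in range(n):
            for j in range(i + 1, n):
                (positives if labels[i] == labels[j] else negatives).append((i, j))
        return positives, negatives

    def adapt_representation(model, faces, positives, negatives):
        """Placeholder: fine-tune the model so positive pairs move closer and
        negative pairs move apart (e.g. a Siamese/contrastive objective)."""
        return model

    def joint_adaptation_and_clustering(model, faces, n_characters, n_iters=5):
        labels = None
        for _ in range(n_iters):
            feats = embed_faces(model, faces)                    # current features
            labels = AgglomerativeClustering(
                n_clusters=n_characters).fit_predict(feats)      # noisy clusters
            pos, neg = derive_pairwise_constraints(labels)       # weak constraints
            model = adapt_representation(model, faces, pos, neg) # adapt representation
        return labels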
1 Introduction
Face clustering in videos aims at grouping detected faces into different subsets according to the characters they belong to. It is a popular research topic [1-5] due to its wide spectrum of applications, e.g. video summarization, content-based retrieval, story segmentation, and character interaction analysis. It can even be exploited as a tool for collecting large-scale datasets for face recognition [4].

Clustering faces in videos is challenging. As shown in Fig. 1, the appearance of a character can vary drastically across different scenes as the story progresses. The viewing angles and lighting also vary widely due to the rich cinematic techniques employed, such as different shot types (e.g. deep focus, follow shot), a variety of lighting setups, and aesthetic choices. In many cases, a face is blurred due to fast motion or occluded due to interactions between characters. Blurring and occlusion are more severe in fantasy and action movies, e.g. the Harry Potter series.
Fig. 1. Faces at different times in the movie Harry Potter. Face clustering in videos is challenging due to the large appearance changes as the story progresses.
Conventional techniques that assume fixed handcrafted features [2,4] may fail in cases such as those shown in Fig. 1. Specifica