Self-supervised deep subspace clustering network for faces in videos
ORIGINAL ARTICLE
Yunhao Qiu¹ · Pengyi Hao²
Accepted: 19 September 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract
Video face clustering is a challenging task with wide applications. Unlike ordinary image clustering, faces in videos usually exist as a series of tracks, which provide prior knowledge. Specifically, faces from the same track are considered to be the same person, while faces from different tracks appearing in the same frame are considered to be different people. Based on this prior knowledge, we propose the self-supervised deep subspace clustering network (SDSCN). SDSCN adopts an autoencoder to nonlinearly map the faces into a latent space and adds a fully connected layer between the encoder and decoder to explore the self-expressiveness property. Prior knowledge is automatically incorporated into the loss function to guide the training. We further propose an efficient training strategy for our network and clustering. Experiments on two public datasets (BBT0101 and Notting-Hill) demonstrate the advantages of our method. Specifically, our method achieves about 3–17% improvement in clustering accuracy on BBT0101 and about 6–23% improvement on Notting-Hill compared to state-of-the-art methods.

Keywords Video face clustering · Prior knowledge · SDSCN
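The prior knowledge described in the abstract can be illustrated with a minimal sketch that derives pairwise constraints from track and frame identifiers. This is not the authors' implementation: the function name `build_constraints` and the `(face_id, track_id, frame_id)` input format are hypothetical, chosen only for illustration. Extending the cannot-link relation to every face of two co-occurring tracks is an assumption that follows from combining the two stated rules.

```python
from itertools import combinations


def build_constraints(detections):
    """Derive pairwise constraints from face detections.

    detections: iterable of (face_id, track_id, frame_id) tuples.
    This input format is a hypothetical one used for illustration.
    Returns (must_link, cannot_link) as sets of (face_a, face_b) pairs
    with face_a < face_b.
    """
    faces_by_track = {}
    tracks_by_frame = {}
    for face_id, track_id, frame_id in detections:
        faces_by_track.setdefault(track_id, []).append(face_id)
        tracks_by_frame.setdefault(frame_id, set()).add(track_id)

    # Faces in the same track are treated as the same person.
    must_link = set()
    for faces in faces_by_track.values():
        must_link.update(combinations(sorted(faces), 2))

    # Two tracks that share a frame are treated as different people.
    conflicting_tracks = set()
    for tracks in tracks_by_frame.values():
        conflicting_tracks.update(combinations(sorted(tracks), 2))

    # Assumption: since each track is one person, the conflict extends
    # to every pair of faces drawn from the two tracks.
    cannot_link = set()
    for t1, t2 in conflicting_tracks:
        for a in faces_by_track[t1]:
            for b in faces_by_track[t2]:
                cannot_link.add((min(a, b), max(a, b)))

    return must_link, cannot_link


# Toy example: tracks "A" and "B" both appear in frame 2.
detections = [(0, "A", 1), (1, "A", 2), (2, "B", 2), (3, "B", 3), (4, "C", 9)]
must_link, cannot_link = build_constraints(detections)
print(sorted(must_link))    # [(0, 1), (2, 3)]
print(sorted(cannot_link))  # [(0, 2), (0, 3), (1, 2), (1, 3)]
```

Track "C" never shares a frame with another track, so it contributes no cannot-link pairs. How such pairs enter the SDSCN loss function is not specified in this excerpt.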
Yunhao Qiu and Pengyi Hao have contributed equally.

¹ School of Mathematical Science, Zhejiang University, Hangzhou, China
² College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China

1 Introduction
In this paper, we aim to tackle the task of clustering faces in videos. To be more specific, we have some face tracks extracted from videos, each of which contains a few faces, and we hope to partition these faces into several classes according to their identities. That is, different people are divided into different classes. A good solution to this problem can be applied in many fields, such as video segmentation, generating the cast list of a feature-length movie, and content-based video retrieval. Nevertheless, it is a much more challenging task than ordinary image clustering (see Fig. 1), due to the varying lighting conditions and the blur caused by actors' fast motions in the videos. Besides, actors' poses and facial expressions influence the faces tremendously, which also contributes to the complexity of this task. To handle this tough problem, some prior knowledge can be extracted from the data itself. In fact, the faces from the videos exist as a series of tracks. Faces from the same track must be the same person (track constraint), while faces from two different tracks appearing in the same frame must be different people (frame constraint). According to [1], face clustering with the above prior knowledge can be regarded as "self-supervised". On the other hand, over the years, spectral clustering-based subspace clustering methods have achieved r