3D-2D deep convolutional neural network (DCNN) Cascade for robust video face identification


Kyeong Tae Kim 1 & Bumshik Lee 2 & Jae Young Choi 1

Received: 16 February 2020 / Revised: 19 July 2020 / Accepted: 29 July 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

This paper proposes a novel video face identification method, named the “3D-2D-DCNN cascade”, that serially combines 3D and 2D deep convolutional neural networks (DCNNs) for robust video face recognition (FR). In our method, an input video (face) sequence is first divided into a number of sub-video sequences, and each sub-video sequence is then used as an input to the 3D-DCNN to obtain a set of class-confidence scores for the given input video sequence. These class-confidence scores are aggregated in a novel way to form a class-confidence matrix. A key characteristic of our method is its use of this class-confidence matrix to fine-tune a 2D-DCNN, which is serially linked to the 3D-DCNN, to obtain the final face identification results. To verify the proposed method, two popular video face identification benchmarks, the COX Face and YTC databases, were used. Compared to the best reported recognition results on these two benchmarks, our proposed method achieves better or comparable recognition performance.

Keywords Video face identification · Deep convolutional neural network · 3D-2D-DCNN cascade · Class-confidence matrix
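The first stage described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the class-confidence scores are aggregated, so the code below assumes the simplest possible form: one softmax confidence vector per sub-video sequence, stacked row by row. The window length, stride, function names, and the random linear scorer standing in for a trained 3D-DCNN are all hypothetical and are not taken from the paper.

```python
import numpy as np


def split_into_subsequences(num_frames, sub_len, stride):
    """Return (start, end) frame ranges for fixed-length sub-video sequences."""
    return [(s, s + sub_len) for s in range(0, num_frames - sub_len + 1, stride)]


def softmax(logits):
    """Turn raw class scores into confidences that sum to 1."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def class_confidence_matrix(video, score_fn, sub_len=16, stride=8):
    """Stack one class-confidence vector per sub-sequence into a matrix.

    ASSUMPTION: plain row stacking; the paper's own aggregation is not
    described in the abstract and may differ.
    """
    windows = split_into_subsequences(len(video), sub_len, stride)
    rows = [softmax(score_fn(video[start:end])) for start, end in windows]
    return np.stack(rows)  # shape: (num_subsequences, num_classes)


# Hypothetical stand-in for a trained 3D-DCNN: a fixed random linear scorer.
rng = np.random.default_rng(0)
num_classes = 5
projection = rng.normal(size=(num_classes, 8 * 8))


def dummy_3d_scorer(clip):
    """Map a (frames, height, width) clip to num_classes raw scores."""
    return projection @ clip.mean(axis=0).ravel()


video = rng.random((40, 8, 8))  # 40 toy frames of 8x8 pixels
matrix = class_confidence_matrix(video, dummy_3d_scorer)
print(matrix.shape)  # (4, 5): 4 sub-sequences, 5 classes
```

In the pipeline the abstract outlines, a matrix of this kind would then be passed to the 2D-DCNN stage for fine-tuning; that stage is not sketched here because the abstract gives no detail about it.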

* Jae Young Choi [email protected]

1 Pattern Recognition and Machine Intelligence Laboratory, Division of Computer & Electronic Systems Engineering, Hankuk University of Foreign Studies, 81, Oedae-ro, Mohyeon-myeon, Cheoin-gu, Yongin-si, Gyeonggi-do 17305, Republic of Korea

2 Department of Information and Communications Engineering, Chosun University, 61452 Gwangju, Republic of Korea

Multimedia Tools and Applications

1 Introduction

Video face identification has received significant interest due to its wide range of applications, such as video surveillance, biometric identification, and content-based video indexing/search [28]. A recent trend in video face identification is the use of deep learning based methods; deep convolutional neural networks (DCNNs) in particular show promising results [4, 24, 25, 31]. There have been a few studies on deep learning based video face identification. Here, several representative studies that apply deep learning to face identification are introduced. The authors of [17] presented a Convolutional Neural Network (CNN) based framework for detecting image operator chains, which determines whether an image has undergone a specific manipulation and in what sequence the manipulations were applied. The authors of [31] proposed a Neural Aggregation Network (NAN) that generates a “fused” feature vector by combining multiple deep feature vectors, each computed for a particular video frame, using adaptive weights. However, if the weight of a high-quality face image is too high, the effect of low-quality face images is significantly suppressed, which leads to distort