A deep learning framework for face verification without alignment



SPECIAL ISSUE PAPER

Zhongkui Fan1 · Ye-peng Guan1,2

Received: 2 July 2020 / Accepted: 9 October 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract

Most CNN (convolutional neural network) methods require face alignment, which reduces the efficiency of verification. This paper proposes a deep face verification framework without alignment. The framework consists of two training stages and one testing stage. In the first training stage, the CNN is fully trained on a large face dataset. In the second training stage, triplet embedding is adopted to fine-tune the models. In the testing stage, SIFT (scale-invariant feature transform) descriptors are extracted from intermediate pooling results for cascading verification, which effectively improves the accuracy of face verification without alignment. Two CNN architectures are designed for different scenarios. CNN1, with fewer layers and parameters, requires little memory and computation in training and testing, so it is suitable for real-time systems. CNN2, with more layers and parameters, delivers excellent face verification performance. After long-term training on the WEB-face dataset, experiments on the LFW (labeled faces in the wild) and YTB (YouTube) datasets show that the proposed method outperforms some state-of-the-art methods.

Keywords Convolutional neural networks · Face verification · Without alignment · Triplet loss
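The triplet-based fine-tuning named in the abstract can be illustrated with the standard triplet loss on L2-normalized embeddings. The sketch below is not taken from this paper: the function names, the margin of 0.2, and the decision threshold of 1.0 are assumptions chosen for illustration, and the authors' exact formulation may differ.

```python
import math
from typing import Sequence


def l2_normalize(v: Sequence[float]) -> list:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin: float = 0.2) -> float:
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    anchor and positive are embeddings of the same identity, negative is an
    embedding of a different identity. The margin value is an assumption.
    """
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(0.0, squared_distance(a, p) - squared_distance(a, n) + margin)


def same_identity(emb1, emb2, threshold: float = 1.0) -> bool:
    """Verify a pair by thresholding the squared distance (threshold assumed)."""
    return squared_distance(l2_normalize(emb1), l2_normalize(emb2)) < threshold
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so fine-tuning only receives gradient from triplets that still violate that constraint.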

* Ye-peng Guan [email protected]

1 School of Communication and Information Engineering, Shanghai University, Shanghai, China
2 Key Laboratory of Advanced Displays and System Application, Ministry of Education, Shanghai, China

1 Introduction

At present, many CNN (convolutional neural network) methods have been used for image classification [1–3], object detection [4, 5], and face verification. With the application of CNN-based models, the verification accuracy on the challenging LFW (labeled faces in the wild) benchmark has improved from 97% to 99% [6, 7, 10–12]. The method in [7] extracts feature vectors from a CNN and then feeds them into Bayesian and Gaussian processing for face verification [14, 15], which may not work in some situations. The method in [13] adopts face alignment and a multi-patch ensemble to enhance the robustness of face verification, which is time-consuming. A loss defined over matching and non-matching pairs is proposed in [12, 16, 17]; it has the potential to overcome the bottleneck of networks based on multi-class classification, but these methods may not generalize to a new identity that does not exist in the training set, and the threshold in the verification loss is determined manually [9–12]. One deep CNN model can be used for both identification and verification. Multitask learning provides an effective way to enhance the generalization of face representations. However, conv