Transferring and Compressing Convolutional Neural Networks for Face Representations



1 Centre for Mathematical Sciences, Lund University, Lund, Sweden
[email protected], [email protected]
2 Axis Communications, Lund, Sweden
{jiandan.chen,martin.ljungqvist}@axis.com

Abstract. In this work we have investigated face verification based on deep representations from Convolutional Neural Networks (CNNs) to find an accurate and compact face descriptor trained only on a restricted amount of face image data. Transfer learning by fine-tuning CNNs pre-trained on large-scale object recognition has been shown to be a suitable approach to counter a limited amount of target domain data. Using model compression we reduced the model complexity without significant loss in accuracy and made the feature extraction more feasible for real-time use and deployment on embedded systems and mobile devices. The compression resulted in a 9-fold reduction in the number of parameters and a 5-fold speed-up in the average feature extraction time running on a desktop CPU. With continued training of the compressed model using a Siamese Network setup, it outperformed the larger model.

1 Introduction

In visual recognition it is rapidly becoming standard practice to use deep representations composed of layer activations extracted from Convolutional Neural Networks (CNNs) as object descriptors, see [1,17]. CNNs are frequent top performers on complex image analysis tasks. However, one of their drawbacks is that they require vast amounts of training data in order to perform well. The CNNs used for this purpose are therefore often pre-trained on huge labeled datasets for generic object recognition containing a large set of object categories; from here on we refer to such CNNs as generic CNNs. Generic CNNs, such as [13,19], can be regarded as general-purpose feature extractors producing generic object descriptors, descriptors that may also constitute good representations for domains other than the source domain. Even though a generic CNN usually performs well in domains other than those it was trained for, it still lacks specificity. In many cases the object representations can be further improved by adapting the CNN to the target domain, as done in [1], which led to state-of-the-art results on 16 visual recognition benchmarks.

© Springer International Publishing Switzerland 2016. A. Campilho and F. Karray (Eds.): ICIAR 2016, LNCS 9730, pp. 20–29, 2016. DOI: 10.1007/978-3-319-41501-7_3
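The idea of taking hidden-layer activations of a pre-trained CNN as a generic object descriptor can be sketched as follows. This is a hypothetical illustration assuming PyTorch, not the implementation used in this work; the network architecture and descriptor dimension are placeholders, and in practice the weights would be loaded from a model pre-trained on a large generic object recognition dataset rather than initialized randomly.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Stand-in for a generic CNN; real work would use a large pre-trained net."""

    def __init__(self, descriptor_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
        )
        self.fc_hidden = nn.Linear(32 * 4 * 4, descriptor_dim)  # last hidden layer
        self.classifier = nn.Linear(descriptor_dim, 1000)       # generic object classes

    def describe(self, x):
        # The descriptor is the activation of the last hidden layer;
        # the source-domain classifier head is simply dropped.
        h = self.conv(x).flatten(1)
        return torch.relu(self.fc_hidden(h))

model = TinyCNN().eval()
with torch.no_grad():
    batch = torch.randn(2, 3, 64, 64)    # two preprocessed input images
    descriptors = model.describe(batch)  # one 128-D descriptor per image

print(descriptors.shape)  # torch.Size([2, 128])
```

The descriptors produced this way can then be compared with a distance metric or fed to a lightweight classifier for the target task, which is what makes generic CNNs usable as off-the-shelf feature extractors.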


The process of transferring a generic CNN to a new data domain is often called fine-tuning and is a form of transfer learning. Fine-tuning involves training a CNN initialized with weights from the pre-trained generic CNN, using data from the target domain. Recognising subjects in images with arbitrary angle, position, lighting and other variations is a complex task that requires large CNNs with many layers. Evaluating a trained CNN on unseen data requires the entire CNN structure; this is much more time-efficient than training. However, a real-time application