Do We Really Need to Collect Millions of Faces for Effective Face Recognition?


1 Institute for Robotics and Intelligent Systems, USC, Los Angeles, CA, USA {iacopo.masi,anhttran,leksut,medioni}@usc.edu
2 Information Sciences Institute, USC, Los Angeles, CA, USA [email protected]
3 The Open University of Israel, Ra’anana, Israel

Abstract. Face recognition capabilities have recently made extraordinary leaps. Though this progress is at least partially due to ballooning training set sizes – huge numbers of face images downloaded and labeled for identity – it is not clear if the formidable task of collecting so many images is truly necessary. We propose a far more accessible means of increasing training data sizes for face recognition systems: domain-specific data augmentation. We describe novel methods of enriching an existing dataset with important facial appearance variations by manipulating the faces it contains. This synthesis is also used when matching query images represented by standard convolutional neural networks. The effect of training and testing with synthesized images is evaluated on the LFW and IJB-A (verification and identification) benchmarks and Janus CS2. The performance obtained by our approach matches state-of-the-art results reported by systems trained on millions of downloaded images.

1 Introduction

The recent impact of deep Convolutional Neural Network (CNN) based methods on machine face recognition capabilities has been extraordinary. The conditions under which faces are now recognized and the numbers of faces which systems can now learn to identify have improved to the point where some consider machines to be better than humans at this task. This progress is partially due to the introduction of new and improved network designs. However, alongside developments in network architectures, it is also the underlying ability of CNNs to learn from massive training sets that allows these techniques to be so effective. Realizing that effective CNNs can be made even more effective by increasing their training data, many began focusing efforts on harvesting and labeling large image collections to better train their networks. In [39], a standard CNN was trained by Facebook using 4.4 million labeled faces and shown to achieve what was, at the time, state of the art performance on the Labeled Faces in the

I. Masi, A. Tuấn Trần and T. Hassner contributed equally.
© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part V, LNCS 9909, pp. 579–596, 2016. DOI: 10.1007/978-3-319-46454-1_35


Dataset                    #ID       #Img        #Img/#ID
CASIA [46]                 10,575    494,414     46
Facebook DeepFace [39]     4,030     4.4M        1K
Google FaceNet [33]        8M        200M        25
VGG Face [28]              2,622     2.6M        1K
Facebook Fusion [40]       10M       500M        50
MegaFace [14]              690,572   1.02M       1.5
Aug. pose+shape            10,575    1,977,656   187
Aug. pose+shape+expr       10,575    2,472,070   234

(a) Face set statistics
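The images-per-subject figures in panel (a) follow directly from the two count columns, and the same arithmetic shows how much each augmented set enlarges CASIA. The short check below is only a sketch of that arithmetic: the counts are copied from panel (a), while the function and variable names are illustrative and do not come from the paper.

```python
# Counts copied from panel (a); all names here are illustrative assumptions.
CASIA_SUBJECTS = 10_575  # the augmented sets keep the same subjects

image_counts = {
    "CASIA": 494_414,
    "Aug. pose+shape": 1_977_656,
    "Aug. pose+shape+expr": 2_472_070,
}


def images_per_subject(n_images, n_subjects):
    """Average number of images available for each identity."""
    return n_images / n_subjects


def growth_factor(n_images, n_base_images):
    """How many times larger a set is than the base set."""
    return n_images / n_base_images


if __name__ == "__main__":
    base = image_counts["CASIA"]
    for name, n_images in image_counts.items():
        ratio = images_per_subject(n_images, CASIA_SUBJECTS)
        factor = growth_factor(n_images, base)
        print(f"{name:22s} {ratio:7.1f} img/ID   x{factor:.1f} vs. CASIA")
```

Under these counts, the pose+shape set is exactly four times the size of CASIA and the pose+shape+expression set exactly five times, without adding any new subjects. The CASIA ratio computes to about 46.75, so the table's value of 46 appears to be truncated rather than rounded.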

[Plot: number of images (y-axis, 0 to 6000) against subjects on a log scale (x-axis, 10^0 to 10^5); series: CASIA WebFace; Pose with Shapes; Pose, Shapes, Expression]

(b) Images for subjects

Fig. 1. (a) Comparison of our augmented dataset with other face datasets along