Deep Attributes Driven Multi-camera Person Re-identification



1 Peking University, Beijing, China
  {chisu,slzhang.jdl,wgao}@pku.edu.cn
2 Chinese Academy of Sciences, Beijing, China
  [email protected]
3 Department of Computer Science, University of Texas at San Antonio, San Antonio, USA
  [email protected]

Abstract. The visual appearance of a person is easily affected by many factors such as pose variations, viewpoint changes, and camera parameter differences. This makes person Re-Identification (ReID) across multiple cameras a very challenging task. This work is motivated to learn mid-level human attributes that are robust to such visual appearance variations. We propose a semi-supervised attribute learning framework that progressively boosts the accuracy of attributes using only a limited number of labeled samples. Specifically, the framework involves three training stages. A deep Convolutional Neural Network (dCNN) is first trained on an independent dataset labeled with attributes. It is then fine-tuned on another dataset labeled only with person IDs using our defined triplet loss. Finally, the updated dCNN predicts attribute labels for the target dataset, which are combined with the independent dataset for a final round of fine-tuning. The predicted attributes, namely deep attributes, exhibit superior generalization ability across different datasets. By directly using the deep attributes with a simple Cosine distance, we obtain surprisingly good accuracy on four person ReID datasets. Experiments also show that a simple distance metric learning module further boosts our method, making it significantly outperform many recent works.
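To make the matching step concrete, the sketch below is a minimal illustration (not the authors' released implementation) of ranking a gallery by the Cosine distance between deep attribute vectors; the vectors are assumed to be pre-extracted by the fine-tuned dCNN, and the variable names and attribute dimensionality used here are hypothetical placeholders.

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance between two attribute vectors."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def rank_gallery(query_attr, gallery_attrs):
    """Rank gallery entries by increasing Cosine distance to the query.

    query_attr   : (d,) deep-attribute vector of the probe image
    gallery_attrs: (N, d) deep-attribute vectors of the gallery images
    Returns gallery indices sorted from most to least similar.
    """
    dists = np.array([cosine_distance(query_attr, g) for g in gallery_attrs])
    return np.argsort(dists)

# Toy usage: random vectors stand in for the dCNN's attribute predictions.
rng = np.random.default_rng(0)
query = rng.random(64)            # hypothetical attribute dimensionality
gallery = rng.random((10, 64))
print(rank_gallery(query, gallery)[:5])   # top-5 candidate matches
```

Ranking by Cosine distance requires no training on the target dataset, which is what allows the deep attributes to be compared directly across datasets.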

Keywords: Deep attributes · Re-identification

1 Introduction

Person Re-Identification (ReID) aims to identify the same person across different cameras, datasets, or time stamps. As illustrated in Fig. 1, factors such as viewpoint variations, illumination conditions, camera parameter differences, and body pose changes make person ReID a very challenging task. Due to its important applications in public security, e.g., cross-camera pedestrian search, tracking, and event detection, person ReID has attracted considerable attention from both the academic and industrial communities. Current research on this topic mainly focuses on two aspects: (a) extracting and coding local invariant features to represent the visual appearance of a person [1–7], and (b) learning a discriminative distance metric so that the distance between features of the same person is smaller [8–25]. Although significant progress has been made in previous studies, person ReID methods are still not mature enough for real applications. Local features mostly describe low-level visual appearance and hence are not robust to variations in viewpoint, body pose, etc. On the other hand, distance metric learning suffers from poor generalization ability and quadratic computational complexity, e.g., different datasets
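To illustrate the second line of work in its generic form, the sketch below computes a Mahalanobis-style distance parameterized by a learned positive semi-definite matrix M. This is a schematic example only, not the formulation of any specific method cited above, and the matrix here is a random placeholder standing in for a learned one.

```python
import numpy as np

def mahalanobis_distance(x, y, M):
    """Generic learned metric: d_M(x, y) = sqrt((x - y)^T M (x - y)),
    where M is a (learned) positive semi-definite matrix."""
    diff = x - y
    return float(np.sqrt(diff @ M @ diff))

# Placeholder "learned" metric: L^T L is always positive semi-definite.
rng = np.random.default_rng(0)
d = 64
L = rng.standard_normal((d, d))
M = L.T @ L

x, y = rng.standard_normal(d), rng.standard_normal(d)
print(mahalanobis_distance(x, y, M))
```

Learning such a matrix typically involves comparing pairs of training samples, which is one source of the quadratic computational complexity noted above.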