Person Re-identification via Recurrent Feature Aggregation


1 Shanghai Jiao Tong University, Shanghai, China
{yanyichao,nibingbing,5110309394,chaoma,xkyang}@sjtu.edu.cn
2 University of Michigan, Ann Arbor, USA
[email protected]

Abstract. We address the person re-identification problem by effectively exploiting a globally discriminative feature representation from a sequence of tracked human regions/patches. This is in contrast to previous person re-id works, which rely on either single-frame person-to-person patch matching or graph-based sequence-to-sequence matching. We show that a progressive/sequential fusion framework based on a long short term memory (LSTM) network aggregates the frame-wise human region representation at each time stamp and yields a sequence-level human feature representation. Since LSTM nodes can remember and propagate previously accumulated good features and forget newly input inferior ones, even with simple hand-crafted features, the proposed recurrent feature aggregation network (RFA-Net) is effective in generating highly discriminative sequence-level human representations. Extensive experimental results on two person re-identification benchmarks demonstrate that the proposed method performs favorably against state-of-the-art person re-identification methods.

Keywords: Person re-identification · Feature fusion · Long short term memory networks

© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part VI, LNCS 9910, pp. 701–716, 2016. DOI: 10.1007/978-3-319-46466-4_42

1 Introduction

Person re-identification (re-id) deals with the problem of re-associating a specific person across non-overlapping cameras. It has received increasing attention [1] due to its important applications in intelligent video surveillance. Existing methods mainly focus on the single-shot person re-id problem. Given a probe image of one person taken from one camera, a typical single-shot scenario is to identify this person in a set of gallery images taken from another camera. Usually, the identification results are obtained by ranking the similarities of the probe-gallery pairs. Performance is measured by the rank-k matching rate, i.e., the fraction of probes whose correct match appears in the retrieved top-k ranking list. To increase the matching rate, state-of-the-art approaches either employ discriminative features to represent persons or apply distance metric learning methods to increase the similarity between matched image pairs. Numerous types of features have been explored to represent persons, including global features like color and texture histograms [2,3], local features such as SIFT [4] and LBP [5], and deep convolutional neural network (CNN) features [6,7]. In the meantime, a large number of metric learning approaches have been applied to the person re-id task, such as LMNN [8], Mahalanobis distance metric [9], and RankSVM [10]. Despite the significant progress in recent years, the performance achieved by these methods does not meet the requirements of real applications, for the following reasons. First,