Gated Siamese Convolutional Neural Network Architecture for Human Re-identification

Abstract. Matching pedestrians across multiple camera views, known as human re-identification, is a challenging research problem that has numerous applications in visual surveillance. With the resurgence of Convolutional Neural Networks (CNNs), several end-to-end deep Siamese CNN architectures have been proposed for human re-identification with the objective of projecting the images of similar pairs (i.e. same identity) to be closer to each other and those of dissimilar pairs to be distant from each other. However, current networks extract a fixed representation for each image regardless of the other images paired with it, and the comparison with other images is done only at the final level. In this setting, the network is at risk of failing to extract finer local patterns that may be essential to distinguish positive pairs from hard negative pairs. In this paper, we propose a gating function to selectively emphasize such fine common local patterns by comparing the mid-level features across pairs of images. This produces flexible representations for the same image according to the images it is paired with. We conduct experiments on the CUHK03, Market-1501 and VIPeR datasets and demonstrate improved performance compared to a baseline Siamese CNN architecture.

Keywords: Human re-identification · Siamese Convolutional Neural Network · Gating function · Matching gate · Deep Convolutional Neural Networks

1 Introduction

Matching pedestrians across multiple camera views, also known as human re-identification, is a research problem that has numerous potential applications in visual surveillance. The goal of the human re-identification system is to retrieve a set of images captured by different cameras (gallery set) for a given query image (probe set) from a certain camera. Human re-identification is a very challenging task due to the variations in illumination, pose and visual appearance across different camera views.

Electronic supplementary material: The online version of this chapter (doi:10.1007/978-3-319-46484-8_48) contains supplementary material, which is available to authorized users.
© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part VIII, LNCS 9912, pp. 791–808, 2016. DOI: 10.1007/978-3-319-46484-8_48
R.R. Varior et al.

[Figure 1: two examples, (a) and (b), each showing a query image, the Rank 1, Rank 2 and Rank 3 results, and the correct match.]
Fig. 1. Example case: Results obtained using an S-CNN. Red, Blue and Yellow boxes indicate some sample corresponding patches extracted from the images along the same horizontal row. See text for more details. Best viewed in color (Color figure online)

With the resurgence of Convolutional Neural Networks (CNNs), several deep learning methods [1,21,49] were proposed for human re-identification. Most of the frameworks are designed in a Siamese fashion that integrates the tasks of feature extraction and metric learning into a single framework. The central idea behind a Siamese Convolutional Neural Network (S-CNN) is to learn an embedding where similar pairs (i.e. im