A divide-and-unite deep network for person re-identification


Rui Li¹ · Baopeng Zhang¹ · Zhu Teng¹ · Jianping Fan²

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
Person re-identification (person re-ID) is one of the most challenging tasks in computer vision because it involves large variations in human appearance, human pose, background illumination, camera view, etc. In recent literature, part-level features have been shown to provide fine-grained information and to be effective for the person re-ID task. Instead of relying on additional skeleton key points or pose estimation models, this paper proposes a Divide-and-Unite Network that learns the feature embedding end-to-end. We design a deep network guided by image content, which divides each pedestrian into parts and obtains part features with different contributions. These part features and the global feature are united to form the pedestrian descriptor for person re-ID. The contributions of this work are two-fold. First, we propose a novel architecture for discriminative descriptor learning that is based on the global feature and supplemented by part features. Second, we construct a Feature Division Network that generates part features with different contributions, where the divided parts remain consistent in content across different images. Extensive experiments are conducted on three widely used benchmarks: Market1501, CUHK03, and DukeMTMC-reID. The results demonstrate that the proposed model achieves remarkable performance compared with numerous state-of-the-art methods.

Keywords Person re-identification · Siamese network · Global feature · Part feature

1 Introduction
Person re-ID is an attractive task in computer vision because of its wide applications in video surveillance and intelligent security. Given a query, the goal of person re-ID is to find the same person in other camera views, which are captured by non-overlapping surveillance cameras deployed at different locations.
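The retrieval task described above can be illustrated with a minimal sketch. It assumes that every image has already been mapped to a fixed-length, nonzero descriptor and that cosine similarity is used for matching; the function names and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, nonzero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_gallery(query_feat, gallery_feats):
    """Return gallery indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_feat, g) for g in gallery_feats]
    return sorted(range(len(gallery_feats)), key=lambda i: scores[i], reverse=True)


# Toy descriptors (hypothetical values chosen only for illustration).
query = [1.0, 0.0, 1.0]
gallery = [
    [0.0, 1.0, 0.0],  # unrelated appearance
    [0.9, 0.1, 1.1],  # close to the query, e.g. the same person in another view
    [1.0, 1.0, 0.0],  # partially similar appearance
]

ranking = rank_gallery(query, gallery)
print(ranking)
```

In this formulation, the quality of the ranking depends entirely on how discriminative the descriptors are, which is why the rest of the paper focuses on feature learning.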

1 School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
2 AI Lab, Lenovo Research, Beijing, China

It is a very challenging problem [2] because of diverse intrinsic and extrinsic factors such as appearance variations, human pose changes, illumination differences, camera views, information loss in data collection, and occlusions. With the development of deep learning, more and more researchers apply Convolutional Neural Networks (CNNs) to the person re-ID task, and these methods show great potential. In the re-ID task, learning an effective feature representation is crucial. Earlier studies extract global features from pedestrian images, and some works use different learning tasks [51] or attention mechanisms [18] to improve performance. Recently, increasingly state-of-the-art meth