Multi-level feature learning with attention for person re-identification

Suncheng Xiang1 · Yuzhuo Fu1 · Hao Chen1 · Wei Ran1 · Ting Liu1

Received: 22 March 2020 / Revised: 10 July 2020 / Accepted: 6 August 2020 / © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Person re-identification (re-ID) aims to match a specific person in a large gallery across different cameras and locations. Previous part-based methods mainly focus on part-level features with uniform partition, which increases the ability to learn discriminative features but is neither efficient nor robust in scenarios with large variances. To address this problem, we propose a novel feature fusion strategy based on a traditional convolutional neural network. A multi-branch deeper feature fusion network architecture is then designed to perform discriminative learning for three semantically aligned regions. On this basis, a novel self-attention mechanism is employed to softly assign corresponding weights to the semantically aligned features during back-propagation. Comprehensive experiments on several large-scale benchmark datasets demonstrate that the proposed approach yields consistent and competitive re-ID accuracy compared with current single-domain re-ID methods.

Keywords Re-identification · Multi-branch · Semantically aligned region · Self-attention

Suncheng Xiang [email protected]

1 School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China

Multimedia Tools and Applications

1 Introduction

In the past few years, the computer vision community has made significant progress in various applications. As a fundamental problem in video surveillance, person re-identification (re-ID) is a challenging task that aims to match and return a specified probe person from a large-scale gallery set collected by different cameras at different times. It is a popular topic that has drawn increasing attention from both academia and industry due to its widespread applications in intelligent retrieval and public security. Encouraged by the remarkable success of deep learning methods and the availability of datasets, the re-ID research community has witnessed significant progress during the past few years [28]. However, the performance of a model in the single-domain setting is severely restricted by variations in pose, viewpoint, illumination [38] and occlusion, which hinder further improvement of person re-ID performance and make the problem non-trivial, so re-ID systems still face a series of realistic and challenging difficulties. Indeed, some recent deep re-ID methods [12, 20, 22] have achieved breakthroughs with satisfactory performance through deep feature representation. For example, the rank-1 accuracy on DukeMTMC-reID [18] has been improved from 25.1% [32] to 80.5% [13], and the single-query rank-1 accuracy on Market-1501 [32] has been improved from 43.8% [14] to 91.2% [13]. In fact, these leaps in performance come only when a large diversity of