Discriminative feature extraction for video person re-identification via multi-task network



Wanru Song1 · Jieying Zheng1 · Yahong Wu1 · Changhong Chen1 · Feng Liu1

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

The goal of video-based person re-identification is to match pedestrians across image sequences captured by non-overlapping cameras. A critical issue in this task is how to exploit the useful information that videos provide. To address it, we propose a novel feature learning framework for video-based person re-identification. The proposed method aims to capture the most significant information in the spatial and temporal domains and then build a discriminative and robust feature representation for each sequence. More specifically, to learn more effective frame-wise features, we apply several attributes to the video-based task and build a multi-task network for identity and attribute classification. In the training phase, we present a multi-loss function that minimizes intra-class variances and maximizes inter-class differences. A feature aggregation network is then employed to aggregate the frame-wise features and extract the temporal information from the video. Furthermore, considering that adjacent frames typically have similar appearance features, we propose the concept of "non-redundant appearance feature extraction" to obtain sequence-level appearance descriptors of pedestrians. Because the temporal feature and the non-redundant appearance feature are complementary, we combine them in the distance learning phase by assigning them different distance-weighted coefficients. Extensive experiments on three video-based datasets demonstrate the superiority and effectiveness of our method.

Keywords Attribute · Center loss · Feature representation · Person re-identification · Video
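The combination of the two features in the distance learning phase can be illustrated with a minimal sketch. It assumes a Euclidean distance for each feature and a plain weighted sum of the two distances; the excerpt does not state the exact metric or the coefficient values, so `euclidean`, `fused_distance`, `rank_gallery`, the dictionary keys, and the default weights are all hypothetical names and numbers chosen for illustration only.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def fused_distance(probe, gallery, w_temporal=0.5, w_appearance=0.5):
    """Weighted sum of the temporal and appearance distances.

    The weighted-sum form and the default weights are assumptions,
    not the coefficients used in the paper.
    """
    d_t = euclidean(probe["temporal"], gallery["temporal"])
    d_a = euclidean(probe["appearance"], gallery["appearance"])
    return w_temporal * d_t + w_appearance * d_a


def rank_gallery(probe, gallery_items, **weights):
    """Return gallery ids sorted from nearest to farthest from the probe."""
    scored = [(fused_distance(probe, g, **weights), g["id"]) for g in gallery_items]
    return [gid for _, gid in sorted(scored)]


# Toy sequence-level descriptors (made-up values, two dimensions each).
probe = {"temporal": [0.0, 0.0], "appearance": [0.0, 0.0]}
gallery_items = [
    {"id": "g1", "temporal": [3.0, 4.0], "appearance": [0.0, 0.0]},
    {"id": "g2", "temporal": [0.0, 0.0], "appearance": [1.0, 0.0]},
]

equal_ranking = rank_gallery(probe, gallery_items)
appearance_heavy = rank_gallery(
    probe, gallery_items, w_temporal=0.1, w_appearance=0.9
)
```

In this toy case the ranking flips when the appearance distance is weighted more heavily, which is the kind of trade-off the coefficients are meant to control.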

This work was supported in part by the National Natural Science Foundation of China under Grant 61702278, in part by the Priority Academic Program Development of Jiangsu Higher Education Institutions, and in part by the Postgraduate Research & Practice Innovation Program of Jiangsu Province KYCX18 0890.

Feng Liu [email protected]
Wanru Song [email protected]
1 Nanjing University of Posts and Telecommunications, No.66 Xin Mofan RD, Nanjing 210003, China

1 Introduction

Person re-identification (person re-id) aims to identify the same person across non-overlapping cameras. Recently, with the development of video surveillance applications, person re-id has drawn increasing attention [9, 10, 24, 34, 41, 49]. Because serious occlusions, misalignments, and variations in pose, viewpoint, and illumination occur in most application scenarios, re-id remains a challenging task. Studies of re-id fall mainly into two categories: feature learning and distance metric learning. The goal of feature learning is to describe pedestrians more effectively. Earlier research focused on exploiting low-level information and