Person Re-identification in Videos by Analyzing Spatio-temporal Tubes


Arif Ahmed Sekh¹ · Debi Prosad Dogra² · Heeseung Choi³ · Seungho Chae³ · Ig-Jae Kim³

Received: 22 October 2019 / Revised: 7 May 2020 / Accepted: 22 May 2020 / © The Author(s) 2020

Abstract Typical person re-identification frameworks search for the k best matches in a gallery of images that are often collected under varying conditions. For video re-identification applications, the gallery usually contains image sequences. However, this search is time consuming, because video re-identification requires carrying out the matching process multiple times. In this paper, we propose a new method that extracts spatio-temporal frame sequences, or tubes, of moving persons and performs re-identification quickly. First, we apply a binary classifier to remove noisy images from the input query tube. Next, we apply a keypose detection-based query minimization technique. Finally, we propose a hierarchical re-identification framework and use it to rank the output tubes. Experiments on publicly available video re-identification datasets show that our framework outperforms existing methods, increasing CMC accuracy by 6-8% on average across multiple datasets. Our method also significantly reduces the number of false positives. In addition, we have prepared a new video re-identification dataset, the Tube-based Re-identification Video Dataset (TRiViD), to support the re-identification research community.

Keywords Video-based Person Re-identification · Re-ranking · Person Re-identification
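The abstract measures ranking quality with CMC (cumulative matching characteristic) accuracy over the best gallery matches. As an illustration only, the sketch below ranks gallery tubes against a query tube and computes a CMC curve. It does not reproduce the paper's noise classifier, keypose-based query minimization, or hierarchical ranking. Mean pooling of per-frame features into one tube descriptor and Euclidean distance are assumptions chosen for simplicity. The helper names (`tube_descriptor`, `rank_gallery`, `cmc_curve`) and the toy feature vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Tube = Sequence[Sequence[float]]  # hypothetical: one feature vector per frame


def tube_descriptor(tube: Tube) -> Vector:
    """Average-pool per-frame features into one tube-level vector.

    Assumption: mean pooling stands in for whatever tube representation
    a real system would learn; it is not the paper's method.
    """
    n = len(tube)
    return [sum(frame[d] for frame in tube) / n for d in range(len(tube[0]))]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_gallery(query: Tube, gallery: List[Tuple[str, Tube]]) -> List[str]:
    """Return gallery person IDs ordered from closest to farthest from the query."""
    q = tube_descriptor(query)
    scored = [(euclidean(q, tube_descriptor(t)), pid) for pid, t in gallery]
    scored.sort(key=lambda s: s[0])  # smallest distance = best match
    return [pid for _, pid in scored]


def cmc_curve(rankings: List[List[str]], true_ids: List[str], max_rank: int) -> List[float]:
    """Entry k-1 is the fraction of queries whose true ID appears in the top k."""
    hits = [0] * max_rank
    for ranking, true_id in zip(rankings, true_ids):
        if true_id in ranking[:max_rank]:
            first = ranking.index(true_id)  # position of the first correct match
            for k in range(first, max_rank):
                hits[k] += 1  # a hit at rank r counts for every rank >= r
    return [h / len(true_ids) for h in hits]


# Toy example with made-up 2-D features (hypothetical data).
gallery = [
    ("alice", [[0.0, 0.0], [0.2, 0.0]]),
    ("bob", [[1.0, 1.0], [1.2, 1.0]]),
    ("carol", [[0.0, 1.0], [0.0, 1.2]]),
]
queries = [("alice", [[0.1, 0.1]]), ("bob", [[0.3, 1.0]])]
rankings = [rank_gallery(tube, gallery) for _, tube in queries]
print(cmc_curve(rankings, [pid for pid, _ in queries], max_rank=3))  # [0.5, 1.0, 1.0]
```

Rank-1 accuracy is 0.5 here because only the first query's true identity is its closest gallery tube; by rank 2, both identities have been found.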

Corresponding author: Arif Ahmed Sekh
Extended author information available on the last page of the article.

Multimedia Tools and Applications

1 Introduction

Person re-identification (Re-Id) is useful in various intelligent video surveillance applications. The process can be considered an image retrieval problem: given a query image of a person (the probe), we search for that person in a set of images extracted from different cameras (the gallery). The task is difficult for several reasons. First, face-based [24] and body movement-based [2] identification cannot be used because of the variations in CCTV camera positions. Second, the complex nature of similarity measurement and pose matching makes the task harder. Recent advances in object tracking [4] have opened up new possibilities. Video object trackers can track people in real time, and the resulting person tracks can be passed to a machine learning (ML) framework that searches for the same identity in other cameras. The query can be a single image [25] or multiple images [9]. Multi-image queries often use early fusion to generate an average query image [29], and therefore consume more computational power than single-image methods. Video-based re-identification research is still evolving [6, 18]. Existing algorithms are sensitive to the choice of query images or video segment, and choosing an inappropriate image or video segment may lead to poor retrieval [25]. In this paper, we detect and track human