MARS: A Video Benchmark for Large-Scale Person Re-Identification
1 Tsinghua University, Beijing, China
[email protected], [email protected]
2 Microsoft Research, Beijing, China
3 UTSA, San Antonio, USA
4 Peking University, Beijing, China
Abstract. This paper considers person re-identification (re-id) in videos. We introduce a new video re-id dataset, named Motion Analysis and Re-identification Set (MARS), a video extension of the Market-1501 dataset. To our knowledge, MARS is the largest video re-id dataset to date. Containing 1,261 IDs and around 20,000 tracklets, it provides rich visual information compared to image-based datasets. Meanwhile, MARS is a step closer to practice: the tracklets are automatically generated by the Deformable Part Model (DPM) pedestrian detector and the GMMCP tracker. A number of false detection/tracking results are also included as distractors, which would predominantly exist in practical video databases. We present an extensive evaluation of state-of-the-art methods, including space-time descriptors and CNNs. We show that a CNN in classification mode can be trained from scratch using the consecutive bounding boxes of each identity. The learned CNN embedding outperforms other competing methods considerably and, upon fine-tuning, generalizes well to other video re-id datasets.
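To make the training recipe mentioned in the abstract concrete, the sketch below shows one way an identity-classification CNN can be trained on individual tracklet frames and used as an embedding. The backbone choice (ResNet-50), feature dimension, and dataset handling are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch: identity classification over tracklet frames (PyTorch).
# Backbone, embedding size, and optimizer settings are assumptions made
# for illustration; the paper's exact network may differ.
import torch
import torch.nn as nn
import torchvision.models as models

NUM_IDS = 625  # assumed number of training identities

class ReIDNet(nn.Module):
    def __init__(self, num_ids=NUM_IDS, feat_dim=1024):
        super().__init__()
        base = models.resnet50(weights=None)           # trained from scratch
        self.backbone = nn.Sequential(*list(base.children())[:-1])
        self.embed = nn.Linear(2048, feat_dim)         # embedding layer
        self.classifier = nn.Linear(feat_dim, num_ids)  # ID logits

    def forward(self, x):
        f = self.backbone(x).flatten(1)   # per-frame feature
        f = self.embed(f)
        return f, self.classifier(f)      # (embedding, ID logits)

model = ReIDNet()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

def train_step(frames, id_labels):
    """One step: every frame of a tracklet is a sample of its identity."""
    _, logits = model(frames)
    loss = criterion(logits, id_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At test time the classifier head is discarded and the embedding output is used as the frame descriptor.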
Keywords: Video person re-identification · Motion features · CNN

1 Introduction
Person re-identification, as a promising way towards automatic video surveillance, has been mostly studied on pre-defined image bounding boxes (bboxes). Impressive progress has been observed in image-based re-id. However, the rich information contained in video sequences (or tracklets) remains under-explored. In the generation of a video database, pedestrian detectors [11] and offline trackers [7] are readily available, so it is natural to extract tracklets instead of single (or multiple) bboxes. This paper, among a few contemporary works [25,29,36,38,41], makes initial attempts at video-based re-identification. The dataset and code are available at http://www.liangzheng.com.cn.
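Since MARS tracklets come from chaining a detector and an offline tracker, a schematic of such a pipeline may help. The sketch below links per-frame detections by greedy IoU matching; this toy linker is a stand-in for DPM + GMMCP, not the actual algorithms, and the detector itself is assumed to exist upstream.

```python
# Schematic tracklet generation: per-frame detections are linked into
# tracklets by greedy IoU matching. This toy linker is an illustrative
# stand-in for the DPM + GMMCP pipeline, not the paper's algorithms.
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x, y, w, h)

def iou(a: BBox, b: BBox) -> float:
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    return inter / (aw * ah + bw * bh - inter + 1e-9)

def link_tracklets(detections: List[List[BBox]], thr: float = 0.5) -> List[List[BBox]]:
    """Greedily extend each active tracklet with the best-overlapping
    detection in the next frame; unmatched detections start new tracklets.
    Detector false alarms naturally survive as distractor tracklets."""
    tracklets: List[List[BBox]] = []
    active: List[List[BBox]] = []
    for boxes in detections:
        unmatched = list(boxes)
        next_active: List[List[BBox]] = []
        for tr in active:
            best = max(unmatched, key=lambda b: iou(tr[-1], b), default=None)
            if best is not None and iou(tr[-1], best) >= thr:
                tr.append(best)          # extend the tracklet
                unmatched.remove(best)
                next_active.append(tr)
        for b in unmatched:              # start a new tracklet
            t = [b]
            tracklets.append(t)
            next_active.append(t)
        active = next_active
    return tracklets
```

A real offline tracker such as GMMCP solves a global association over all frames rather than this frame-by-frame greedy pass, which is why it can bridge occlusions the sketch would break on.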
With respect to the "probe-to-gallery" pattern, there are four re-id strategies: image-to-image, image-to-video, video-to-image, and video-to-video. The first mode is the most studied in the literature, and previous methods in image-based re-id [5,24,35] are designed to cope with the small amount of training data. The second mode can be viewed as a special case of "multi-shot", and the third involves multiple queries. Intuitively, the video-to-video pattern, which is our focus in this paper, is more favorable because both probe and gallery units contain much richer visual information than single images. Empirical evidence confirms that the video-to-video strategy is superior to the others (Fig. 3). Currently, a few video re-id datasets exist [4,15,28,36]. They are limited
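As a concrete reading of the video-to-video mode, each tracklet can be summarized by pooling its per-frame embeddings, and probe tracklets ranked against gallery tracklets by descriptor distance. The mean pooling and cosine distance below are illustrative choices, not the paper's prescribed ones.

```python
# Video-to-video matching sketch: pool per-frame embeddings into one
# tracklet descriptor, then rank gallery tracklets by cosine distance.
# Mean pooling and cosine distance are assumptions for illustration.
import numpy as np

def tracklet_descriptor(frame_feats: np.ndarray) -> np.ndarray:
    """frame_feats: (num_frames, dim) per-frame embeddings."""
    f = frame_feats.mean(axis=0)               # average pooling over time
    return f / (np.linalg.norm(f) + 1e-12)     # L2-normalize

def rank_gallery(probe: np.ndarray, gallery: list) -> np.ndarray:
    """Return gallery indices sorted by ascending cosine distance."""
    p = tracklet_descriptor(probe)
    g = np.stack([tracklet_descriptor(x) for x in gallery])
    dists = 1.0 - g @ p                         # cosine distance
    return np.argsort(dists)

# Toy example: a 20-frame probe tracklet vs. three gallery tracklets.
rng = np.random.default_rng(0)
probe = rng.normal(size=(20, 128))
gallery = [rng.normal(size=(n, 128)) for n in (15, 30, 8)]
print(rank_gallery(probe, gallery))
```

Both probe and gallery descriptors aggregate many frames, which is precisely why this mode carries more visual evidence than single-image matching.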