CDT: Cooperative Detection and Tracking for Tracing Multiple Objects in Video Sequences
Abstract. A cooperative detection and model-free tracking algorithm, referred to as CDT, for multiple object tracking is proposed in this work. The proposed CDT algorithm has three components: object detector, forward tracker, and backward tracker. First, the object detector detects targets with high confidence levels only to reduce spurious detection and achieve a high precision rate. Then, each detected target is traced by the forward tracker and then by the backward tracker to restore undetected states. In the tracking processes, the object detector cooperates with the trackers to handle appearing or disappearing targets and to refine inaccurate state estimates. With this detection guidance, the model-free tracking can trace multiple objects reliably and accurately. Experimental results show that the proposed CDT algorithm provides excellent performance on a recent benchmark. Furthermore, an online version of the proposed algorithm also excels in the benchmark.

Keywords: Joint detection and tracking · Multiple object tracking · Object detection · Model-free tracking · Online multi-object tracking
1 Introduction
The objective of multiple object tracking (MOT) is to estimate the states (or bounding boxes) of as many objects as possible in a video sequence and trace them temporally. In particular, tracking specific objects, such as pedestrians and cars, has drawn attention for its various applications, including surveillance systems and self-driving cars. For this purpose, many tracking-by-detection algorithms [1–19] have been proposed and yield promising performance. The tracking-by-detection approach decomposes MOT into two subproblems: object detection and data association. It first detects objects in each frame and then links the detection results to form trajectories across frames. With the recent success of object detection techniques [20–23], this approach has several advantages over model-free tracking, which does not assume a specific object and instead traces the bounding box of an arbitrary object, manually annotated in the first frame. Specifically, the tracking-by-detection approach is more robust against object appearance variation and model drift, and it can identify emerging or disappearing objects in a video sequence more easily.

© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part VI, LNCS 9910, pp. 851–867, 2016. DOI: 10.1007/978-3-319-46466-4_51
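To make the two-stage decomposition described above concrete, the following is a minimal sketch of the data-association step only, assuming per-frame detections are already available as axis-aligned boxes. It uses a simple greedy intersection-over-union (IoU) linker, which is an illustrative stand-in and not the CDT method or any of the cited algorithms; the names `iou`, `link_detections`, and the threshold `min_iou` are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_detections(
    frames: List[List[Box]], min_iou: float = 0.3
) -> Dict[int, List[Tuple[int, Box]]]:
    """Greedily link per-frame detections into trajectories.

    Returns a mapping: track id -> list of (frame index, box).
    A track is continued only if it was matched in the previous frame,
    so a single missed detection ends the track (assumed simplification).
    """
    tracks: Dict[int, List[Tuple[int, Box]]] = {}
    active: Dict[int, Box] = {}  # track id -> box in the previous frame
    next_id = 0
    for t, boxes in enumerate(frames):
        # Score every (active track, detection) pair, best overlap first.
        pairs = sorted(
            ((iou(last, box), tid, j)
             for tid, last in active.items()
             for j, box in enumerate(boxes)),
            reverse=True,
        )
        used_tracks, used_boxes = set(), set()
        new_active: Dict[int, Box] = {}
        for score, tid, j in pairs:
            if score < min_iou:
                break  # remaining pairs overlap too little to be linked
            if tid in used_tracks or j in used_boxes:
                continue
            used_tracks.add(tid)
            used_boxes.add(j)
            tracks[tid].append((t, boxes[j]))
            new_active[tid] = boxes[j]
        # Unmatched detections start new tracks (emerging objects).
        for j, box in enumerate(boxes):
            if j not in used_boxes:
                tracks[next_id] = [(t, box)]
                new_active[next_id] = box
                next_id += 1
        active = new_active
    return tracks


if __name__ == "__main__":
    frames = [
        [(0, 0, 10, 10), (50, 50, 60, 60)],
        [(1, 0, 11, 10), (51, 50, 61, 60)],
        [(2, 0, 12, 10)],  # the second object is no longer detected
    ]
    for tid, traj in link_detections(frames).items():
        print(tid, [t for t, _ in traj])
```

In this toy linker a missed detection simply ends a track and a spurious detection starts a new one, so the quality of the trajectories depends directly on the quality of the per-frame detections.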
H.-U. Kim and C.-S. Kim
MOT, however, remains a challenging problem in crowded or cluttered scenes. A complicated scene causes more detection failures, which are either undetected objects (false negatives) or spurious detections (false positives). Poor detection, in turn, decreases the accuracy of data association. To compensate for detection failures, many MOT algorithms [1–12,14] focus on global data association. Given detection results in all frames, they design a cost function to formulate the data association as an optimization problem and then determine optimal trajectories by minimizing the cos