Pixel Level Tracking of Multiple Targets in Crowded Environments


Abstract. Tracking of multiple targets in crowded environments using tracking-by-detection algorithms has been investigated thoroughly. Although these techniques are quite successful, they discard much of the detailed information about targets inside the detection boxes, which is highly desirable in many applications such as activity recognition. To address this problem, we propose an approach that tracks superpixels instead of detection boxes in multi-view video sequences. Specifically, we first extract superpixels from detection boxes and then associate them within each detection box, across several views, and over time steps, which leads to a combined segmentation, reconstruction, and tracking of superpixels. We construct a flow graph and incorporate both visual and geometric cues in a global optimization framework to minimize its cost. Hence, we simultaneously achieve segmentation, reconstruction, and tracking of targets in video. Experimental results confirm that the proposed approach outperforms state-of-the-art techniques for tracking while achieving comparable results in segmentation.

Keywords: Superpixels · Segmentation · Reconstruction · Tracking · Hypergraph

1 Introduction

© Springer International Publishing Switzerland 2016. G. Hua and H. Jégou (Eds.): ECCV 2016 Workshops, Part II, LNCS 9914, pp. 692–708, 2016. DOI: 10.1007/978-3-319-48881-3_49

Tracking of multiple targets in crowded and unconstrained environments has many applications in video surveillance and security systems. This is a challenging problem due to the high amount of noise in the measured data, occlusion among targets, and interaction of targets with one another or with other objects. Currently, tracking-by-detection is considered the most successful solution to this problem [4,26,27,29,33]. However, tracking detection boxes is not enough for many real applications such as human activity recognition and analysis. In this work, we propose an approach to track segmented targets instead of their corresponding detection boxes in multi-view video sequences. We extract superpixels from detection boxes in all images and associate them over different views and time steps. Association of several superpixels in a detection box results in a segmentation. Moreover, association of several segmentations from different views results in a 3D reconstruction. Finally, association of segmentations or reconstructions over time (i.e., temporal association) results in tracking of segmented targets in video sequences. In other words, we address the problem of segmentation, reconstruction, and tracking of multiple targets in multi-view video sequences. In contrast to previous works, we aim to assign a unique target ID not only to each individual detection, but to every superpixel in the entire multi-view video sequence. In common with some other approaches [14,18], the problem is first formulated as a maximum a posteriori problem and then mapped into a constrained flow graph, which can be efficien