Learning sequence-to-sequence affinity metric for near-online multi-object tracking


Weijiang Feng1 · Long Lan1 · Xiang Zhang1 · Zhigang Luo1

Received: 11 June 2019 / Accepted: 2 July 2020 © Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract In this paper, we propose a sequence-to-sequence affinity metric for data association in near-online multi-object tracking. The proposed metric learns the affinity between a track sequence, consisting of the already associated detections, and a hypothesis sequence, consisting of detections in the near future. Given the potential hypothesis sequences, we leverage the idea that if a track sequence has a high affinity for a hypothesis sequence, and the hypothesis sequence also has a high affinity for a current detection, then the affinity between the track sequence and the detection is high. By using the short hypothesis sequence as a “bridge”, the proposed sequence-to-sequence affinity metric enhances the conventional track-sequence-to-detection affinity metric and improves its robustness to object occlusion and missed detections. In addition, to eliminate the negative effects of false alarms, we propose a false alarm model that uses both the appearance and scale features of each detection. The robustness of the proposed affinity metric allows us to use a simple greedy data association algorithm. Experimental results on the challenging MOT16 and MOT17 benchmarks demonstrate the effectiveness of our method.

Keywords Multi-object tracking · Sequence-to-sequence · MOT Challenge
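The "bridge" idea and the greedy association step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the combination rule (product of the two bridge affinities, then a maximum with the direct score), the function names `bridged_affinity` and `greedy_associate`, and the threshold of 0.5 are all assumptions made for illustration. In the paper the affinities are learned; here they are plain numbers.

```python
from typing import Dict, List


def bridged_affinity(direct: float,
                     track_to_hyp: List[float],
                     hyp_to_det: List[float]) -> float:
    """Combine a direct track-to-detection score with hypothesis bridges.

    Assumption (not from the paper): each hypothesis sequence contributes
    the product of its two affinities, and the final score is the maximum
    over the direct score and all bridge scores.
    """
    bridges = [s * h for s, h in zip(track_to_hyp, hyp_to_det)]
    return max([direct] + bridges)


def greedy_associate(affinity: List[List[float]],
                     threshold: float = 0.5) -> Dict[int, int]:
    """Greedy association: accept the best remaining (track, detection) pair.

    affinity[t][d] is the score between track t and detection d.
    Pairs scoring below the (hypothetical) threshold are left unmatched.
    """
    pairs = sorted(
        ((score, t, d)
         for t, row in enumerate(affinity)
         for d, score in enumerate(row)),
        key=lambda p: p[0],
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used_detections = set()
    for score, t, d in pairs:
        if score < threshold:
            break  # all remaining pairs are weaker still
        if t in matches or d in used_detections:
            continue  # track or detection already assigned
        matches[t] = d
        used_detections.add(d)
    return matches


# Track 0 is partly occluded, so its direct score to detection 0 is low,
# but one hypothesis sequence links the two with high affinity.
enhanced = bridged_affinity(0.3, track_to_hyp=[0.9], hyp_to_det=[0.9])
matches = greedy_associate([[enhanced, 0.2], [0.4, 0.8]])
```

With the direct scores alone, track 0 would stay unmatched under this threshold; the bridged score lets the same greedy procedure recover the association.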

1 National University of Defense Technology, Changsha, China

Fig. 1 Demonstration of utilizing near-future frames for the affinity measure. Two pedestrians interact at frame t, the frame currently being associated. The affinity measure between T1 and d can be enhanced by the affinity measure between D and d and the sequence-to-sequence affinity measure between T1 and D

1 Introduction Multiple object tracking (MOT) aims to estimate the locations of all targets from categories of interest in a scene and to maintain their identities consistently in the form of an individual trajectory for each target. MOT in videos is an important problem in computer vision and has various applications such as video surveillance, human–computer interface, and autonomous driving. Thanks to the recent advances in object recognition [28,29], the currently predominant approaches to MOT use the tracking-by-detection framework, where an external detector provides detection responses of the objects of interest in the form of bounding boxes at each frame, and then corresponding bounding boxes across frames are associated to form a complete trajectory for each identity. However, although much progress has been made through this data association formulation, MOT is still a challenging task in the presence of frequent occlusions and interactions among targets, missing or inaccurate detections, a