Hierarchical correlation Siamese network for real-time object tracking


Yu Meng1,2 · Zaixu Deng1 · Kun Zhao1 · Yan Xu1 · Hao Liu1

Accepted: 29 September 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
Deep learning has produced many new trackers in recent years. Among them, Siamese-network trackers strike a good balance between accuracy and speed, but their tracking performance still lags behind that of other trackers. In this paper, we propose a Hierarchical Correlation Siamese Network (HC-Siam) for object tracking. The tracker computes a correlation from the convolutional features of each layer and locates the tracked object at the position of greatest correlation. We also design a Correlation Attention Module (CA-Module) that, for each object, assigns different weights to the hierarchical correlations and helps the network select the most distinctive one. In addition, because the size and scale of an object vary constantly during tracking, we use separate scale factors in the width and height directions to reduce the deformation of bounding boxes and increase the accuracy of the tracker. On the OTB dataset, the accuracy of HC-Siam is 6.5% higher than the baseline, and the tracker can reach 85 fps. On the VOT dataset, HC-Siam also performs better in both speed and accuracy.

Keywords Siamese network · Object tracking · Deep learning · Attention mechanism

Yu Meng
[email protected]

1 School of Mechanical Engineering, University of Science and Technology Beijing, Beijing, China
2 Institute of Artificial Intelligence, University of Science and Technology Beijing, Beijing, China

1 Introduction
Object tracking refers to selecting an object in the first frame and then computing the position and scale of that object in each subsequent frame of the video. It is not only an active research topic but is also widely used in human-computer interaction [1], automatic driving [2] and video surveillance [3]. Despite years of research, trackers are still prone to fail under challenges such as deformation, occlusion, background clutter and scale variation [4]. Moreover, because visual object tracking processes video, a tracker is only useful if it runs in real time, so preserving the speed of a tracker is itself a difficult problem.

Research on visual object tracking started early, and many trackers now exist, some of which perform very well. According to the features they use, trackers can be divided into two categories: those based on Correlation Filters (CF) [5–11] and those based on Deep Learning (DL) [7, 12–14]. CF-based trackers use hand-crafted features (e.g., Histogram of Oriented Gradients [15]) and the Discrete Fourier Transform. They compute the correlation between the object and candidate regions, and the region with the highest correlation is taken as the tracked object. Although