Hierarchical attentive Siamese network for real-time visual tracking
EXTREME LEARNING MACHINE AND DEEP LEARNING NETWORKS
Kang Yang¹ • Huihui Song¹ • Kaihua Zhang¹ • Qingshan Liu¹
Received: 17 December 2018 / Accepted: 9 May 2019
© Springer-Verlag London Ltd., part of Springer Nature 2019
Abstract Visual tracking is a fundamental component of many computer vision tasks. Recently, Siamese networks trained off-line in an end-to-end manner have demonstrated great success in visual tracking, with high performance in terms of both speed and accuracy. However, Siamese trackers usually represent the target with features from the last convolutional layers alone, ignoring the fact that features from different layers have different representation capabilities, and this may degrade tracking performance in the presence of severe deformation and occlusion. In this paper, we present a novel hierarchical attentive Siamese (HASiam) network for high-performance visual tracking, which exploits different kinds of attention mechanisms to effectively fuse a series of attentional features from different layers. More specifically, we combine a deeper network with a shallow one to take full advantage of the features from different layers, and we apply spatial and channel-wise attention on different layers to better capture visual attention at multiple levels of semantic abstraction, which helps to enhance the discriminative capacity of the model. Furthermore, the top-layer feature maps have low resolution, which may affect localization accuracy if each feature is treated independently. To address this issue, a non-local attention module is also adopted on the top layer to force the network to pay more attention to the structural dependency of features at all locations during off-line training. The proposed HASiam is trained off-line in an end-to-end manner and requires no online updating of the network parameters during tracking. Extensive evaluations demonstrate that HASiam achieves favorable results, with AUC scores of 64.6% and 62.8% on OTB2013 and OTB100 and an EAO score of 0.227 on the VOT2017 real-time experiment, while running at 60 fps. With its high accuracy and real-time speed, our tracker can be applied to numerous vision applications such as visual surveillance systems, robotics and augmented reality.
Keywords Visual tracking · Siamese networks · Attention mechanism · Hierarchical features
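The three attention types named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the HASiam implementation: the excerpt does not specify the modules' internals, so the gates below use plain sigmoid functions of pooled activations with no learned layers, the non-local block uses raw dot-product similarity with a residual connection, and all function names are hypothetical.

```python
import math

# Illustrative sketch only. Feature maps are nested lists shaped [C][H][W].
# Assumption: no learned weights; real attention modules would normally
# insert trainable layers before each gate or similarity computation.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dims(feat):
    return len(feat), len(feat[0]), len(feat[0][0])

def channel_attention(feat):
    """Rescale each channel by a gate computed from its global average."""
    gates = []
    for channel in feat:
        total = sum(sum(row) for row in channel)
        count = len(channel) * len(channel[0])
        gates.append(sigmoid(total / count))
    return [[[v * g for v in row] for row in channel]
            for channel, g in zip(feat, gates)]

def spatial_attention(feat):
    """Rescale each location by a gate computed from its cross-channel mean."""
    C, H, W = dims(feat)
    gate = [[sigmoid(sum(feat[c][i][j] for c in range(C)) / C)
             for j in range(W)] for i in range(H)]
    return [[[feat[c][i][j] * gate[i][j] for j in range(W)]
             for i in range(H)] for c in range(C)]

def non_local_attention(feat):
    """Add to each location a softmax-weighted sum over all locations."""
    C, H, W = dims(feat)
    # Flatten to N = H * W locations, each a C-dimensional vector.
    locs = [[feat[c][i][j] for c in range(C)]
            for i in range(H) for j in range(W)]
    out = []
    for query in locs:
        scores = [sum(a * b for a, b in zip(query, key)) for key in locs]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        norm = sum(exps)
        context = [sum(e / norm * key[c] for e, key in zip(exps, locs))
                   for c in range(C)]
        # Residual connection: original feature plus aggregated context.
        out.append([q + ctx for q, ctx in zip(query, context)])
    return [[[out[i * W + j][c] for j in range(W)]
             for i in range(H)] for c in range(C)]

# Toy usage: a 2-channel, 2x2 feature map.
toy = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
attended = non_local_attention(spatial_attention(channel_attention(toy)))
```

Channel and spatial attention reweight features independently along one axis each, whereas the non-local step lets every location draw on all others, which is the structural dependency the abstract refers to for the low-resolution top layer.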
1 Introduction
Online visual tracking is a fundamental yet challenging task in computer vision. It aims to accurately localize an arbitrarily changing object in a video, where the object is specified only by a bounding box in the first frame. Some classical tracking algorithms combine Kalman filtering [1] with optimization techniques to improve tracking performance [2–5]. Over the past decades, although great progress has been made in visual tracking in terms of
✉ Kaihua Zhang [email protected]
¹ Jiangsu Key Laboratory of Big Data