Hierarchical attentive Siamese network for real-time visual tracking


EXTREME LEARNING MACHINE AND DEEP LEARNING NETWORKS

Kang Yang¹ • Huihui Song¹ • Kaihua Zhang¹ • Qingshan Liu¹

Received: 17 December 2018 / Accepted: 9 May 2019
© Springer-Verlag London Ltd., part of Springer Nature 2019

Abstract Visual tracking is a fundamental component of many computer vision tasks. Recently, Siamese networks trained off-line in an end-to-end manner have demonstrated great success in visual tracking, with high performance in both speed and accuracy. However, Siamese trackers usually employ visual features from the last simple convolutional layers to represent the targets, ignoring the fact that features from different layers have different representation capabilities, which may degrade tracking performance in the presence of severe deformation and occlusion. In this paper, we present a novel hierarchical attentive Siamese (HASiam) network for high-performance visual tracking, which exploits different kinds of attention mechanisms to effectively fuse a series of attentional features from different layers. More specifically, we combine a deeper network with a shallow one to take full advantage of the features from different layers, and apply spatial and channel-wise attention on different layers to better capture visual attention on multi-level semantic abstractions, which helps enhance the discriminative capacity of the model. Furthermore, the top-layer feature maps have low resolution, which may affect localization accuracy if each feature is treated independently. To address this issue, a non-local attention module is also adopted on the top layer to force the network to pay more attention to the structural dependency of features at all locations during off-line training. The proposed HASiam is trained off-line in an end-to-end manner and requires no online updating of the network parameters during tracking. Extensive evaluations demonstrate that HASiam achieves favorable results, with AUC scores of 64.6% and 62.8% on OTB2013 and OTB100 and an EAO score of 0.227 on the VOT2017 real-time experiment, while running at 60 fps. With its high accuracy and real-time speed, our tracker can be applied to numerous vision applications such as visual surveillance systems, robotics and augmented reality.

Keywords Visual tracking · Siamese networks · Attention mechanism · Hierarchical features
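The attention components named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the squeeze-and-excitation form of the channel attention, the embedding-free form of the non-local block, the weight matrices `w1` and `w2`, and all tensor sizes are assumptions chosen only for illustration, and the function names are hypothetical.

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Reweight channels with squeeze-and-excitation-style gates (assumed form)."""
    # feat has shape (C, H, W); global average pooling gives one scalar per channel.
    squeezed = feat.mean(axis=(1, 2))                # (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)          # ReLU bottleneck
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # sigmoid gates in (0, 1)
    return feat * gates[:, None, None]

def non_local_attention(feat):
    """Let every spatial location aggregate features from all locations."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)                       # (C, N) with N = H * W
    scores = x.T @ x / np.sqrt(c)                    # (N, N) pairwise similarity
    scores -= scores.max(axis=1, keepdims=True)      # stabilise the softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)          # each row sums to 1
    out = x @ attn.T                                 # weighted sum over locations
    return feat + out.reshape(c, h, w)               # residual connection

def cross_correlation(template, search):
    """Slide the template over the search features to get a response map."""
    _, th, tw = template.shape
    _, sh, sw = search.shape
    resp = np.empty((sh - th + 1, sw - tw + 1))
    for i in range(resp.shape[0]):
        for j in range(resp.shape[1]):
            resp[i, j] = np.sum(template * search[:, i:i + th, j:j + tw])
    return resp

# Toy feature maps; the shapes are illustrative, not taken from the paper.
rng = np.random.default_rng(0)
template = rng.standard_normal((8, 6, 6))
search = rng.standard_normal((8, 22, 22))
w1 = rng.standard_normal((2, 8))   # hypothetical bottleneck weights
w2 = rng.standard_normal((8, 2))

# Both branches share the same weights, as in a Siamese network.
z = non_local_attention(channel_attention(template, w1, w2))
x = non_local_attention(channel_attention(search, w1, w2))
response = cross_correlation(z, x)
print(response.shape)  # (17, 17)
```

The peak of `response` would indicate the most likely target position in the search region. Spatial attention and the fusion of features from several layers are omitted here to keep the sketch short.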

1 Introduction

Online visual tracking is a fundamental yet challenging task in the field of computer vision. It aims to accurately localize an arbitrary, changing object in a video, where the object is specified only by a bounding box in the first frame. Some classical tracking algorithms combine Kalman filtering [1] with optimization techniques to improve tracking performance [2–5]. In the past decades, although great progress has been made in visual tracking in terms of

✉ Kaihua Zhang [email protected]

1 Jiangsu Key Laboratory of Big Data
