Real-time tracking based on deep feature fusion



Yuhang Pang1 · Fan Li1 · Xiaoya Qiao1 · Andrew Gilman2

Received: 28 September 2019 / Revised: 18 June 2020 / Accepted: 24 June 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Deep learning-based methods have recently attracted significant attention in the visual tracking community, leading to an increase in state-of-the-art tracking performance. However, the use of more complex models has also been accompanied by a decrease in speed. Real-time tracking applications require a careful balance of performance and speed. We propose a real-time tracking method based on deep feature fusion, which combines deep learning with kernel correlation filters. First, hierarchical features are extracted from a lightweight pre-trained convolutional neural network. Then, original features from different levels are fused using canonical correlation analysis. The fused features, as well as some of the original deep features, are used in three kernel correlation filters to track the target. An adaptive update strategy, based on dispersion analysis of the correlation-filter response maps, is proposed to improve robustness to target appearance changes; different update frequencies are adopted for the three filters to cope with severe appearance changes. We perform extensive experiments on two benchmarks: OTB-50 and OTB-100. Quantitative and qualitative evaluations show that the proposed tracker performs favorably against several state-of-the-art methods, even outperforming some algorithms that use complex network models. Furthermore, the proposed algorithm runs at more than 20 frames per second (FPS) and is thus able to achieve near real-time tracking.

Keywords Visual tracking · Convolutional neural network · Feature fusion · Correlation filters

✉ Fan Li

[email protected]

Yuhang Pang
[email protected]

Xiaoya Qiao
[email protected]

Andrew Gilman
[email protected]

1 School of Information and Communications Engineering, Xi’an Jiaotong University, Xi’an 710049, China

2 Institute of Natural and Mathematical Sciences, Massey University, Auckland, New Zealand

Multimedia Tools and Applications

1 Introduction

Visual tracking has seen a rapid rise in popularity as a tool in the computer vision and multimedia fields. With recent advances in visual tracking technology, it is increasingly used in video surveillance, intelligent transportation, public safety and military applications. Improving accuracy and the capacity for real-time operation have always been key goals of the visual tracking task. Recently, deep learning has been widely used in visual tracking and other multimedia processing tasks, such as image representation, object detection and action recognition [45, 51]. Deep learning-based visual tracking methods can be classified into two categories, according to how the deep network model is applied: tracking based on an end-to-end training architecture and tracking based on a deep feature extraction architecture.