Visual tracking with multilayer filter fusion network

Wei Quan 1 & Tianrui Li 2 & Ning Zhou 3 & Dong Zou 3 & Weihua Zhang 3 & Jim X. Chen 4

Received: 11 December 2019 / Revised: 13 July 2020 / Accepted: 9 September 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

We propose the multilayer filter fusion network (MFFN) to address the problem of visual object tracking. In MFFN, a convolutional neural network (CNN) extracts multilayer spatial features, and a convolutional long short-term memory (LSTM) network then extracts the temporal features of the images. The object image centered at the target is cropped and fed into MFFN to obtain the correlation filter and the feature map used to discriminate the target from the background. For each layer, the correlation filter is convolved with the corresponding feature map to produce a probability map, and the target position is estimated by locating the maximum of that map. Because the correlation filter is derived from the tracked object image fed into MFFN, it captures the appearance changes of the target. In our multilayer filter fusion tracking (MFFT) framework, we use two MFFNs with different inputs to track the target in a coarse-to-fine manner. The first estimates the target position from the entire image, and the second refines the location around that estimate. Once the networks are trained offline, they require no online learning during tracking. Experimental results on the CVPR2013 benchmark demonstrate that our tracking algorithm achieves competitive results compared with other tracking methods.

Keywords Visual tracking · Multilayer filter fusion network · Correlation filter
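The localization step summarized in the abstract (apply a correlation filter to a feature map, turn the response into a probability map, and take its maximum) can be illustrated with a minimal single-layer sketch. This is not the authors' implementation: the function names, the zero padding, the softmax normalization, and the toy 5 x 5 feature map are all assumptions made for illustration, and the learned CNN and LSTM features are replaced by hand-written numbers.

```python
import math

def correlate2d_same(feature, filt):
    """Slide `filt` over a zero-padded `feature`; return a same-size response map."""
    H, W = len(feature), len(feature[0])
    h, w = len(filt), len(filt[0])
    ph, pw = h // 2, w // 2
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            acc = 0.0
            for i in range(h):
                for j in range(w):
                    yy, xx = y + i - ph, x + j - pw
                    if 0 <= yy < H and 0 <= xx < W:
                        acc += feature[yy][xx] * filt[i][j]
            out[y][x] = acc
    return out

def response_to_probability(response):
    """Normalize a response map with a softmax (an assumed choice of normalization)."""
    peak = max(max(row) for row in response)
    exps = [[math.exp(v - peak) for v in row] for row in response]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]

def locate_target(feature, filt):
    """Return (row, col) of the maximum of the probability map, plus the map itself."""
    prob = response_to_probability(correlate2d_same(feature, filt))
    best = (0, 0)
    for y, row in enumerate(prob):
        for x, v in enumerate(row):
            if v > prob[best[0]][best[1]]:
                best = (y, x)
    return best, prob

# Hypothetical filter: the appearance template of the target.
corr_filter = [
    [0.0, 1.0, 0.0],
    [1.0, 2.0, 1.0],
    [0.0, 1.0, 0.0],
]

# Hypothetical feature map containing that pattern centered at row 3, column 1.
feature_map = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 2.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0],
]

position, probability_map = locate_target(feature_map, corr_filter)
print(position)
```

In the coarse-to-fine scheme described above, a step of this kind would run twice: once on features of the entire image, and once on features of a crop centered at the first estimate.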

* Wei Quan [email protected]

1 School of Electrical Engineering, Southwest Jiaotong University, Chengdu 610031 Sichuan, China

2 School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031 Sichuan, China

3 National Key Laboratory of Traction Power, Southwest Jiaotong University, Chengdu 610031 Sichuan, China

4 Department of Computer Science, George Mason University, Fairfax, VA 22030, USA

Multimedia Tools and Applications

1 Introduction

Visual object tracking is a fundamental problem in computer vision. Its task is to locate the object in the field of view [40, 42]. The classical approach to object tracking is to maintain a classifier or detector, learned offline or online, that distinguishes the object from its background [1, 17, 20, 28, 37, 39]. Compared with object detection and recognition, object tracking has stricter real-time requirements: a tracker that follows the target accurately but slowly is unsatisfactory, especially in many real-world applications. In recent years, most attention in video object tracking has focused on tracking methods based on the correlation filter (CF) [6, 10, 12, 16, 25, 35, 36, 44] and on the convolutional neural network (CNN) with deep feature representation [3, 38]. The CF-based methods