Learning spatial-temporally regularized complementary kernelized correlation filters for visual tracking



Zhenyang Su1,2 · Jing Li1 · Jun Chang1 · Chengfang Song1 · Yafu Xiao1 · Jun Wan1

Received: 7 April 2019 / Revised: 9 April 2020 / Accepted: 5 May 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Despite the excellent performance of spatially regularized discriminative correlation filters (SRDCF) for visual tracking, two open issues hinder further improvement: first, SRDCF formulates its model over multiple training images, which prevents it from exploiting the circulant structure of the training samples during learning and leads to a high computational burden; second, SRDCF cannot efficiently exploit powerful discriminative nonlinear kernels, which further degrades its performance. In this paper, we present a novel spatial-temporally regularized complementary kernelized correlation filter (STRCKCF) based tracking approach. First, by introducing spatial-temporal regularization into filter learning, the STRCKCF formulates its model with only one training image, which not only facilitates exploiting the circulant structure in learning but also reasonably approximates the SRDCF formulation with multiple training images. Furthermore, by incorporating two types of kernels whose matrices are circulant, the STRCKCF fully exploits the complementary traits of color and HOG features to learn a robust target representation efficiently. Moreover, the STRCKCF can be efficiently optimized via the alternating direction method of multipliers (ADMM). Extensive evaluations on the OTB100 and VOT2016 visual tracking benchmarks demonstrate that the proposed method achieves favorable performance against state-of-the-art trackers at a speed of 40 fps on a single CPU. Compared with SRDCF, STRCKCF provides an 8× speedup and achieves gains of 5.5% in AUC score on OTB100 and 8.4% in EAO score on VOT2016.

Keywords Visual tracking · Spatial-temporal regularization · Correlation filter · Multi-kernel learning
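The circulant structure the abstract refers to is the key to the efficiency of kernelized correlation filters: because the kernel matrix over all cyclic shifts of a patch is circulant, ridge regression in the dual has a closed form in the Fourier domain. The following is a minimal single-channel sketch of this classical mechanism (in the style of the standard KCF with a Gaussian kernel), not the authors' STRCKCF implementation; function names and the label construction are illustrative.

```python
import numpy as np

def gaussian_kernel_correlation(x, z, sigma=0.5):
    # Gaussian kernel evaluated between z and every cyclic shift of x,
    # computed in one pass via FFTs (the circulant trick).
    c = np.fft.ifft2(np.conj(np.fft.fft2(x)) * np.fft.fft2(z)).real
    d = (x ** 2).sum() + (z ** 2).sum() - 2.0 * c
    return np.exp(-np.maximum(d, 0.0) / (sigma ** 2 * x.size))

def train(x, y, sigma=0.5, lam=1e-4):
    # Closed-form dual ridge regression: alpha_hat = y_hat / (k_hat + lambda),
    # where k is the kernel autocorrelation of the training patch x.
    k = gaussian_kernel_correlation(x, x, sigma)
    return np.fft.fft2(y) / (np.fft.fft2(k) + lam)

def detect(alpha_hat, x, z, sigma=0.5):
    # Dense response map over all cyclic shifts of the search patch z;
    # the peak location gives the estimated target translation.
    k = gaussian_kernel_correlation(x, z, sigma)
    return np.fft.ifft2(alpha_hat * np.fft.fft2(k)).real
```

Training and detection each cost only a few FFTs, which is why a formulation that preserves circulant kernel matrices (as STRCKCF does for its two kernel types) can run in real time on a CPU, whereas the multi-image SRDCF objective breaks this structure.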

 Jing Li
[email protected]

1 School of Computer Science, Wuhan University, Wuhan 430072, China
2 Department of Digital Media Technology, Huanggang Normal University, Huangzhou 438000, China

Multimedia Tools and Applications

1 Introduction

Visual tracking is one of the most challenging tasks in computer vision, with various applications such as intelligent video surveillance, video analysis, and scene understanding [4, 23, 42]. In the past decades, much attention has been devoted to model-free tracking, in which an unknown target is initialized with a bounding box in the first frame. Based on their appearance models, these trackers can generally be categorized into generative and discriminative methods [5, 18, 53–57, 64]. Among them, the generative methods exploit only target information, while the discriminative ones also consider rich information from the background, thereby usually yielding much better performance than the generative ones [50]. T