A Meta-Q-Learning Approach to Discriminative Correlation Filter based Visual Tracking



Akihiro Kubo1 · Kourosh Meshgi2 · Shin Ishii1

Received: 26 February 2020 / Accepted: 13 October 2020 © Springer Nature B.V. 2020

Abstract

Visual object tracking remains a challenging computer vision problem with numerous real-world applications. Discriminative correlation filter (DCF)-based methods are among the recent state-of-the-art approaches to this problem. The learning rate used to update a DCF is typically fixed, regardless of the situation. However, this rate is important for robust tracking, because real-world video sequences include a variety of dynamic changes, such as occlusions, motion blur, and deformations. In this study, we propose the Meta-Q-learning Correlation Filter (MQCF), a method that uses reinforcement learning to dynamically determine the learning rate of a baseline DCF-based tracker built on hand-crafted Histogram of Oriented Gradients (HOG) features. Incorporating reinforcement learning enables us to train, in an autonomous fashion, a function that takes an image patch and outputs a situation-dependent learning rate for the baseline tracker. We evaluated this method on two open benchmarks, OTB-2015 and VOT-2015, and found that our MQCF tracker outperformed a baseline state-of-the-art tracker by 1.8% in Area Under Curve on OTB-2015 and achieved an 8.4% relative gain in Expected Average Overlap in the VOT-2015 challenge. Our results demonstrate the advantages of so-called meta-learning for DCF-based visual object tracking.

Keywords Visual tracking · Deep reinforcement learning · Meta-learning
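To make the role of the learning rate concrete, the following is a minimal sketch of a single-channel, MOSSE-style correlation filter in Python with NumPy. It is not the MQCF tracker or its HOG-based baseline: the class name `SimpleDCF`, the regularizer `lam`, the Gaussian width `sigma`, and the use of raw pixel values instead of HOG features are all assumptions made for illustration. The sketch only shows how a filter can be evaluated in the frequency domain and how a learning rate blends the old model with a new frame.

```python
import numpy as np


def gaussian_response(shape, sigma=2.0):
    """Desired filter output: a Gaussian peak centred on the patch."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))


class SimpleDCF:
    """Illustrative single-channel correlation filter (hypothetical helper)."""

    def __init__(self, patch, lam=1e-2):
        self.lam = lam  # regularizer that avoids division by near-zero energy
        self.G = np.fft.fft2(gaussian_response(patch.shape))
        F = np.fft.fft2(patch)
        self.A = self.G * np.conj(F)  # numerator of the filter
        self.B = F * np.conj(F)       # denominator (patch energy spectrum)

    def respond(self, patch):
        """Response heatmap, computed by element-wise products of FFTs."""
        F = np.fft.fft2(patch)
        H_conj = self.A / (self.B + self.lam)
        return np.real(np.fft.ifft2(H_conj * F))

    def locate(self, patch):
        """Displacement (dy, dx) of the response peak from the patch centre."""
        resp = self.respond(patch)
        py, px = np.unravel_index(np.argmax(resp), resp.shape)
        cy, cx = resp.shape[0] // 2, resp.shape[1] // 2
        return int(py - cy), int(px - cx)

    def update(self, patch, learning_rate):
        """Blend the model with the new frame; 0 keeps the old model, 1 replaces it."""
        F = np.fft.fft2(patch)
        self.A = (1.0 - learning_rate) * self.A + learning_rate * self.G * np.conj(F)
        self.B = (1.0 - learning_rate) * self.B + learning_rate * F * np.conj(F)


# Usage: train on one patch, then find a circularly shifted copy of it.
rng = np.random.default_rng(0)
patch = rng.standard_normal((32, 32))
tracker = SimpleDCF(patch)
shifted = np.roll(patch, (3, -2), axis=(0, 1))
print(tracker.locate(shifted))
tracker.update(shifted, learning_rate=0.02)  # a fixed rate, as in typical DCF trackers
```

With a fixed `learning_rate`, every frame is absorbed into the model to the same degree. A situation-dependent rate could instead be set close to zero when, for example, the target is occluded, so that the occluder is not learned; how MQCF chooses that rate is the subject of the paper and is not reproduced here.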

1 Department of Systems Science, Graduate School of Informatics, Kyoto University, Kyoto, Japan
2 The RIKEN Center for Advanced Intelligence Project, Tokyo, Japan

1 Introduction

Visual object tracking is an important but challenging task in the field of computer vision. In a visual tracking task, we specify a target in a certain frame of a video and determine its location in the successive frames. This task is important owing to its wide range of real-world applications, such as visual surveillance, robot navigation, and sports analysis [1]. Visual tracking remains difficult, however, because such applications require online tracking in real time, as well as high efficiency and performance.

A discriminative correlation filter (DCF) has been used in visual object tracking tasks [2] and has proven to have high computational efficiency and accuracy. A filter-based method like DCF obtains a heatmap that locates the target pattern by applying a convolution operation in the frequency domain, which is computationally efficient. In this operation, a correlation filter is constructed to discriminate foreground patterns from background patterns. To adapt to changes in the foreground and/or background, the correlation filter is updated in an incremental manner. The DCF has been extended for use with k