SiamMN: Siamese modulation network for visual object tracking



Li-hua Fu¹ & Yu Ding¹ & Yu-bin Du¹ & Bo Zhang¹ & Lu-yuan Wang¹ & Dan Wang¹

Received: 10 January 2020 / Revised: 23 July 2020 / Accepted: 4 August 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Visual object tracking methods based on Siamese networks often struggle to distinguish the tracking target from objects with the same semantics or a similar appearance, because they lack a strategy for discriminating such confusing objects. We propose a visual object tracking method based on a Siamese modulation network. It takes the given bounding box in the target frame and the current frame as input, and fuses their multi-layer convolutional features to obtain richer appearance information for both the bounding box and the current frame. A feature modulator generates a feature modulation vector from the given bounding box to enhance the visual appearance information of the target instance in the multi-layer features of the current frame, so that the target instance obtains a higher score in the response map of the region proposal network, thereby realizing target-instance-specific tracking. Experiments on two public benchmark datasets, OTB2015 and VOT2018, show that the proposed tracker achieves competitive performance among state-of-the-art trackers.

Keywords Visual object tracking · Feature modulation · Siamese network · Region proposal network
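The modulation idea summarized in the abstract can be illustrated with a toy sketch. This is not the SiamMN implementation: the learned feature modulator is replaced by a hypothetical average-pooling rule, the region proposal network by a plain channel sum, and all function names and numbers are assumptions chosen for illustration. It only shows how a per-channel vector derived from the target box can lift the target above a distractor in a response map.

```python
def modulation_vector(target_feat):
    """Hypothetical stand-in for the learned modulator (an assumption).

    target_feat[c] is a flat list of channel-c activations inside the
    target box. Each channel is average-pooled, then the vector is
    rescaled so that the strongest channel has weight 1.
    """
    pooled = [sum(ch) / len(ch) for ch in target_feat]
    strongest = max(pooled)
    return [p / strongest for p in pooled]


def modulate(search_feat, vec):
    """Scale channel c of search_feat[c][y][x] by vec[c]."""
    return [[[v * w for v in row] for row in ch]
            for ch, w in zip(search_feat, vec)]


def response(search_feat):
    """Toy response map: sum over channels at each location (H x W)."""
    height, width = len(search_feat[0]), len(search_feat[0][0])
    return [[sum(ch[y][x] for ch in search_feat) for x in range(width)]
            for y in range(height)]


def peak_location(resp):
    """Return (y, x) of the highest response."""
    cells = ((y, x) for y in range(len(resp)) for x in range(len(resp[0])))
    return max(cells, key=lambda p: resp[p[0]][p[1]])


# Target-box features: channel 0 is strong, channel 1 is weak.
target_feat = [[1.0], [0.2]]

# 2 x 2 search frame: target at (0, 0), distractor at (1, 1).
search_feat = [
    [[1.0, 0.0], [0.0, 0.8]],  # channel 0
    [[0.2, 0.0], [0.0, 1.0]],  # channel 1
]

vec = modulation_vector(target_feat)
before = response(search_feat)
after = response(modulate(search_feat, vec))

print(peak_location(before))  # (1, 1): the distractor wins without modulation
print(peak_location(after))   # (0, 0): the target wins after modulation
```

In this toy setting, the distractor scores higher before modulation because it responds strongly on a channel that the target barely activates. Down-weighting that channel with the target-derived vector reverses the ranking.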

* Li-hua Fu [email protected]

¹ Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China

Multimedia Tools and Applications

1 Introduction

Video Object Tracking (VOT) is the task of automatically locating a specific target in a video to obtain its position and trajectory. In semi-supervised single-object tracking, a bounding box for the target is given in the first frame, and the same target must be marked with a bounding box in each subsequent frame of the video sequence. Single-object video tracking is a research hotspot in computer vision and has been widely used in traffic monitoring [20], scene analysis [32], crowd analysis [38], and motion recognition [37]. In recent years, most research on visual object tracking has focused on two families of methods: correlation filter based trackers and Siamese network based trackers. In correlation filter based trackers, the target in the first frame is mapped to the Fourier domain, and the correlation filter template of the target is calculated by using a two-dimensional Gaussian distribution [6]. Then, the maximum response in each subsequent frame is obtained through the correlation filter template, and its location is taken as the central point of the tracking target to obtain the final result [7]. Traditional visual object tracking frameworks mainly realize tracking by matching the appearance similarity of objects between two adjacent frames [11]. The correlation filter trackers based on an online update strategy can solve the problem of target deformation well