Anti-distractors: two-branch siamese tracker with both static and dynamic filters for object tracking



REGULAR PAPER

Hao Shen¹ · Defu Lin¹ · Tao Song¹ · Guangyu Gao²

Received: 7 November 2019 / Accepted: 20 June 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract
Visual object tracking is challenging because of the large appearance variation caused by illumination, deformation, and motion. Siamese network-based trackers, which select the target through a matching function, are widely used for visual object tracking and can robustly recognize a target whose appearance varies. However, although the filter template is a crucial part of such methods, most of them do not update it effectively, and they show limited ability to discriminate between the target and semantically similar objects (distractors). To tackle the challenge of distractors, we add a dynamic filter branch to the traditional Siamese network. When multiple peaks are detected on the static response map, the tracker redetects the target with the dynamic branch, and the final target location is determined by combining the results of the dynamic and static filter branches. Subsequently, the sample library is updated with a hard negative mining strategy and the dynamic filter kernel is retrained online. By fusing the two branches, the tracker can distinguish the true target from similar objects. We conduct extensive experiments and empirical evaluations on two popular datasets, VisDrone and UAV123. Our tracker achieves an AUC of 58% on the VisDrone dataset and an AUC of 60.7% on the UAV123 dataset.

Keywords  Siamese tracker · Distractor · Dynamic filter
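The redetection trigger described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `find_peaks` and `locate_target`, the relative peak threshold, the neighbourhood size, and the weighted-sum fusion rule are all assumptions made for illustration, since the abstract states only that the two branches' results are combined when multiple peaks appear on the static response map.

```python
import numpy as np


def find_peaks(response, rel_threshold=0.7, min_distance=3):
    """Return (row, col) local maxima scoring at least rel_threshold * global max.

    Assumes a non-negative response map. Both parameters are hypothetical
    values, not taken from the paper. Peaks are sorted by descending score.
    """
    h, w = response.shape
    global_max = response.max()
    peaks = []
    for r in range(h):
        for c in range(w):
            value = response[r, c]
            if value < rel_threshold * global_max:
                continue
            # A peak must be the maximum of its local neighbourhood.
            r0, r1 = max(0, r - min_distance), min(h, r + min_distance + 1)
            c0, c1 = max(0, c - min_distance), min(w, c + min_distance + 1)
            if value == response[r0:r1, c0:c1].max():
                peaks.append((r, c))
    peaks.sort(key=lambda p: -response[p])
    return peaks


def locate_target(static_response, dynamic_response, weight=0.5):
    """Pick the target position; consult the dynamic branch only on multi-peak maps.

    The weighted sum is an assumed fusion rule standing in for the paper's
    unspecified "combined result". Returns (position, redetected).
    """
    peaks = find_peaks(static_response)
    if len(peaks) <= 1:
        # One clear peak: trust the static branch alone.
        idx = np.unravel_index(np.argmax(static_response), static_response.shape)
        return tuple(int(i) for i in idx), False
    # Several candidate peaks (possible distractors): fuse both branches
    # and keep the candidate with the highest fused score.
    fused = (1.0 - weight) * static_response + weight * dynamic_response
    return max(peaks, key=lambda p: fused[p]), True
```

In this sketch a distractor that scores slightly higher than the target on the static map is rejected whenever the dynamic map scores it low enough, which is the behaviour the abstract attributes to the fusion of the two branches.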

Communicated by I. IDE.

Electronic supplementary material  The online version of this article (https://doi.org/10.1007/s00530-020-00670-9) contains supplementary material, which is available to authorized users.

* Guangyu Gao
  Hao Shen

¹ Beijing Key Laboratory of UAV Autonomous Control, Beijing Institute of Technology, Beijing, China
² Beijing Institute of Technology, Beijing, China

1 Introduction
Visual object tracking is the task of locating a target object precisely over a sequence of image frames, given the target's bounding box in the initial frame. To accomplish this task, the tracker should be capable of: (1) recognizing the target against background clutter and other categories of objects, namely inter-class discrimination, (2) discriminating a particular target among similar distractors that may be of the same category, namely intra-class discrimination, (3) continuously identifying the target when its features change, for example under illumination change, rotation, or deformation, namely intra-class consistency. However, (2) and (3) are inherently contradictory. To achieve intra-class consistency, the tracker should capture the long-term features of the object over the whole sequence and neglect the variation of instantaneous features at different