A strong feature representation for siamese network tracker


Zhipeng Zhou1,2 · Rui Zhang1,2 · Dong Yin1,2
Received: 15 August 2019 / Revised: 1 June 2020 / Accepted: 4 June 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Because AlexNet is too shallow to form a strong feature representation, trackers based on the Siamese network show an accuracy gap compared with state-of-the-art algorithms. Both deep features and appearance features benefit tracking accuracy. To combine these two kinds of features, a modified pre-trained VGG16 network is first fine-tuned as one branch of the backbone network. Secondly, an AlexNet branch is attached after the third convolutional layer of VGG16, so that the response maps from both branches are merged to form a preliminary strong feature representation with deep features and shallow appearance features. Thirdly, a new mean Peak-to-side ratio (mPSR) loss is designed to help the network learn target features adaptively. A channel attention block and the Average-Peak-to-Correlation Energy (APCE) are designed to help select contributing features and suppress distractors. The proposed tracker, SiamPF, is trained only on ILSVRC2015-VID, yet it achieves excellent performance on OTB-2013, OTB-2015, VOT2015, VOT2016 and VOT2017 while maintaining real-time performance of 41 FPS on a GTX 1080Ti.

Keywords Siamese network · Feature representation · mPSR
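To make the two response-map quality measures named in the abstract more concrete, the following is a minimal NumPy sketch. It assumes the commonly used definition of APCE, $|F_{max} - F_{min}|^2 / \mathrm{mean}\big((F_{w,h} - F_{min})^2\big)$, and it assumes that the peak-to-side ratio resembles the peak-to-sidelobe ratio used with MOSSE-style correlation filters. The function names `apce` and `psr` and the `exclude` window size are hypothetical, and the sketch is not the mPSR loss of the paper, whose exact form is not given in this excerpt.

```python
import numpy as np


def apce(response: np.ndarray) -> float:
    """Average peak-to-correlation energy of a 2-D response map.

    Assumed definition: |F_max - F_min|^2 / mean((F - F_min)^2).
    A single sharp peak gives a high value; distractor peaks lower it.
    """
    f_max = response.max()
    f_min = response.min()
    energy = np.mean((response - f_min) ** 2)
    if energy == 0:  # perfectly flat map: no peak at all
        return 0.0
    return float((f_max - f_min) ** 2 / energy)


def psr(response: np.ndarray, exclude: int = 1) -> float:
    """Peak-to-sidelobe ratio: (peak - mean(sidelobe)) / std(sidelobe).

    The sidelobe is everything outside a (2*exclude+1)^2 window around
    the peak. The window size is an illustrative assumption.
    """
    r, c = np.unravel_index(np.argmax(response), response.shape)
    mask = np.ones(response.shape, dtype=bool)
    mask[max(r - exclude, 0):r + exclude + 1,
         max(c - exclude, 0):c + exclude + 1] = False
    sidelobe = response[mask]
    if sidelobe.size == 0:
        return 0.0
    std = sidelobe.std()
    if std == 0:  # flat sidelobe: the ratio is undefined, report 0
        return 0.0
    return float((response[r, c] - sidelobe.mean()) / std)


# A clean response map with one peak, and the same map with a distractor.
clean = np.zeros((9, 9))
clean[4, 4] = 1.0
clean[0, 0] = 0.1
distracted = clean.copy()
distracted[0, 8] = 0.9

print(apce(clean) > apce(distracted))  # the distractor lowers APCE
print(psr(clean) > psr(distracted))    # and lowers PSR as well
```

Under these assumptions both scores drop when a second strong peak appears, which is why such measures can serve as a confidence signal for suppressing distractors.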

Dong Yin
[email protected]

1 School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China

2 Key Laboratory of Electromagnetic Space Information of CAS, Hefei, Anhui 230027, China

Multimedia Tools and Applications

1 Introduction

Visual tracking is a fundamental topic in computer vision. It can be divided into two subtopics based on the target: single object tracking and multiple object tracking [31]. Many single object tracking methods have been studied in recent years. They are mainly based on either the correlation filter framework or the deep learning framework. The correlation filter was introduced to computer vision by David S. Bolme [3], who proposed a correlation filter based tracker named MOSSE. J. F. Henriques proposed a method called CSK [19], which built on MOSSE by introducing dense sampling and the kernel trick. Furthermore, he incorporated multi-channel HOG features into KCF [20], an enhanced version of CSK. Similarly, M. Danelljan [5] extended CSK with multi-channel color names (CN) features. Due to their good performance, HOG and CN have become the most popular hand-crafted features in recent years. However, hand-crafted features are not suitable for all targets, which limits the performance of these trackers. Thus, leveraging data-driven features seems to be a better way to represent the target. Combined with features extracted from CNNs, correlation filter based methods such as DeepSRDCF [6], C-COT [9] and ECO [10] achieve higher accuracy. On the other hand, the trackers mentioned above require complex setup and heavy computation that could hardly meet
