Unsupervised Deep Representation Learning for Real-Time Tracking



Ning Wang1 · Wengang Zhou1,2 · Yibing Song3 · Chao Ma4 · Wei Liu3 · Houqiang Li1,2

Received: 17 December 2019 / Accepted: 9 July 2020

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Deep learning models have continuously advanced visual tracking. Typically, these models are trained with supervised learning on expensive labeled data. To reduce the workload of manual annotation and to learn to track arbitrary objects, we propose an unsupervised learning method for visual tracking. Our motivation is that a robust tracker should be effective in bidirectional tracking: the tracker is able to forward localize a target object in successive frames and backtrace to the target's initial position in the first frame. Based on this motivation, in the training process we measure the consistency between the forward and backward trajectories to learn a robust tracker from scratch using only unlabeled videos. We build our framework on a Siamese correlation filter network, and propose a multi-frame validation scheme and a cost-sensitive loss to facilitate unsupervised learning. Without bells and whistles, the proposed unsupervised tracker achieves the baseline accuracy of classic fully supervised trackers while running at a real-time speed. Furthermore, our unsupervised framework shows potential for leveraging more unlabeled or weakly labeled data to further improve tracking accuracy.

Keywords Visual tracking · Unsupervised learning · Correlation filter · Siamese network
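The forward-backward consistency idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (which operates on a Siamese correlation filter network); it is a conceptual toy in which `step` stands for a hypothetical single-step tracker that predicts the target position in a frame given the previous position:

```python
import numpy as np

def forward_backward_loss(initial_pos, step, frames):
    """Conceptual sketch of the forward-backward consistency objective.

    `step(frame, pos)` is a hypothetical single-step tracker. We track
    forward from the first frame to the last, backtrace to the first
    frame, and penalize the squared distance between the backtraced
    position and the initial one. Minimizing this loss over unlabeled
    videos is the core idea of the unsupervised training signal.
    """
    pos = np.asarray(initial_pos, dtype=float)
    # Forward tracking through the successive frames.
    for frame in frames[1:]:
        pos = step(frame, pos)
    # Backward tracking to the first frame.
    for frame in reversed(frames[:-1]):
        pos = step(frame, pos)
    # Consistency loss: round-trip position vs. initial position.
    return float(np.sum((pos - np.asarray(initial_pos, dtype=float)) ** 2))
```

A perfectly consistent tracker (e.g. the identity map on a static target) incurs zero loss, whereas a tracker that drifts by a constant offset per step accumulates error over the round trip, which is exactly the signal the unsupervised objective exploits.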

1 Introduction

Communicated by Mei Chen, Cha Zhang and Katsushi Ikeuchi.

Wengang Zhou [email protected]
Houqiang Li [email protected]
Ning Wang [email protected]
Yibing Song [email protected]
Chao Ma [email protected]
Wei Liu [email protected]

1 The CAS Key Laboratory of GIPAS, University of Science and Technology of China, Hefei, China
2 Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
3 Tencent AI Lab, Shenzhen, China
4 The MOE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China

Visual object tracking is a fundamental task in computer vision with numerous applications, including video surveillance, autonomous driving, augmented reality, and human-computer interaction. It aims to localize a moving object annotated with a bounding box in the initial frame. Recently, deep models have improved tracking accuracy by strengthening the feature representations (Ma et al. 2015; Danelljan et al. 2016, 2017) or optimizing networks end-to-end (Bertinetto et al. 2016; Li et al. 2018; Nam and Han 2016; Valmadre et al. 2017). These models are offline pretrained with full supervision, which requires a large number of annotated ground-truth labels during the training stage. Manual annotations are always expensive and time-consuming, whereas a huge number of unlabeled videos are readily