Fully-Convolutional Siamese Networks for Object Tracking

Abstract. The problem of arbitrary object tracking has traditionally been tackled by learning a model of the object’s appearance exclusively online, using as sole training data the video itself. Despite the success of these methods, their online-only approach inherently limits the richness of the model they can learn. Recently, several attempts have been made to exploit the expressive power of deep convolutional networks. However, when the object to track is not known beforehand, it is necessary to perform Stochastic Gradient Descent online to adapt the weights of the network, severely compromising the speed of the system. In this paper we equip a basic tracking algorithm with a novel fully-convolutional Siamese network trained end-to-end on the ILSVRC15 dataset for object detection in video. Our tracker operates at frame-rates beyond real-time and, despite its extreme simplicity, achieves state-of-the-art performance in multiple benchmarks.

Keywords: Object-tracking · Siamese-network · Similarity-learning · Deep-learning

1 Introduction

We consider the problem of tracking an arbitrary object in video, where the object is identified solely by a rectangle in the first frame. Since the algorithm may be requested to track any arbitrary object, it is impossible to have already gathered data and trained a specific detector. For several years, the most successful paradigm for this scenario has been to learn a model of the object’s appearance in an online fashion using examples extracted from the video itself [1]. This owes in large part to the demonstrated ability of methods like TLD [2], Struck [3] and KCF [4]. However, a clear deficiency of using data derived exclusively from the current video is that only comparatively simple models can be learnt. While other problems in computer vision have seen an increasingly pervasive adoption of deep convolutional networks (conv-nets) trained from large supervised datasets, the scarcity of supervised data and the constraint of real-time operation prevent the naive application of deep learning within this paradigm of learning a detector per video.

Several recent works have aimed to overcome this limitation using a pretrained deep conv-net that was learnt for a different but related task. These approaches either apply “shallow” methods (e.g. correlation filters) using the network’s internal representation as features [5,6] or perform SGD (stochastic gradient descent) to fine-tune multiple layers of the network [7–9]. While the use of shallow methods does not take full advantage of the benefits of end-to-end learning, methods that apply SGD during tracking to achieve state-of-the-art results have not been able to operate in real-time. We advocate an alternative app

The first two authors contributed equally, and are listed in alphabetical order.
© Springer International Publishing Switzerland 2016. G. Hua and H. Jégou (Eds.): ECCV 2016 Workshops, Part II, LNCS 9914, pp. 850–865, 2016. DOI: 10.1007/978-3-319-48881-3_56
