Shuffle and Learn: Unsupervised Learning Using Temporal Order Verification

1 The Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
{imisra,hebert}@cs.cmu.edu
2 Facebook AI Research, Menlo Park, USA
[email protected]

Abstract. In this paper, we present an approach for learning a visual representation from the raw spatiotemporal signals in videos. Our representation is learned without supervision from semantic labels. We formulate our method as an unsupervised sequential verification task, i.e., we determine whether a sequence of frames from a video is in the correct temporal order. With this simple task and no semantic labels, we learn a powerful visual representation using a Convolutional Neural Network (CNN). The representation contains complementary information to that learned from supervised image datasets like ImageNet. Qualitative results show that our method captures information that is temporally varying, such as human pose. When used as pre-training for action recognition, our method gives significant gains over learning without external data on benchmark datasets like UCF101 and HMDB51. To demonstrate its sensitivity to human pose, we show results for pose estimation on the FLIC and MPII datasets that are competitive with, or better than, approaches using significantly more supervision. Our method can be combined with supervised representations to provide an additional boost in accuracy.

Keywords: Unsupervised learning · Videos · Sequence verification · Action recognition · Pose estimation · Convolutional neural networks
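The verification task described in the abstract can be illustrated with a small data-sampling sketch. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names, the tuple length of three frames, the uniform random sampling, and the 50/50 split between ordered and shuffled tuples are all hypothetical choices made for the example. The CNN that would consume these tuples is not shown.

```python
import random


def is_in_order(indices):
    """Return True if the frame indices are strictly increasing."""
    return all(a < b for a, b in zip(indices, indices[1:]))


def sample_verification_example(num_frames, tuple_len=3, rng=None):
    """Sample one (frame_indices, label) pair for temporal order verification.

    label is 1 when the frames are in temporal order and 0 when they are
    shuffled. tuple_len and the 50/50 split are illustrative assumptions.
    """
    rng = rng or random.Random()
    if not 2 <= tuple_len <= num_frames:
        raise ValueError("need 2 <= tuple_len <= num_frames")

    # Pick distinct frame positions and put them in temporal order.
    ordered = sorted(rng.sample(range(num_frames), tuple_len))

    # Half of the time, keep the correct order as a positive example.
    if rng.random() < 0.5:
        return ordered, 1

    # Otherwise shuffle until the order is no longer correct.
    shuffled = ordered[:]
    while is_in_order(shuffled):
        rng.shuffle(shuffled)
    return shuffled, 0
```

A binary classifier trained on such pairs needs no semantic labels, because the label comes from the video's own frame order.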

1 Introduction

Sequential data provides an abundant source of information in the form of auditory and visual percepts. Learning from the observation of sequential data is a natural and implicit process for humans [1–3]. It informs both low level cognitive tasks and high level abilities like decision making and problem solving [4]. For instance, answering the question “Where would the moving ball go?” requires the development of basic cognitive abilities like prediction from sequential data like video [5].

Electronic supplementary material: The online version of this chapter (doi:10.1007/978-3-319-46448-0_32) contains supplementary material, which is available to authorized users.

© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part I, LNCS 9905, pp. 527–544, 2016. DOI: 10.1007/978-3-319-46448-0_32

I. Misra et al.

In this paper, we explore the power of spatiotemporal signals, i.e., videos, in the context of computer vision. To study the information available in a video signal in isolation, we ask the question: How does an agent learn from the spatiotemporal structure present in video without using supervised semantic labels? Are the representations learned using the unsupervised spatiotemporal information present in videos meaningful? And finally, are these representations complementary to those learned from strongly supervised image data? In this paper, we explore such questions by using a sequential learning approach. Sequential learning is used in a variety of areas such as speech recognition, robotic path planning, adap
