Human Interaction Prediction Using Deep Temporal Features
1 School of Computer Science and Software Engineering, The University of Western Australia, Crawley, Australia
[email protected], {mohammed.bennamoun,senjian.an,farid.boussaid}@uwa.edu.au
2 School of Electrical, Electronic and Computer Engineering, The University of Western Australia, Crawley, Australia
3 School of Engineering and Information Technology, Murdoch University, Murdoch, Australia
[email protected]
Abstract. Interaction prediction has a wide range of applications such as robot control and the prevention of dangerous events. In this paper, we introduce a new method to capture deep temporal information in videos for human interaction prediction. We propose to use flow coding images to represent the low-level motion information in videos and to extract deep temporal features using a deep convolutional neural network architecture. We tested our method on the UT-Interaction dataset and the challenging TV human interaction dataset, and demonstrated the advantages of the proposed deep temporal features based on flow coding images. The proposed method, though using only temporal information, outperforms the state-of-the-art methods for human interaction prediction.

Keywords: Interaction prediction · CNN · Temporal convolution

1 Introduction
Interaction prediction, or early event recognition, aims to infer an interaction at its early stage [1]. It can help in preventing harmful events (e.g., fighting) in a surveillance scenario. It is also essential to human-robot interaction (e.g., when a human lifts his/her hand or opens his/her arms, the robot can respond accordingly). Unlike interaction recognition, interaction prediction requires the inference of the action before it occurs. This requires the prediction of any potential future action, using the frames captured prior to the action. We can see from Fig. 1 that it is difficult to infer the action class from a single frame. The temporal information and the combination of several frames, on the other hand, provide more information about the future action class.
Fig. 1. Human interaction prediction. The goal is to predict the interaction class before it happens, which is difficult to achieve from a single frame.
In this paper, we focus on the temporal information of video sequences and introduce a new deep temporal feature for human interaction prediction. Existing interaction prediction methods mainly use spatial features (e.g., bag-of-words) [1], or combine spatial and temporal features (e.g., histograms of oriented optical flow) [2] to represent video frames. These hand-crafted features are, however, not powerful enough to capture the salient motion information for interaction prediction because they lose the global structure of the data [3]. Recent works on large-scale recognition tasks [4,5] show that deep learned representations perform better than such hand-crafted features.
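To make the idea of flow coding images concrete, the sketch below shows one common way to encode dense optical flow as a colour image and pass it through a pretrained CNN to obtain deep temporal features for the observed (partial) video. The paper's exact colour-coding scheme and network architecture are not given on this page, so the HSV-based encoding, the Farneback flow estimator, and the ImageNet-pretrained ResNet-18 backbone used here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: flow coding images + deep temporal features.
# Assumptions (not from the paper): Farneback optical flow, HSV colour
# coding of flow, and an ImageNet-pretrained ResNet-18 feature extractor.
import cv2
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T

def flow_coding_image(prev_bgr, next_bgr):
    """Encode dense optical flow between two frames as a colour image:
    flow direction -> hue, flow magnitude -> value."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_bgr, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros_like(prev_bgr)
    hsv[..., 0] = ang * 180 / np.pi / 2                               # hue: direction
    hsv[..., 1] = 255                                                 # full saturation
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)   # value: magnitude
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

# Pretrained CNN with the classifier removed, used as a fixed feature
# extractor for each flow coding image.
backbone = models.resnet18(pretrained=True)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.ToPILImage(), T.Resize((224, 224)), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def deep_temporal_features(frames):
    """frames: list of BGR frames observed before the interaction occurs.
    Returns one deep feature vector per consecutive frame pair."""
    feats = []
    with torch.no_grad():
        for prev, nxt in zip(frames[:-1], frames[1:]):
            coded = flow_coding_image(prev, nxt)
            rgb = cv2.cvtColor(coded, cv2.COLOR_BGR2RGB)
            feats.append(backbone(preprocess(rgb).unsqueeze(0)).squeeze(0))
    return torch.stack(feats)   # shape: (num_frame_pairs, 512) for ResNet-18
```

A prediction model, for instance the temporal convolution mentioned in the keywords, would then operate on this sequence of per-frame-pair feature vectors; that classification stage is not sketched here.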