Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition


1 School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
{jliu029,amir3,wanggang}@ntu.edu.sg
2 School of Electrical and Information Engineering, University of Sydney, Sydney, Australia
[email protected]

Abstract. 3D action recognition, the analysis of human actions based on 3D skeleton data, has recently become popular due to its succinctness, robustness, and view-invariant representation. Recent work on this problem has proposed RNN-based learning methods to model the contextual dependency in the temporal domain. In this paper, we extend this idea to spatio-temporal domains to analyze the hidden sources of action-related information within the input data over both domains concurrently. Inspired by the graphical structure of the human skeleton, we further propose a more powerful tree-structure based traversal method. To handle the noise and occlusion in 3D skeleton data, we introduce a new gating mechanism within LSTM to learn the reliability of the sequential input data and accordingly adjust its effect on updating the long-term context information stored in the memory cell. Our method achieves state-of-the-art performance on 4 challenging benchmark datasets for 3D human action analysis.

Keywords: 3D action recognition · Recurrent neural networks · Long short-term memory · Trust gate · Spatio-temporal analysis

1 Introduction

In recent years, action recognition based on the locations of major joints of the body in 3D space has attracted a lot of attention. Different feature extraction and classifier learning approaches have been studied for 3D action recognition [1–3]. For example, Yang and Tian [4] represented the static postures and the dynamics of the motion patterns via eigenjoints and used a Naïve-Bayes-Nearest-Neighbor classifier. An HMM was applied in [5] to model the temporal dynamics of the actions over a histogram-based representation of 3D joint locations. Evangelidis et al. [6] learned a GMM over the Fisher kernel representation of a succinct skeletal feature, called skeletal quads. Vemulapalli et al. [7] represented the skeleton configurations and actions as points and curves in a Lie group, respectively, and used an SVM classifier to classify the actions. A skeleton-based dictionary learning method utilizing group sparsity and geometry constraints was also proposed in [8]. An angular skeletal representation over the tree-structured set of joints was introduced in [9], which calculated the similarity of these features over the temporal dimension to build the global representation of the action samples and fed them to an SVM for final classification.

Recurrent neural networks (RNNs), which are a variant of neural nets for handling sequential data with variable length, have been successfully applied to language modeling [10–12]

© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part III, LNCS 9907, pp. 816–833, 2016. DOI: 10.1007/978-3-319-46487-9_50
