Tensor Representations via Kernel Linearization for Action Recognition from 3D Skeletons


1 NICTA/Data61/CSIRO, Canberra, Australia {piotr.koniusz,fatih.porikli}@data61.csiro.au

2 Australian Centre for Robotic Vision, Canberra, Australia [email protected]

3 Australian National University, Canberra, Australia {piotr.koniusz,anoop.cherian,fatih.porikli}@anu.edu.au

Abstract. In this paper, we explore tensor representations that can compactly capture higher-order relationships between skeleton joints for 3D action recognition. We first define RBF kernels on 3D joint sequences, which are then linearized to form kernel descriptors. The higher-order outer-products of these kernel descriptors form our tensor representations. We present two different kernels for action recognition, namely (i) a sequence compatibility kernel that captures the spatio-temporal compatibility of joints in one sequence against those in the other, and (ii) a dynamics compatibility kernel that explicitly models the action dynamics of a sequence. Tensors formed from these kernels are then used to train an SVM. We present experiments on several benchmark datasets and demonstrate state-of-the-art results, substantiating the effectiveness of our representations.
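The pipeline summarized in the abstract (an RBF kernel, its linearization into descriptors, and higher-order outer products of those descriptors) can be illustrated with a small sketch. This is not the construction used in the paper: the linearization here is a stand-in based on random Fourier features, the kernel acts on raw 3D joint coordinates only, and all names (`rff_features`, `third_order_tensor`, `gamma`, `D`) as well as the toy data are assumptions made for illustration.

```python
import numpy as np

def rff_features(x, W, b):
    """Random Fourier features (assumed stand-in for the linearization):
    phi(x) . phi(y) approximates exp(-gamma * ||x - y||^2)."""
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(x @ W.T + b)

def third_order_tensor(features):
    """Average of the third-order outer products phi (x) phi (x) phi over all rows."""
    n = features.shape[0]
    return np.einsum('ni,nj,nk->ijk', features, features, features) / n

rng = np.random.default_rng(0)
gamma = 0.5   # hypothetical RBF bandwidth
D = 16        # hypothetical number of random features

# Frequencies drawn from N(0, 2*gamma*I) and phases from U[0, 2*pi]
W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(D, 3))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

# Toy stand-ins for the 3D joint coordinates of two sequences
joints_a = rng.normal(size=(20, 3))
joints_b = rng.normal(size=(25, 3))

phi_a = rff_features(joints_a, W, b)   # kernel descriptors, shape (20, D)
phi_b = rff_features(joints_b, W, b)   # kernel descriptors, shape (25, D)

T_a = third_order_tensor(phi_a)        # shape (D, D, D)
T_b = third_order_tensor(phi_b)

# The tensor inner product equals the mean of the cubed pairwise
# (linearized) kernel values between the two sets of joints.
lhs = np.sum(T_a * T_b)
rhs = np.mean((phi_a @ phi_b.T) ** 3)
```

In this sketch the flattened tensor `T_a.ravel()` would play the role of the feature vector passed to a linear SVM. The identity between `lhs` and `rhs` is why a linear classifier on such tensors behaves like a kernel machine with a cubed, aggregated RBF kernel.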

Keywords: Kernel descriptors · Skeleton action recognition · Higher-order tensors

1 Introduction

Human action recognition is a central problem in computer vision with potential impact in surveillance, human-robot interaction, elderly assistance systems, and gaming, to name a few. While there have been significant advancements in this area over the past few years, action recognition in unconstrained settings still remains a challenge. There has been research on simplifying the problem by moving from RGB cameras to more sophisticated sensors such as Microsoft Kinect that can localize human body-parts and produce moving 3D skeletons [1]; these skeletons are then used for recognition. Unfortunately, these skeletons are often noisy due to the difficulty in localizing body-parts, self-occlusions, and sensor range errors, thus necessitating higher-order reasoning on these 3D skeletons for action recognition.

Several approaches have been suggested in the recent past to improve recognition performance of actions from such noisy skeletons. These approaches can be mainly divided into two perspectives, namely (i) generative models that assume the skeleton points are produced by a latent dynamic model [2] corrupted by noise and (ii) discriminative approaches that generate compact representations of sequences on which classifiers are trained [3]. Due to the huge configuration space of 3D actions and the unavailability of sufficient training data, discriminative approaches have been the trend in recent years for this problem. In this line

Electronic supplementary material: The online version of this chapter (doi:10.1007/978-3-319-46493-0_3) contains supplementary material, which is available to authorized users.

© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part IV, LNCS 9908, pp. 37–53, 2016. DOI: 10.1007/978-3-319-46493-0_3
