Tensor Representations via Kernel Linearization for Action Recognition from 3D Skeletons
1 NICTA/Data61/CSIRO, Canberra, Australia {piotr.koniusz,fatih.porikli}@data61.csiro.au
2 Australian Centre for Robotic Vision, Canberra, Australia [email protected]
3 Australian National University, Canberra, Australia {piotr.koniusz,anoop.cherian,fatih.porikli}@anu.edu.au
Abstract. In this paper, we explore tensor representations that can compactly capture higher-order relationships between skeleton joints for 3D action recognition. We first define RBF kernels on 3D joint sequences, which are then linearized to form kernel descriptors. The higher-order outer-products of these kernel descriptors form our tensor representations. We present two different kernels for action recognition, namely (i) a sequence compatibility kernel that captures the spatio-temporal compatibility of joints in one sequence against those in the other, and (ii) a dynamics compatibility kernel that explicitly models the action dynamics of a sequence. Tensors formed from these kernels are then used to train an SVM. We present experiments on several benchmark datasets and demonstrate state-of-the-art results, substantiating the effectiveness of our representations.
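To make the pipeline in the abstract concrete, the following is a minimal sketch of linearizing an RBF kernel and forming a third-order outer-product tensor from the resulting descriptors. It is an illustration under stated assumptions, not the method of this paper: the linearization uses random Fourier features as a stand-in technique, and the names `rff_map` and `third_order_tensor`, the joint count, the bandwidth `sigma`, and the feature dimensions are all hypothetical choices.

```python
import numpy as np

def rff_map(X, n_features, sigma, rng):
    """Random Fourier features (a stand-in linearization, assumed here) whose
    inner products approximate exp(-||x - y||^2 / (2 * sigma^2))."""
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, n_features))  # random frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)       # random phases
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def third_order_tensor(Phi):
    """Average of the third-order outer products of the rows of Phi."""
    return np.einsum('ni,nj,nk->ijk', Phi, Phi, Phi) / Phi.shape[0]

rng = np.random.default_rng(0)
joints = rng.normal(size=(20, 3))  # hypothetical skeleton: 20 joints in 3D
sigma = 1.0                        # hypothetical RBF bandwidth

# Check the linearization: feature inner products vs. the exact RBF kernel.
Phi_big = rff_map(joints, 4000, sigma, rng)
approx = Phi_big @ Phi_big.T
sq_dists = ((joints[:, None, :] - joints[None, :, :]) ** 2).sum(-1)
exact = np.exp(-sq_dists / (2.0 * sigma ** 2))
max_err = np.abs(approx - exact).max()

# Form a small third-order tensor from low-dimensional descriptors.
Phi_small = rff_map(joints, 8, sigma, rng)
T = third_order_tensor(Phi_small)  # shape (8, 8, 8), symmetric in all modes
```

The tensor `T` would then be vectorized and passed to a linear classifier such as an SVM. The descriptor dimension is kept small here because the tensor size grows cubically with it.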
Keywords: Kernel descriptors · Skeleton action recognition · Higher-order tensors

1 Introduction
Human action recognition is a central problem in computer vision with potential impact in surveillance, human-robot interaction, elderly assistance systems, and gaming, to name a few. While there have been significant advancements in this area over the past few years, action recognition in unconstrained settings still remains a challenge. There has been research to simplify the problem by moving from RGB cameras to more sophisticated sensors such as Microsoft Kinect that can localize human body-parts and produce moving 3D skeletons [1]; these skeletons are then used for recognition. Unfortunately, these skeletons are often noisy due to the difficulty in localizing body-parts, self-occlusions, and sensor range errors, thus necessitating higher-order reasoning on these 3D skeletons for action recognition. Several approaches have been suggested in the recent past to improve the recognition of actions from such noisy skeletons. These approaches can be broadly divided into two perspectives, namely (i) generative models that assume the skeleton points are produced by a latent dynamic model [2] corrupted by noise and (ii) discriminative approaches that generate compact representations of sequences on which classifiers are trained [3]. Due to the huge configuration space of 3D actions and the unavailability of sufficient training data, discriminative approaches have been the trend in recent years for this problem. In this line

Electronic supplementary material: The online version of this chapter (doi:10.1007/978-3-319-46493-0_3) contains supplementary material, which is available to authorized users.
© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part IV, LNCS 9908, pp. 37–53, 2016. DOI: 10.1007/978-3-319-46493-0_3