Graph Based Skeleton Motion Representation and Similarity Measurement for Action Recognition

1 CAS Center for Excellence in Brain Science and Intelligence Technology, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
  {pei.wang,cfyuan,wmhu,bli}@nlpr.ia.ac.cn
2 School of Computer Science, Northwestern Polytechnical University, Xi’an, China
  [email protected]

Abstract. Most existing skeleton-based representations for action recognition cannot effectively capture the spatio-temporal motion characteristics of joints and are not robust enough to the noise from depth sensors and the estimation errors of joints. In this paper, we propose a novel low-level representation for the motion of each joint by tracking its trajectory and segmenting it into several semantic parts called motionlets. During this process, the disturbance of noise is reduced by trajectory fitting, sampling and segmentation. Then we construct an undirected complete labeled graph to represent a video by combining these motionlets and their spatio-temporal correlations. Furthermore, a new graph kernel called the subgraph-pattern graph kernel (SPGK) is proposed to measure the similarity between graphs. Finally, the SPGK is directly used as the kernel of an SVM to classify videos. In order to evaluate our method, we perform a series of experiments on several public datasets, and our approach achieves performance comparable to the state-of-the-art approaches.
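The final classification step described above can be illustrated with a minimal sketch (our own illustration, not the authors' code): because the SPGK produces pairwise similarities between video graphs, it can be supplied to an SVM as a precomputed kernel, for example via scikit-learn's SVC. The Gram-matrix values below are random placeholders standing in for actual SPGK evaluations.

    import numpy as np
    from sklearn.svm import SVC

    # Placeholder Gram matrices standing in for SPGK values:
    # K_train[i, j] = SPGK(G_i, G_j) between training video graphs,
    # K_test[i, j]  = SPGK(G_test_i, G_train_j) between test and training graphs.
    rng = np.random.default_rng(0)
    n_train, n_test = 6, 2
    A = rng.random((n_train, n_train))
    K_train = (A + A.T) / 2 + n_train * np.eye(n_train)  # symmetric, well-conditioned
    K_test = rng.random((n_test, n_train))
    y_train = np.array([0, 1, 0, 1, 0, 1])                # action labels of training videos

    clf = SVC(kernel="precomputed")   # the graph kernel replaces a built-in kernel
    clf.fit(K_train, y_train)
    print(clf.predict(K_test))        # predicted action labels for the test videos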

Keywords: 3D human action recognition · Graph kernel · Skeleton motion

1 Introduction

With the development of depth sensors such as the Microsoft Kinect and the Asus Xtion PRO LIVE, a growing number of researchers have focused on 3D action recognition. The human body can be viewed as an articulated system of rigid segments connected by joints, and human actions can be considered as a combination of the movements of the skeleton joints in 3D space [34]. Therefore, the motions of human skeleton joints are effective cues for action recognition, as was also suggested in the early work of Johansson [16].
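As a concrete illustration of this view (an assumed data layout, not taken from the paper), a skeleton sequence can be stored as a T × J × 3 array of 3D joint positions; the trajectory of a single joint is then a slice of this array over time, and the trajectory fitting mentioned in the abstract can be sketched, for example, as a low-order polynomial fit per coordinate (the paper's exact fitting scheme may differ).

    import numpy as np

    # Assumed layout: a skeleton sequence as (frames, joints, xyz) positions.
    T, J = 60, 20                            # 60 frames, 20 Kinect-style joints (illustrative)
    rng = np.random.default_rng(0)
    sequence = rng.random((T, J, 3))

    RIGHT_HAND = 11                          # hypothetical joint index
    trajectory = sequence[:, RIGHT_HAND, :]  # shape (T, 3): that joint's 3D trajectory

    # Hedged sketch of "trajectory fitting" for noise suppression: fit a
    # low-order polynomial to each coordinate and evaluate it over time.
    t = np.arange(T)
    smoothed = np.stack(
        [np.polyval(np.polyfit(t, trajectory[:, d], deg=4), t) for d in range(3)],
        axis=1,
    )
    print(trajectory.shape, smoothed.shape)  # (60, 3) (60, 3)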

Shotton et al. [24] proposed a method to estimate the 3D positions of joints from depth maps and extracted discriminative features from the joints to describe the motion of the human skeleton. Inspired by this work, many researchers [2,5,9,11,12,14,25,26,28,30,33,35] have focused on skeleton-based algorithms for 3D action recognition. However, how to utilize the skeleton information effectively is still a nontrivial issue. First, the inherent noise of depth sensors and the estimation errors of the human skeleton joints are the major disturbances for action recognition; in some videos, most of the joint coordinates are even entirely erroneous. In addition, the specific spatio-temporal dynamic structures of human actions have still not been completely extracted and represented. Finally, finding a feasible and efficient way