A Review of Dynamic Maps for 3D Human Motion Recognition Using ConvNets and Its Improvement



Zhimin Gao1 · Pichao Wang2 · Huogen Wang3 · Mingliang Xu1 · Wanqing Li4

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
RGB-D based action recognition is attracting increasing attention in both the research and industrial communities. However, due to the lack of training data, pre-training based methods are popular in this field. This paper presents a review of the concept of dynamic maps for RGB-D based human motion recognition using models pretrained in the image domain. Dynamic maps recursively encode the spatial, temporal and structural information contained in a video sequence into dynamic motion images simultaneously. They enable the use of Convolutional Neural Networks (ConvNets) and their models pretrained on ImageNet for 3D human motion recognition. This simple, compact and effective representation achieves state-of-the-art results on various gesture/action/activity recognition datasets. Based on the review of previous methods that apply this concept to different modalities (depth, skeleton or RGB-D data), a novel encoding scheme is developed and presented in this paper. The improved method generates effective flow-guided dynamic maps, which can select the high-motion window and distinguish the order of frames with small motion. The improved flow-guided dynamic maps achieve state-of-the-art results on the large ChaLearn LAP IsoGD and NTU RGB+D datasets.

Keywords Dynamic maps · 3D human motion recognition · ConvNets
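The encoding idea that dynamic maps build on can be illustrated with a small, self-contained sketch. The snippet below is not the exact scheme proposed in this paper (the flow-guided variant is introduced later); it shows the underlying approximate rank pooling idea, in which a clip is collapsed into a single image by temporally ordered weights so that an ImageNet-pretrained ConvNet can consume it. The function name `approximate_rank_pooling` and the linear weights 2t − T − 1 are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def approximate_rank_pooling(frames: np.ndarray) -> np.ndarray:
    """Collapse a (T, H, W, C) clip into a single (H, W, C) dynamic image.

    Illustrative sketch only: the paper's flow-guided dynamic maps use a
    different, more elaborate encoding.
    """
    T = frames.shape[0]
    t = np.arange(1, T + 1, dtype=np.float64)
    # Linear temporal weights: later frames receive larger positive weights,
    # earlier frames negative ones, so frame order is baked into the sum.
    alpha = 2.0 * t - T - 1.0
    # Weighted sum over the temporal axis -> one image per clip.
    dyn = np.tensordot(alpha, frames.astype(np.float64), axes=(0, 0))
    # Rescale to 8-bit so the result can be fed to an ImageNet-pretrained ConvNet.
    dyn -= dyn.min()
    dyn /= max(dyn.max(), 1e-12)
    return (255.0 * dyn).astype(np.uint8)
```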

Huogen Wang [email protected] (corresponding author)
Zhimin Gao [email protected]
Pichao Wang [email protected]
Mingliang Xu [email protected]
Wanqing Li [email protected]

1 School of Information Engineering, Zhengzhou University, Zhengzhou, China
2 Alibaba Group (U.S.) Inc., Bellevue, WA, USA
3 School of Electrical and Information Engineering, Tianjin University, Tianjin, China
4 Advanced Multimedia Research Lab, University of Wollongong, Wollongong, Australia


1 Introduction

RGB-D (Red, Green, Blue and Depth) based human action recognition has attracted increasing attention due to the availability of RGB-D video cameras and the advantages they offer over conventional RGB video. For example, the additional depth information provides insensitivity to illumination changes and allows a more reliable estimation of body silhouettes and skeletons Shotton et al. [21]. However, it remains unclear how such video data could be compactly and effectively represented and used for computer vision tasks, including classification and recognition. A number of works Yang and Tian [41]; Xia et al. [39]; Wang et al. [32]; Vemulapalli et al. [26]; Wang et al. [31]; Yang et al. [43]; Oreifej and Liu [18]; Yang and Tian [42]; Lu et al. [17]; Liu et al. [16] have appeared in the literature following the lead of the earliest work, Li et al. [13], that used RGB-D data for human action recognition. It is interesting to note that the methods proposed in these works are based on