Human action recognition based on 3D body mask and depth spatial-temporal maps

Xing Li 1 · Zhenjie Hou 1,2 · Jiuzhen Liang 1 · Chen Chen 3

Received: 7 May 2019 / Revised: 31 March 2020 / Accepted: 11 August 2020 / © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract In this paper, a method based on depth spatial-temporal maps (DSTMs) is presented for human action recognition from depth video sequences; DSTMs provide compact global spatial and temporal information of human motion. In our approach, the initial frame of a depth sequence is dilated to generate a 3D body mask. The 3D body mask is then applied to each depth frame to obtain a new depth sequence containing the major part of the human body. Each frame of the new depth sequence is projected onto three orthogonal axes to obtain three binary lists. Under each projection axis, the binary lists are stitched in order through the entire depth sequence to form a DSTM. We evaluate our method on two standard datasets. Experimental results show that the method effectively captures the spatial and temporal information of human motion and improves the accuracy of human action recognition.

Keywords Human action recognition · 3D body mask · Depth spatial-temporal map
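The pipeline described in the abstract can be illustrated with a minimal sketch. The following is not the authors' implementation: it assumes the depth sequence is a T x H x W NumPy array with zero marking background, and the dilation radius, the number of depth bins (64), the maximum depth range, and the helper names (body_mask_from_initial_frame, frame_projections, compute_dstms) are illustrative assumptions.

```python
# Minimal sketch of DSTM construction (assumptions noted above, not the authors' code).
import numpy as np
from scipy.ndimage import binary_dilation

def body_mask_from_initial_frame(first_frame, dilation_radius=5):
    """Dilate the foreground of the initial depth frame to obtain the 3D body mask."""
    foreground = first_frame > 0                          # non-zero depth = body pixels
    struct = np.ones((2 * dilation_radius + 1,) * 2, dtype=bool)
    return binary_dilation(foreground, structure=struct)  # H x W boolean mask

def frame_projections(frame, depth_bins=64, max_depth=4000.0):
    """Project one masked depth frame onto three orthogonal axes as binary lists."""
    fg = frame > 0
    x_list = fg.any(axis=0).astype(np.uint8)              # length-W list: occupied columns
    y_list = fg.any(axis=1).astype(np.uint8)              # length-H list: occupied rows
    z_list = np.zeros(depth_bins, dtype=np.uint8)         # occupied depth bins
    if fg.any():
        bins = np.minimum((frame[fg] / max_depth * depth_bins).astype(int),
                          depth_bins - 1)
        z_list[bins] = 1
    return x_list, y_list, z_list

def compute_dstms(depth_seq):
    """Stitch the per-frame binary lists over time into three DSTMs."""
    mask = body_mask_from_initial_frame(depth_seq[0])
    masked_seq = depth_seq * mask                         # keep the major part of the body
    x_lists, y_lists, z_lists = zip(*(frame_projections(f) for f in masked_seq))
    # one row per frame: DSTMs of shape T x W, T x H and T x depth_bins
    return np.stack(x_lists), np.stack(y_lists), np.stack(z_lists)
```

Under this reading, each DSTM has one row per frame, so the vertical direction of the map encodes time while the horizontal direction encodes the occupied positions along the corresponding axis.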

✉ Zhenjie Hou
[email protected]

Xing Li
[email protected]

Jiuzhen Liang
[email protected]

Chen Chen
[email protected]

1 College of Information Science and Engineering, Changzhou University, Changzhou, China

2 Jiangsu Province Networking and Mobile Internet Technology Engineering Key Laboratory, Huaian, China

3 Department of Electrical and Computer Engineering, University of North Carolina at Charlotte, Charlotte, USA


1 Introduction

Human action recognition has a wide range of applications in human-computer interaction [2–4, 18, 19], including somatosensory games, intelligent monitoring systems, etc. Early work used RGB cameras to collect video sequences of the human body [8, 11]. In paper [1], the authors introduce Motion Energy Images (MEI) and Motion History Images (MHI) to capture the spatial and temporal information of human action in a video sequence. In paper [6], the authors propose a hierarchical extension algorithm for computing dense motion flow from MHI. However, these methods based on color image sequences are very sensitive to illumination changes, which greatly limits the robustness of action recognition.

With the development of technology, and especially the launch of Microsoft's somatosensory device Kinect, it has become possible to study human action recognition based on depth video sequences. Compared with color sequences, depth sequences contain rich 3D information, are insensitive to illumination changes, and make it easier to extract the foreground of human actions. In recent years, many methods based on depth video sequences have been proposed, including 3D points [9], spatial-temporal depth cuboids [14], depth motion maps (DMM) [16, 20], surface normals [10, 21], and skeleton joints [13]. In paper [17], Yang pr