Three-dimensional CNN-inspired deep learning architecture for Yoga pose recognition in the real-world environment


ORIGINAL ARTICLE

Shrajal Jain1 • Aditya Rustagi1 • Sumeet Saurav2 • Ravi Saini2 • Sanjay Singh2

Received: 30 December 2019 / Accepted: 29 September 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract Existing techniques for Yoga pose recognition build classifiers on sophisticated handcrafted features computed from raw inputs captured in a controlled environment. These techniques often fail in complex real-world situations, which limits the practical applicability of existing Yoga pose recognition systems. This paper presents an alternative, computationally efficient approach for Yoga pose recognition in complex real-world environments using deep learning. To this end, a Yoga pose dataset was created with the participation of 27 individuals (8 males and 19 females). It consists of ten Yoga poses, namely Malasana, Ananda Balasana, Janu Sirsasana, Anjaneyasana, Tadasana, Kumbhakasana, Hasta Uttanasana, Paschimottanasana, Uttanasana, and Dandasana. To capture the videos, we used smartphone cameras with 4K resolution and a frame rate of 30 fps. For the recognition of Yoga poses in real time, a three-dimensional convolutional neural network (3D CNN) architecture was designed and implemented. The designed architecture is a modified version of the C3D architecture initially introduced for the recognition of human actions. In the proposed modified C3D architecture, the computationally intensive fully connected layers are pruned, and supplementary layers such as batch normalization and average pooling are introduced for computational efficiency. To the best of our knowledge, this is among the first studies to utilize the inherent spatial–temporal relationship among Yoga poses for their recognition. The designed 3D CNN architecture achieved a test recognition accuracy of 91.15% on the in-house Yoga pose dataset consisting of ten Yoga poses. Furthermore, on the publicly available dataset, the designed architecture achieved a competitive test recognition accuracy of 99.39%, along with a multifold improvement in execution speed compared to the existing state-of-the-art technique. To promote further study, we will make the in-house Yoga pose dataset publicly available to the research community.

Keywords Deep learning • 3D CNN • C3D • Yoga pose recognition • Human action recognition
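As a rough illustration of why pruning the fully connected layers can matter, the following sketch compares the parameter count of a C3D-style fully connected head with that of a head built on global average pooling. The layer sizes (512 channels in the last convolutional block, a flattened 512 x 1 x 4 x 4 pooled feature map, and two 4096-unit layers) are assumptions taken from the commonly cited original C3D design; they are not the exact configuration of the architecture proposed here, and the helper `dense_params` is hypothetical.

```python
# Illustrative parameter count: fully connected head vs. global-average-pooling head.
# Layer sizes follow the original C3D design (an assumption, not this paper's values).


def dense_params(n_in: int, n_out: int) -> int:
    """Return weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


NUM_CLASSES = 10                     # ten Yoga poses, as stated in the abstract
CONV_CHANNELS = 512                  # channels of the last conv block (assumption)
POOLED_FEATURES = 512 * 1 * 4 * 4    # flattened final pooling output (assumption)

# Head A: C3D-style stack of two 4096-unit layers followed by the classifier.
fc_head = (
    dense_params(POOLED_FEATURES, 4096)
    + dense_params(4096, 4096)
    + dense_params(4096, NUM_CLASSES)
)

# Head B: average each channel over (time, height, width), then classify directly.
gap_head = dense_params(CONV_CHANNELS, NUM_CLASSES)

print(f"FC head parameters:  {fc_head:,}")
print(f"GAP head parameters: {gap_head:,}")
print(f"Reduction factor:    {fc_head / gap_head:.0f}x")
```

Under these assumed sizes, almost all parameters of the classification head sit in the first fully connected layer, so replacing the stack with average pooling removes them without touching the convolutional feature extractor.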

✉ Sumeet Saurav [email protected]
Shrajal Jain [email protected]
Aditya Rustagi [email protected]
Ravi Saini [email protected]
Sanjay Singh [email protected]

1 Department of Electrical Engineering, Birla Institute of Technology and Science (BITS), Pilani 333 031, India
2 Cognitive Computing Group, CSIR-Central Electronics Engineering Research Institute, Pilani 333 031, India

1 Introduction

Yoga is a spiritual practice that originated in India about 5000 years ago. With a tran