Human Action Prediction with 3D-CNN

ORIGINAL RESEARCH

Reem Alfaifi¹ · A. M. Artoli¹

Received: 24 February 2020 / Accepted: 7 August 2020
© Springer Nature Singapore Pte Ltd 2020

Abstract

Human activity prediction (HAP) is attracting growing interest in computer vision. Various methods that use different feature extraction techniques have been proposed to address the many open issues in HAP. This paper reviews recent developments in HAP, including feature extraction techniques and classification methods, and identifies the advantages and disadvantages of each method. It also reviews the public and private datasets used in HAP, and it presents, compares, and discusses the experimentally obtained accuracies and performance of these methods. Furthermore, it proposes a new HAP model, in which a 3D convolutional neural network (3D-CNN) performs feature extraction and a long short-term memory (LSTM) network performs classification, and presents its test results. The proposed model uses parallel binary classifiers to identify the correct future actions. The accuracy and F1 score of the proposed model, obtained using the MSR Daily Action and UCF101 datasets for training and testing, are reported, demonstrating its capability in comparison with existing HAP models.

Keywords: Human activity prediction · 3D-CNN · LSTM · UCF101 · MSR daily action
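The abstract names two building blocks: 3D convolution for feature extraction and a set of parallel binary classifiers for choosing the future action. The following is a minimal, illustrative pure-Python sketch of both ideas, not the authors' implementation. It assumes a single-channel input, stride 1, and no padding for the 3D convolution; it omits the LSTM stage entirely (the feature vector passed to the classifiers simply stands in for its output); and it models each "parallel binary classifier" as an independent one-vs-rest logistic regression. The function names, action labels, weights, and the 0.5 threshold are all hypothetical toy values.

```python
import math
from typing import Dict, List, Sequence

Video = List[List[List[float]]]  # indexed [frame][row][col]


def conv3d_valid(video: Video, kernel: Video) -> Video:
    """Naive single-channel 3D convolution: stride 1, no padding.

    Like most deep-learning libraries, this computes cross-correlation
    (the kernel is not flipped). Because the kernel spans several frames,
    each output value mixes spatial and temporal information.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Video = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_actions(
    features: Sequence[float],
    classifiers: Dict[str, Sequence[float]],
    threshold: float = 0.5,
) -> List[str]:
    """Score each action with its own binary (one-vs-rest) classifier.

    The classifiers are independent of one another, so they could run in
    parallel. `classifiers` maps a hypothetical action name to logistic-
    regression weights whose last entry is the bias. Returns every action
    whose probability exceeds `threshold`, best first, or the single best
    action if none does.
    """
    scores: Dict[str, float] = {}
    for action, params in classifiers.items():
        *weights, bias = params
        if len(weights) != len(features):
            raise ValueError(f"{action}: expected {len(features)} weights")
        z = sum(w * f for w, f in zip(weights, features)) + bias
        scores[action] = sigmoid(z)
    ranked = sorted(scores, key=lambda a: scores[a], reverse=True)
    accepted = [a for a in ranked if scores[a] > threshold]
    return accepted or ranked[:1]


# Toy usage: a temporal-difference kernel responds only where pixels change.
video = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 0.0], [0.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
motion = conv3d_valid(video, [[[-1.0]], [[1.0]]])
print(motion)  # [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]

heads = {"walk": [2.0, 0.0, -1.0], "sit": [-1.0, 3.0, 0.0], "fall": [0.5, 0.5, -2.0]}
print(predict_actions([1.0, 1.0], heads))  # ['sit', 'walk']
```

Keeping the binary classifiers independent means several candidate actions can be flagged at once, whereas a single softmax head would always force exactly one winner; the fallback to the top-scoring action is a design choice of this sketch, not something stated in the paper.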

* Reem Alfaifi
A. M. Artoli

¹ Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh 11453, Saudi Arabia

Introduction

Human activity prediction (HAP) is an active research area in computer vision. It relies heavily on machine learning algorithms, such as neural networks and support vector machines (SVMs), to infer unfinished human activities and actions, possibly involving interactions between people, from recorded video frames. Its aims are to report abnormal human behavior in real time and to predict possible scenarios before incidents take place [1–6]. Such analysis has many applications; for example, it may be lifesaving for people with special needs and for critical care patients. The key difference between human activity recognition (HAR) and HAP is that HAP predicts activities that are still unfinished, whereas HAR analyzes activities that have already been completed. Fewer studies have addressed HAP than HAR, probably because interpreting human actions from their observed history involves a greater degree of freedom. This study focuses only on HAP; therefore, we refer the reader to [7] and [75] for extensive surveys of HAR.

HAP has a wide range of applications, including video surveillance, robot behavior [8], human–computer interaction, and health care systems. Such prediction systems provide critical support in areas such as crime, accident, and stampede prevention [1–5, 9–14]. Drawing on accumulated experience of human interaction, HAP allows agents to classify activities and predict those likely to occur in the near future. HAP research faces various challenges, including the immobility of the human body, dissim