Second-order motion descriptors for efficient action recognition



ORIGINAL ARTICLE

Reinier Oves García¹ · Eduardo F. Morales¹ · L. Enrique Sucar¹

Received: 7 February 2020 / Accepted: 14 October 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract

Human action recognition from realistic video data is a challenging and relevant research area. The state of the art is led by methods based on convolutional neural networks (CNNs), especially two-stream CNNs. In this family of deep architectures, the appearance channel learns from the RGB images and the motion channel learns from a motion representation, usually the optical flow. Since action recognition requires extracting complex motion patterns from image sequences, we introduce a new set of second-order motion representations that capture both geometric and kinematic properties of the motion (curl, divergence, curvature, and acceleration). In addition, we present a new and effective strategy that reduces training time without sacrificing performance when using the I3D two-stream CNN, and that is robust to the weaknesses of a single channel. The experiments presented in this paper were carried out on two of the most challenging datasets for action recognition: UCF101 and HMDB51. The results show an improvement in accuracy on UCF101, where an accuracy of 98.45% is achieved when curvature and acceleration are combined as the motion representation. On HMDB51, our approach shows competitive performance, achieving an accuracy of 80.19%. On both datasets, our approach considerably reduces the time required for the preprocessing and training phases: preprocessing time is cut to one sixth, and the motion stream can be trained in one third of the time usually required.

Keywords  Action recognition · Second-order motion descriptors · Fusion strategy
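Two of the second-order descriptors named above, curl and divergence, have standard definitions in terms of the spatial derivatives of a dense optical-flow field (curl = ∂v/∂x − ∂u/∂y, divergence = ∂u/∂x + ∂v/∂y). The following is a minimal NumPy sketch of those textbook formulas using finite differences; it is an illustration of the definitions, not the authors' implementation, and the function name is our own.

```python
import numpy as np

def curl_divergence(u, v):
    """Compute curl and divergence maps from a dense optical-flow field.

    u, v: 2-D arrays holding the horizontal and vertical flow components,
    one value per pixel (e.g. the output of a dense optical-flow estimator).
    """
    # np.gradient returns derivatives along (axis 0 = y, axis 1 = x).
    du_dy, du_dx = np.gradient(u)
    dv_dy, dv_dx = np.gradient(v)
    div = du_dx + dv_dy   # local expansion/contraction of the motion
    curl = dv_dx - du_dy  # local rotation of the motion
    return curl, div
```

As a sanity check, a rigid rotation field (u = −y, v = x) yields a constant curl of 2 and zero divergence everywhere.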

1 Introduction

Video-based human action recognition is one of the most challenging tasks in computer vision. Action recognition deals with the problem of assigning a predefined label to an input video and supports several real-world applications in areas such as medicine, human-computer interaction, and robotics [2]. This research topic has advanced steadily over the last two decades [3], with considerable improvements when using non-handcrafted features [4].

This paper is an extension of [1]. In this extension, a new set of motion descriptors is presented and their performance is evaluated using the proposed strategy. Additionally, a more comprehensive evaluation with two datasets is included.

* Reinier Oves García [email protected]



Instituto Nacional de Astrofísica Óptica y Electrónica (INAOE), Luis Enrique Erro # 1, Tonantzintla, Puebla C.P. 72840, Mexico

The best performance so far has been achieved by multi-stream approaches [5], specifically by two-stream CNNs [6], rendering obsolete the approaches based on handcrafted features [7]. During the last 2 years, seve