Weakly-supervised action localization based on seed superpixels

Sami Ullah1 · Naeem Bhatti1 · Tehreem Qasim1 · Najmul Hassan1 · Muhammad Zia1

Received: 6 January 2020 / Revised: 29 July 2020 / Accepted: 29 September 2020 / © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract In this paper, we present an approach to action localization based on weak supervision with seed superpixels. To benefit from superpixel segmentation and to learn a priori knowledge, we select seed superpixels in equal numbers from the action and non-action areas of a few video frames of an action sequence. We compute correlation, joint entropy and joint histogram as features of the video frame superpixels, based on optical flow magnitudes and intensity information. An SVM is trained on the features of the action and non-action seed superpixels and is then used to classify the video frame superpixels as action or non-action. The superpixels classified as action provide the action localization. The localized action superpixels are then used to recognize the action class with a Dendrogram-SVM, based on the already extracted features. We evaluate the proposed approach for action localization and recognition on the UCF Sports and UCF-101 action datasets; the results demonstrate that seed superpixels provide effective action localization, which in turn facilitates recognition of the action class.

Keywords Action localization · Action recognition · Feature extraction · Seed superpixels
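As a rough illustration of the three superpixel features named in the abstract, the following minimal sketch computes a correlation value, a joint histogram and a joint entropy from the optical flow magnitudes and intensities of the pixels in one superpixel. It is not the authors' implementation: the function names, the bin count of 8, the value ranges (`flow_max`, `int_max`) and the use of Pearson correlation are all assumptions made for this example, and the paper's own definitions may differ.

```python
import math
from collections import Counter


def quantize(values, bins, lo, hi):
    """Map each value to an integer bin index in [0, bins - 1] (assumed uniform bins)."""
    span = (hi - lo) or 1.0
    return [min(bins - 1, max(0, int((v - lo) / span * bins))) for v in values]


def joint_histogram(flow_mag, intensity, bins=8, flow_max=16.0, int_max=255.0):
    """Normalized joint histogram over (flow bin, intensity bin) pairs."""
    if not flow_mag or len(flow_mag) != len(intensity):
        raise ValueError("inputs must be non-empty and of equal length")
    flow_bins = quantize(flow_mag, bins, 0.0, flow_max)
    int_bins = quantize(intensity, bins, 0.0, int_max)
    counts = Counter(zip(flow_bins, int_bins))
    n = len(flow_bins)
    return [[counts.get((i, j), 0) / n for j in range(bins)] for i in range(bins)]


def joint_entropy(hist):
    """Shannon entropy (in bits) of a normalized joint histogram."""
    return -sum(p * math.log2(p) for row in hist for p in row if p > 0)


def correlation(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def superpixel_features(flow_mag, intensity, bins=8):
    """Feature vector: [correlation, joint entropy, flattened joint histogram]."""
    hist = joint_histogram(flow_mag, intensity, bins)
    flat = [p for row in hist for p in row]
    return [correlation(flow_mag, intensity), joint_entropy(hist)] + flat


# Hypothetical values for the pixels of a single superpixel.
flow = [0.0, 4.0, 8.0, 12.0]
gray = [0.0, 64.0, 128.0, 192.0]
features = superpixel_features(flow, gray)
```

Feature vectors of this kind, computed for the seed superpixels and labeled as action or non-action, could then be passed to any standard SVM implementation for training; that step is omitted here.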

Naeem Bhatti
[email protected]

1 COMSIP LAB, Department of Electronics, Quaid-i-Azam University, 45320, Islamabad, Pakistan

Multimedia Tools and Applications

1 Introduction Action localization and recognition are important in video surveillance, human-machine interaction, security and artificial vision. Action localization means detecting the spatio-temporal regions that contain an action performed by an object in a video sequence. Most approaches in the literature perform action recognition without performing action localization. Localizing an action can further facilitate its recognition in a framework where the outcome of the localization stage propagates to the recognition stage. Actions may be performed in different environments. For example, a person walking in a garden may also be found walking in a market. In such scenarios, action localization becomes important in order to increase the generalization of the action recognition framework. Comparatively few efforts in the literature recognize an action by first localizing it. Moving backgrounds and cameras, occlusion, clutter and illumination changes pose challenges in localizing an action. To handle these challenges, various spatio-temporal feature based algorithms have been presented [21, 39]. Yang et al. perform action localization and recognition in still images using spatial features only [37]. Pero et al. and Karpathy et al. use temporal features only to perform action localization and recognition in [8]