A methodology for image annotation of human actions in videos


Moomina Waheed 1 · Shahid Hussain 1 · Arif Ali Khan 2 · Mansoor Ahmed 1 · Bashir Ahmad 3

Received: 8 July 2019 / Revised: 2 April 2020 / Accepted: 22 May 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

In video-based image classification, image annotation plays a vital role in improving classification decisions based on image semantics. Several annotation methods, such as manual and semi-supervised approaches, have been introduced. However, formal specification, high cost, a high probability of errors, and computation time remain major issues in performing image annotation. To overcome these issues, we propose a new image annotation technique consisting of three tiers: frame extraction, interest point generation, and clustering. The aim of the proposed technique is to automate label generation for video frames. We also use an evaluation model to assess its effectiveness. The results are promising: the technique achieves 77% in terms of the Adjusted Rand Index for label generation on video frames. Finally, we compare the proposed methodology with existing techniques.

Keywords Image annotation · SIFT · Clustering · Semantic analysis · Image labeling · Action recognition

* Shahid Hussain [email protected]
Moomina Waheed [email protected]
Arif Ali Khan [email protected]
Mansoor Ahmed [email protected]
Bashir Ahmad [email protected]

Extended author information available on the last page of the article

Multimedia Tools and Applications

1 Introduction

With the advancement of social media applications, the use of digital imaging technology has increased massively, and millions of images and videos are now shared every day. In computer vision, labeling the images inside these videos relies on their semantics and is considered a challenging task [7]. Existing efforts on label generation for video frames fall into two groups: manual and semi-supervised. Manual image annotation methods rely on human input to annotate an image based on its semantics: the user enters descriptive keywords for each image. Although this approach gives the best accuracy, it is time-consuming and labor-intensive, which increases the overall cost. Semi-supervised methods use a few labels to train a supervised classifier and then generalize the categorization on that basis, which helps to achieve high efficiency and accuracy. These methods focus on creating and refining annotations by encouraging users to provide feedback while examining the retrieval results. However, this method requires high labor in terms of using user interfac