A multimodal approach for multi-label movie genre classification



Rafael B. Mangolin1 · Rodolfo M. Pereira2,3 · Alceu S. Britto Jr.3 · Valéria D. Feltrim1 · Diego Bertolini4 · Carlos N. Silla Jr.3 · Yandre M. G. Costa1

Received: 12 March 2020 / Revised: 14 September 2020 / Accepted: 15 October 2020 / © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Movie genre classification is a challenging task that has increasingly attracted the attention of researchers. The number of movie consumers who could benefit from automatic movie genre classification is growing rapidly, thanks to the popularization of media streaming service providers. In this paper, we addressed the multi-label classification of movie genres in a multimodal way. To this end, we created a dataset composed of trailer video clips, subtitles, synopses, and movie posters from 152,622 movie titles of The Movie Database (TMDb). This large dataset was carefully curated, organized, and made available as a contribution of this work. We labeled each movie in the dataset according to a set of eighteen genre labels. In the experimental evaluation, we computed different kinds of descriptors, such as Mel Frequency Cepstral Coefficients (MFCCs), Statistical Spectrum Descriptor (SSD), Local Binary Pattern (LBP) from spectrograms, Long Short-Term Memory (LSTM), and Convolutional Neural Networks (CNN). With these descriptors, we trained different monolithic classifiers using the Binary Relevance and ML-kNN techniques. We also explored the combination of classifiers and features using a late fusion strategy. The fusion of an LSTM trained on synopses and another LSTM trained on movie subtitles provided our best results in the F-Score (0.674) and AUC-PR (0.725) metrics. These results corroborate the existence of complementarity among classifiers trained on different sources of information in this field of application. As far as we know, this is the most comprehensive study in terms of the diversity of multimedia sources of information used to perform movie genre classification.

Keywords Movie genre classification · Multi-label classification · Multimodal classification · Movie trailer

Rafael B. Mangolin

[email protected]

Extended author information available on the last page of the article.

Multimedia Tools and Applications

1 Introduction

Streaming media services have grown steadily over the past decade, mainly due to the consolidation of video on demand as a practical and convenient way for consumers to access films, series, documentaries, etc. Several large companies (e.g., Netflix™, Hulu™, Amazon™ Prime Video, YouTube™, and Facebook™ Watch)1 are rapidly gaining ground in this market, as they offer exclusive content through agreements with the movie industry and integrate other products and services that serve consumers’ interests more comprehensively. In this study, we addressed the movie genre classification using multimodal classifiers based on audio and images from trailers (i.e., audio and vi