Synthetic minority oversampling in addressing imbalanced sarcasm detection in social media

Arghasree Banerjee¹ & Mayukh Bhattacharjee¹ & Kushankur Ghosh¹ & Sankhadeep Chatterjee¹

Received: 11 November 2019 / Revised: 24 April 2020 / Accepted: 27 May 2020

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Recent developments in sarcasm detection have emerged as highly successful tools in social media opinion mining. With the advent of machine learning tools, accurate detection has become possible. However, the social media data used to train machine learning models are often ill suited to the task because of highly imbalanced classes. In the absence of any thorough study of the effect of imbalanced classes on sarcasm detection for social media opinion mining, the current article proposes methods based on synthetic minority oversampling to mitigate class imbalance, which can severely affect classifier performance in social media sarcasm detection. In the current study, five variants of the synthetic minority oversampling technique (SMOTE) have been applied to two datasets of different sizes. The trustworthiness of these methods is judged by training and testing six well-known classifiers and measuring their performance with metrics derived from the test-phase confusion matrix. The experimental results indicate that SMOTE and BorderlineSMOTE-1 are highly effective in improving classifier performance. A thorough analysis has been performed to better understand the effect of imbalanced classes on social media sarcasm detection.

Keywords: Sarcasm detection · SMOTE · Social media · Imbalanced class · Social emotion · Affective computing
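To make the oversampling idea concrete, the following is a minimal sketch of the core SMOTE interpolation step in plain Python. It is not the authors' implementation: the function names `smote` and `euclidean`, the default `k=5`, and the choice to pick seed samples uniformly at random (rather than generating a fixed number of samples from every minority point, as in the original formulation) are assumptions made for illustration. Each synthetic sample is placed on the line segment between a minority sample and one of its k nearest minority neighbours; BorderlineSMOTE-1 uses the same interpolation but seeds it only from minority samples lying near the class boundary.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (hypothetical helper)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=None):
    """Hypothetical SMOTE sketch: create n_synthetic new minority samples.

    minority: list of feature vectors (lists of floats) from the minority class.
    k: number of nearest minority neighbours considered for each seed sample.
    seed: optional seed so the random choices are reproducible.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples

    # Precompute the indices of the k nearest minority neighbours of each sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: euclidean(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))  # seed sample, chosen uniformly at random
        j = rng.choice(neighbours[i])     # one of its k nearest minority neighbours
        gap = rng.random()                # interpolation factor in [0, 1)
        x, nn = minority[i], minority[j]
        # New point lies on the segment from x towards its neighbour nn.
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors: 8 non-sarcastic vs 3 sarcastic posts.
    majority = [[float(i), 0.0] for i in range(8)]
    minority = [[0.0, 5.0], [1.0, 6.0], [2.0, 5.5]]
    new = smote(minority, n_synthetic=len(majority) - len(minority), k=2, seed=0)
    print(len(minority) + len(new), "minority samples after oversampling")
    # prints "8 minority samples after oversampling"
```

On real posts the vectors would be text features (for instance TF-IDF; the abstract does not say which features the study used), and oversampling is normally applied to the training split only, so that synthetic points do not leak into the test set. For practical work, the imbalanced-learn library provides maintained `SMOTE` and `BorderlineSMOTE(kind="borderline-1")` classes, both used through `fit_resample(X, y)`.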

* Corresponding author: Arghasree Banerjee. Extended author information available on the last page of the article.

Multimedia Tools and Applications

1 Introduction

The rapid rise in popularity of microblogging websites such as Twitter [85] has created a rich source for understanding the likes and dislikes of the public. These websites are mostly used to exchange opinions and views on a variety of topics, both socially critical and uncritical. Moreover, public reaction to any particular incident can also be gauged through these microblogging sites. For example, people share reports of natural disasters [88], learn about the impact of newly released movies [5], and read reviews of newly manufactured products [80]. The growth of microblogs and the diversity of public opinion expressed on them gave rise to the field of opinion mining [72], which has proved to be key to understanding recent trends and the mindset of the public. Sarcasm can be defined as a positively uttered statement that carries an underlying negative sentiment [34]. This usually results in significant confusion in the identification of the mo