On the evaluation and combination of state-of-the-art features in Twitter sentiment analysis



Jonnathan Carvalho¹ · Alexandre Plastino²

© Springer Nature B.V. 2020

Abstract

Sentiment analysis of short informal texts, such as tweets, remains a challenging task due to their particular characteristics. Much effort has been made in the literature of Twitter sentiment analysis to achieve an effective and efficient representation of tweets. In this context, distinct types of features have been proposed and employed, from the simple n-gram representation to meta-features to word embeddings. Hence, in this work, using a relevant set of twenty-two datasets of tweets, we present a thorough evaluation of features by means of different supervised learning algorithms. We evaluate not only a rich set of meta-features examined in state-of-the-art studies, but also a significant collection of pre-trained word embedding models. Also, we evaluate and analyze the effect of combining those distinct types of features in order to detect which combination may provide core information in the polarity detection task in Twitter sentiment analysis. For this purpose, we exploit different strategies for combination, such as feature concatenation and ensemble learning techniques, and show that the sentiment detection of tweets benefits from combining different types of features proposed in the literature.

Keywords: Sentiment analysis · Meta-features · Word embeddings · Ensemble learning · Twitter

Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s10462-020-09895-6) contains supplementary material, which is available to authorized users.

* Jonnathan Carvalho [email protected]
  Alexandre Plastino [email protected]

¹ Instituto Federal Fluminense (Campus Itaperuna), Itaperuna, Brazil
² Universidade Federal Fluminense, Niterói, Brazil

1 Introduction

In recent years, much attention has been given to the content generated by Internet users. Since people can express their opinions and emotions about any target, such as products, services, and events around the globe, many consumers and companies can make decisions based on this ever-growing opinionated content. However, as a huge number of opinions are published every day, manually finding them and identifying each as conveying a positive or negative sentiment may be impractical. In this context, Sentiment Analysis, or Opinion Mining, is the field of study that analyzes people’s opinions, sentiments, appraisals, attitudes, and emotions toward entities and their attributes expressed in written text (Liu 2015). One of the key challenges in this field is the automatic identification of opinions and emotions expressed in short informal texts, such as tweets. Tweets, which are short texts published on Twitter,1 make the task of sentiment analysis very complex due to their inherent characteristics, such as their informal linguistic style, the presence of misspelled words, and the careless