A Novel Feature Engineering Approach for Twitter-Based Text Sentiment Analysis

Abstract

With the increasing availability of handheld devices and the growing affordability of mobile data, social media has become an inseparable part of daily life for much of society. The free availability, diversity, and sheer volume of this data have inspired many research works to use it for extracting insights on a wide range of subjects. Text sentiment analysis is the process of mining opinion polarity from a given natural-language document. Social media data is equally suited to sentiment extraction, and the extracted sentiments can serve tasks ranging from opinion mining to recommendation. In this work, Twitter has been chosen as the source of the data corpus because of its summarized content, ease of availability, and enormous reach across all classes of society. This work uses traditional machine learning approaches to solve the problem of text sentiment analysis and proposes a novel feature engineering approach that merges the text-based and non-textual features of the dataset by also including the predicted output lists in the final feature set. The result analysis shows a significant improvement in the performance of the machine learning model on the test dataset when the proposed approach is used.

Keywords Sentiment analysis · Feature engineering · Predicted output · Text-based features · Non-textual features
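The abstract states the proposed idea only at a high level: outputs predicted from the text are added to the final feature set alongside non-textual features. The following is a minimal, standard-library-only Python sketch of one way such merging could work. It assumes a stacking-style setup and is not the authors' implementation. The multinomial Naive Bayes text model, the logistic-regression final model, the out-of-fold scheme, the toy tweets, and the non-textual features (`has_media`, `is_retweet`) are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Lower-case whitespace tokenization; a real tweet pipeline would also
    # handle mentions, URLs, emoticons, and punctuation.
    return text.lower().split()


class NaiveBayesText:
    """Multinomial Naive Bayes over bag-of-words: the (assumed) text-based model."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, y in zip(texts, labels):
            self.word_counts[y].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, text):
        # Log-space class scores with Laplace (add-one) smoothing.
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            s = math.log(self.doc_counts[c] / n_docs)
            for w in tokenize(text):
                s += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + len(self.vocab)))
            scores[c] = s
        # Normalize the scores into class probabilities (softmax).
        top = max(scores.values())
        exps = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(exps.values())
        return {c: e / z for c, e in exps.items()}


def out_of_fold_text_scores(texts, labels, k=3):
    # Each row's predicted output comes from a model trained on the other
    # folds, so no row is scored by a model that saw its own label.
    scores = [0.0] * len(texts)
    for fold in range(k):
        train = [i for i in range(len(texts)) if i % k != fold]
        model = NaiveBayesText().fit([texts[i] for i in train], [labels[i] for i in train])
        for i in range(fold, len(texts), k):
            scores[i] = model.predict_proba(texts[i]).get(1, 0.0)
    return scores


def merge_features(text_scores, non_textual):
    # Final feature set: [predicted text output] + non-textual features.
    return [[p] + list(row) for p, row in zip(text_scores, non_textual)]


def train_logistic(X, y, lr=0.1, epochs=500):
    # Plain stochastic gradient descent on the logistic loss; w[-1] is the bias.
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            g = 1.0 / (1.0 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                w[j] -= lr * g * xj
            w[-1] -= lr * g
    return w


def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + w[-1] > 0 else 0


# Hypothetical toy data: 1 = positive, 0 = negative.
tweets = [
    "love this phone great battery",
    "worst service ever so angry",
    "great match today love it",
    "terrible update hate the new ui",
    "happy with the great support",
    "awful delay hate waiting",
]
labels = [1, 0, 1, 0, 1, 0]
# Hypothetical non-textual features per tweet: [has_media, is_retweet].
non_textual = [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0], [0, 0]]

X = merge_features(out_of_fold_text_scores(tweets, labels), non_textual)
w = train_logistic(X, labels)
train_preds = [predict(w, x) for x in X]
print(train_preds)
```

The text model's outputs are computed out of fold so that the final model is not trained on predictions that have already seen the training labels. On a real corpus, a library such as scikit-learn would normally replace these hand-rolled models.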

H. Nandy (corresponding author) · R. Sridhar
Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli, Tamil Nadu 620015, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
P. K. Singh et al. (eds.), Evolving Technologies for Computing, Communication and Smart World, Lecture Notes in Electrical Engineering 694, https://doi.org/10.1007/978-981-15-7804-5_23

1 Introduction

The expanding activity of microblogging, tagging, and podcasting, driven by the boom of Web 2.0, has motivated many researchers to mine these rich information sources for valuable insights [1]. Text sentiment analysis is the process of mining the inherent sentiment conveyed by a text document and classifying it into either the positive or the negative class. The insights gained from automating sentiment analysis can be used in many socioeconomic areas [2], including but not limited to predicting the acceptance of a product launch [3], predicting stock prices [4], and predicting the success of a political campaign [5]. Twitter [6] is a popular microblogging website that grew from 5000 tweets per day in 2007 [7] to 500 million tweets per day in 2013 [8]. Its summarized content, easy availability, 100,000-fold growth, and ubiquitous reach have given rise to a new field within sentiment analysis, namely Twitter sentiment analysis [9–14]. For these reasons, this work has chosen Twitter as its primary data source for sentiment analysis. The complexities of natural language and the informal format used in