Twitter Feature Selection and Classification Using Support Vector Machine for Aspect-Based Sentiment Analysis


Abstract. To address the problem of aspect-based sentiment classification accuracy, we propose a Principal Component Analysis (PCA) feature selection method that determines the most relevant set of features for aspect-based sentiment classification of Twitter data. Feature selection reduces redundant features and removes irrelevant features that degrade classifier accuracy. PCA is combined with a SentiWordNet lexicon-based method, which is incorporated into a Support Vector Machine (SVM) learning framework to perform the classification. Experiments on our own Hate Crime Twitter Sentiment (HCTS) dataset and on the benchmark Stanford Twitter Sentiment (STS) dataset yield accuracies of 94.53 % and 97.93 %, respectively. Comparisons with other statistical feature selection methods show that the proposed approach gives promising results in improving aspect-based sentiment classification performance.

Keywords: Twitter · Aspect-based feature extraction · Aspect-based sentiment classification · Feature selection · Principal component analysis · Support vector machine
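The abstract describes PCA being used to *select* features before SVM classification. PCA by itself produces new axes rather than a subset of the original columns, so the sketch below assumes one common way to turn it into a selection step: score each original feature by its squared loadings on the top principal components, weighted by the variance each component explains, and keep the highest-scoring columns. The scoring rule, the synthetic data and the names `pca_feature_ranking` and `train_linear_svm` are hypothetical; the SentiWordNet lexicon features and the paper's actual SVM configuration are not reproduced. A plain subgradient-descent linear SVM stands in for a library solver so that the example needs only NumPy.

```python
import numpy as np


def pca_feature_ranking(X, n_components):
    """Rank the original feature columns of X by their weight in the top principal components.

    Assumed scoring rule (illustrative only): score_j = sum over the first
    n_components axes k of loading_jk ** 2 * (variance explained by axis k).
    """
    Xc = X - X.mean(axis=0)                              # centre every feature column
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)    # rows of vt are the principal axes
    var = s ** 2 / (X.shape[0] - 1)                      # variance along each axis
    k = min(n_components, vt.shape[0])
    scores = (vt[:k] ** 2 * var[:k, None]).sum(axis=0)
    return np.argsort(scores)[::-1]                      # feature indices, highest score first


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Soft-margin linear SVM via full-batch subgradient descent on
    lam / 2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))), with labels y in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1                       # samples inside or beyond the margin
        grad_w = lam * w - (y[viol][:, None] * X[viol]).sum(axis=0) / n
        grad_b = -y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(X, w, b):
    """Return +1 / -1 sentiment labels from the linear decision function."""
    return np.where(X @ w + b >= 0, 1, -1)


# Hypothetical stand-in for a tweet feature matrix: columns 0 and 1 carry the
# sentiment signal, columns 2..9 are low-variance noise.
rng = np.random.default_rng(0)
n, d = 200, 10
y = rng.choice([-1, 1], size=n)
X = rng.normal(0.0, 0.3, size=(n, d))
X[:, 0] = 2.0 * y + rng.normal(0.0, 0.5, size=n)
X[:, 1] = 2.0 * y + rng.normal(0.0, 0.5, size=n)

ranking = pca_feature_ranking(X, n_components=3)
selected = ranking[:2]                                   # keep the two top-ranked features
w, b = train_linear_svm(X[:, selected], y)
accuracy = (predict(X[:, selected], w, b) == y).mean()
print("selected features:", sorted(selected.tolist()))
print("training accuracy:", accuracy)
```

Because PCA is unsupervised, this ranking favours high-variance features regardless of the sentiment label. In the synthetic data the informative columns are also the high-variance ones by construction, which need not hold for real tweet features.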

1 Introduction

Microblogging has become a very popular communication tool among internet users, and microblogging services have become valuable sources of people’s opinions and sentiments. Millions of messages appear daily on popular microblogging websites such as Twitter, Tumblr and Facebook [1]. Twitter is a popular microblogging service on which users post status messages called tweets. Twitter messages have many unique attributes [2]. These include the 140-character maximum tweet length, the magnitude of the available data, the language model, the domain, and the diversity of contents or aspects, which are not limited to any specific topic [3]. These characteristics differ from those of the data used in previous research, which focused on specific domains such as movie reviews, restaurant reviews and product reviews.

© Springer International Publishing Switzerland 2016
H. Fujita et al. (Eds.): IEA/AIE 2016, LNAI 9799, pp. 269–279, 2016.
DOI: 10.1007/978-3-319-42007-3_23

N. Zainuddin et al.

Twitter has emerged as a gold mine of varied information, including opinions on current issues, complaints, and users’ thoughts on the products they use every day. This information also covers business, political and social issues. Previous studies have approached Twitter sentiment analysis as a tweet-level sentiment classification task, similar to document-level sentiment classification. Tweet-level or document-level sentiment classification determines the overall sentiment orientation of a tweet. However, an overall positive or negative sentiment might not be useful to organizations, for which it is more important to determine exactly what their consumers’ opinions are. Different people or users may express their views on different aspects of products or services. For insta