Sentiment lexicons and non-English languages: a survey


Mohammed Kaity¹ · Vimala Balakrishnan¹

Received: 10 October 2018 / Accepted: 14 July 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract
The ever-increasing number of Internet users and online services, such as Amazon, Twitter and Facebook, has rapidly motivated people not just to transact over the Internet but also to voice their opinions about products, services, policies, etc. Sentiment analysis is a field of study that extracts and analyzes public views and opinions. However, current research within this field mainly focuses on building systems and resources for the English language. The primary objective of this study is to examine existing research on building sentiment lexicon systems and to classify the methods with respect to non-English datasets. The study also reviews the tools used to build sentiment lexicons for non-English languages, ranging from those using machine translation to graph-based methods. Shortcomings of the approaches are highlighted, along with recommendations to improve the performance of each approach and areas for further study and research.

Keywords Sentiment analysis · Sentiment lexicon · Lexicon-based · Multilingual sentiment analysis

Vimala Balakrishnan · Mohammed Kaity
¹ Department of Information Systems, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia

1 Introduction

The Internet has enabled users to express views and opinions regarding products, social issues, policies, and much more. Thus, the Internet has rapidly evolved into a massive data warehouse of user opinions and emotions [1, 2]. Sentiment analysis is a field of study concerned with analyzing, interpreting and evaluating opinions. It is considered one of the most popular research areas using NLP techniques, text analysis and computational linguistics to identify the polarity of a text as positive, negative or neutral [3]. Due to the urgent need to understand user trends on a particular subject, sentiment analysis has fast become one of the most critical and value-added research areas of the past few years [1, 4]. Sentiment analysis systems are necessary tools that help to analyze and interpret enormous amounts of data and information, thereby identifying and extracting users' opinions and emotions [1, 5]. Despite the enormous volume of text available in many languages, most sentiment analysis studies have focused primarily on the English language [5]. Sentiment analysis remains a major research domain to explore: many challenges and language-specific difficulties exist, but the field nonetheless shows great promise [5].

Sentiment classification falls into two broad categories: lexicon-based and machine learning-based classification [6]. Classifiers based on sentiment lexicons (i.e., lists of semantically polar words) are lexicon-based or rule