Sentiment Analysis on Urdu Tweets Using Markov Chains
ORIGINAL RESEARCH
Zarmeen Nasim¹ · Sayeed Ghani¹
Received: 27 January 2020 / Accepted: 30 July 2020
© Springer Nature Singapore Pte Ltd 2020
Abstract This paper presents a sentiment analysis approach based on Markov chains for predicting the sentiment of Urdu tweets. Sentiment analysis has been a focus of the natural language processing (NLP) research community for the past few decades. The reason for this growing interest is twofold. First, the complexity involved in identifying sentiment from unstructured text makes it a challenging problem for the research community. Second, sentiment analysis has a wide variety of applications, ranging from industry to academia, which has made it a popular area of NLP research. However, very little work has been done on sentiment analysis for low-resource languages, which include Urdu, Bengali, Hindi, and other Asian languages. This work focuses on developing a 3-class (positive, negative, and neutral) sentiment classification model for the Urdu language. The experiments were conducted on a labeled corpus of Urdu tweets extracted from Twitter. One of the main contributions of this research is the development of a large labeled corpus of Urdu tweets for sentiment analysis. To the best of our knowledge, no such corpus is publicly available for the Urdu language. The labeled dataset is available on GitHub (https://github.com/zarmeen92/urdutweets). Furthermore, the results showed that the proposed approach outperforms lexicon-based and traditional machine learning-based approaches to sentiment analysis.
Keywords Sentiment analysis · Markov chains · Urdu language · Opinion mining
¹ Faculty of Computer Science, Institute of Business Administration (IBA), Karachi, Pakistan
Introduction With the recent advances in social media, sentiment analysis has become an active area of research in the natural language processing domain. Social media sites, including Twitter, Facebook, and Instagram, enable people to share their opinions on a variety of topics with a large network of people. One of the most popular of these sites, and the one considered in this research, is Twitter. Twitter has 330 million active users, and more than 500 million tweets are posted on it every day. Moreover, Twitter supports 40 languages, including Urdu (https://blog.hootsuite.com/twitter-statistics/). Given these statistics, Twitter can be considered one of the largest social networks. Sentiment analysis, also known as opinion mining, is the technique of determining the semantic orientation or the polarity of the text. It falls under the umbrella of natural language processing tasks. Over the past few decades, sentiment analysis has gained overwhelming popularity in the research community due to the rapid growth of social media websites. Sentiment analysis provides interesting insights into understanding the