
Deep analysis of an Arabic sentiment classification system based on lexical resource expansion and custom approaches building

Ibtissam Touahri¹ · Azzeddine Mazroui¹

Received: 27 February 2020 / Accepted: 28 September 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
Sentiment analysis aims to extract emotions from a broad set of data. This paper studies the impact of lexical resource enrichment on Arabic sentiment analysis. Because Arabic lexical resources for sentiment analysis are scarce, we first build new resources using several lexicon construction methods. The first method is manual: it consists of extracting sentiment words from a selected dataset. The second is semi-automatic: it translates an English lexicon into Arabic and then checks the result manually. Both methods generate terms in word form. In addition to these resources, the paper enriches an existing resource, which contains terms related to four specific domains, by creating its lemmatized equivalent. These methods thus yield lexicons with different morphologies that enrich the existing Arabic resources. The resources are then used to develop a polarity classifier. The paper explains the steps followed to construct the different lexical resources, defines the pre-processing levels and gives statistics for each lexicon. We then present the classification approaches used to determine the polarity of new data. To analyze the results in depth in relation to the extracted features, we opt for both unsupervised and supervised approaches, which give a clear view of their internal architecture and process. The experiments vary the feature sets and apply a feature selection approach to keep the most pertinent features and reduce the size of the characteristic vector. Moreover, we analyze the characteristic vectors and the nature of the corpus in depth, and we explain the main causes of the improvements and degradations in the results. The results of the tests show the relevance of each component of the system.

Keywords  Sentiment analysis · Opinion mining · Arabic language · Lexicon · Corpus · Lemmatization
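As a rough illustration of how a sentiment lexicon can drive an unsupervised polarity decision, the sketch below counts lexicon hits in a tokenized review. It is a minimal sketch under stated assumptions, not the classifier developed in this paper: the two tiny word lists, the function name `polarity` and the simple majority rule are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy lexicons in word form (not the paper's resources).
POSITIVE = {"جيد", "ممتاز", "رائع"}   # good, excellent, wonderful
NEGATIVE = {"سيء", "رديء", "ممل"}     # bad, poor, boring


def polarity(tokens, positive=POSITIVE, negative=NEGATIVE):
    """Label a tokenized text by counting positive and negative lexicon hits.

    Assumption: a plain majority vote; ties and texts with no hits are 'neutral'.
    """
    pos_hits = sum(1 for tok in tokens if tok in positive)
    neg_hits = sum(1 for tok in tokens if tok in negative)
    if pos_hits > neg_hits:
        return "positive"
    if neg_hits > pos_hits:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Whitespace tokenization is an assumption made to keep the sketch short.
    print(polarity("الفيلم جيد و ممتاز".split()))
    print(polarity("الفيلم ممل".split()))
```

Because matching here is exact, an inflected or prefixed form such as الجيد (with the definite article) would not match the entry جيد. This is one reason a lemmatized lexicon can change coverage relative to a word-form lexicon.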

* Ibtissam Touahri
Azzeddine Mazroui

1 Department of Computer Science, Faculty of Sciences, University Mohammed First, Oujda, Morocco

1 Introduction
Sentiment analysis (SA) is the computational study of opinions expressed about a specific topic. Many domains, such as marketing, politics and healthcare, rely on it to make decisions. It also gives organizations an overview of public opinion (feedback, preferences) so that they can make appropriate decisions. Individuals likewise seek out other people's opinions before making their own decisions. A huge amount of opinion data can be found on the Internet via social media, reviews, blogs and forums. The expressed reviews may contain either factual information or personal opinions tha