ArAutoSenti: automatic annotation and new tendencies for sentiment classification of Arabic messages

ORIGINAL ARTICLE

Imane Guellil1,2 · Faical Azouaou2 · Francisco Chiclana3,4

Received: 1 February 2020 / Revised: 3 June 2020 / Accepted: 4 August 2020
© Springer-Verlag GmbH Austria, part of Springer Nature 2020

Abstract
A corpus-based sentiment analysis approach for messages written in Arabic and its dialects is presented and implemented. The originality of this approach lies in the automatic construction of the annotated sentiment corpus, which relies mainly on a sentiment lexicon that is itself constructed automatically. For the classification step, shallow and deep classifiers are used, with features extracted using word embedding models. To validate the constructed corpus, a manual review was carried out, which found that 85.17% of the corpus was correctly annotated. The approach is applied to the under-resourced Algerian dialect and tested on two external test corpora from the literature. The results are very encouraging, with an F1 score of up to 88% on the first test corpus and up to 81% on the second. These results represent improvements of 20% and 6%, respectively, over existing work in the research literature.

Keywords  Arabic sentiment analysis · Arabic and its dialects · Automatic resources construction · Shallow/deep classification · Word embedding · Document embedding

* Imane Guellil [email protected]
Francisco Chiclana [email protected]; [email protected]

1 Aston University and Folding Space, Birmingham, UK
2 Laboratoire des Méthodes de Conception des Systèmes, Ecole nationale Supérieure d’Informatique, BP 68M, 16309 Oued-Smar, Alger, Algeria
3 Institute of Artificial Intelligence (IAI), Faculty of Computing, Engineering and Media, De Montfort University, Leicester, UK
4 Andalusian Research Institute on Data Science and Computational Intelligence, University of Granada, Granada, Spain

1 Introduction
Opinions on a product, a company or a political personality are important for business managers and company directors. The emergence of the Internet and social media has made available, and will continue to make available, large amounts of data containing significant numbers of opinions, sentiments and emotions, thus generating interest in their analysis. Sentiment analysis (SA) research has really paid off for languages such as English, French or Chinese, with numerous studies being published regularly. For other languages, such as Arabic and its dialects, research has only just started to give usable results. The reason for the small amount of work focusing on the Arabic language is twofold: (1) the morphological richness of this language and its dialects makes its analysis very complex and, therefore, challenging; and (2) there is a lack of resources dedicated to this language and, in particular, to its dialects. There are two main approaches proposed for Arabic