Characterizing Opinion Mining: A Systematic Mapping Study of the Portuguese Language

1 MiningBR Research Group, Federal Rural University of Pernambuco (UFRPE), Serra Talhada, PE, Brazil
[email protected], [email protected], [email protected]
2 Centro de Informática, Federal University of Pernambuco (CIn-UFPE), Recife, PE, Brazil
{eprs,alio}@cin.ufpe.br
3 Programa de Pós-graduação em Engenharia Biomédica, Centro de Tecnologia e Geociências, Federal University of Pernambuco (CTG-UFPE), Recife, PE, Brazil
[email protected]

Abstract. The growth of social media and user-generated content (UGC) on the Internet provides a huge quantity of information that allows discovering the experiences, opinions, and feelings of users or customers. Opinion Mining (OM) is a sub-field of text mining in which the main task is to extract opinions from UGC. Given that Portuguese is one of the most widely spoken languages in the world, and also the second most frequent on Twitter, the goal of this work is to map the landscape of current studies that apply OM to Portuguese. A systematic mapping review (SMR) method was applied to search for, select, and extract data from the included studies. Manual and automated searches retrieved 6075 studies published up to 2014, of which 25 articles were included. Almost 70% of all approaches focus on the Brazilian Portuguese variant. Naïve Bayes and Support Vector Machine were the main classifiers, and SentiLex-PT was the most used lexical resource. Portugal and Brazil are the main contributors to processing the Portuguese language.

Keywords: Text mining · Text classification · Opinion mining · Sentiment analysis · Portuguese language

1 Introduction

© Springer International Publishing Switzerland 2016. J. Silva et al. (Eds.): PROPOR 2016, LNAI 9727, pp. 122–127, 2016. DOI: 10.1007/978-3-319-41552-9_12

The growth of social media and user-generated content (UGC) on the Internet provides a huge quantity of information that allows discovering the experiences, opinions, and feelings of users or customers. The volume of this kind of data has grown to petabytes [1]. These electronic Word of Mouth (eWOM) statements are prevalent in the business and service industries, where they enable customers to share their points of view [2]. However, it is impossible for humans to fully understand UGC in a reasonable amount of time, which is why there has been growing interest in the scientific community in creating systems capable of extracting information from it [3]. Opinion mining (OM) is a sub-field of text mining in which the main task is to extract opinions from UGC [3]. OM detects, extracts, and classifies opinions concerning different topics. Common opinion classes are positive, negative, and neutral [2]. According to [2], sentiment analysis, opinion mining, and subjectivity analysis are interrelated areas of research that use various techniques taken from Natural Language Processing, Information Retrieval, and structured and unstructured Data Mining. Whereas data mining is largely language independent, text mining involves a significant langu