Comparing Approaches to Subjectivity Classification: A Study on Portuguese Tweets
Abstract. In this paper, we compare lexicon-based and machine learning-based approaches to determining the subjectivity of tweets in Portuguese. We tested the SentiLex and WordAffectBR lexicons, and the Sequential Minimal Optimization (SMO) and Naive Bayes algorithms, on this task. In our study, we used the Computer-BR corpus, which contains messages about the technology area. We obtained better results using the Comprehensive Measurement Feature Selection method with the Sequential Minimal Optimization algorithm as the classifier. We achieved considerable accuracy when we included the polarities of words in the vector space model of tweets.
Keywords: Subjectivity classification · Sentiment analysis · Natural language processing
1 Introduction
In the past decade, people have used the social web to express and share their ‘sentiments’ about products and services. Texts published in social media (e.g., Twitter, Facebook, blogs, and user forums) have become important sources of information for organizations. Analyzing these snippets of text is a way of monitoring the opinions and responses of these organizations' clients [1]. The research area that performs this processing automatically is known as Sentiment Analysis or Opinion Mining.
In this area, textual information can be categorized into two main types: facts and opinions. Opinions, unlike facts, describe people’s sentiments, appraisals, or feelings toward entities, events, and their properties. The task of deciding whether a sentence expresses an opinion or a fact can be treated as a classification problem. This task is called subjectivity classification [2].
Subjectivity classification is a stage that precedes Opinion Mining. When used, it improves Opinion Mining performance by preventing the extraction of noisy and irrelevant content [3, 4]. In approaches that use machine learning algorithms for polarity classification, the improved results can be attributed to the balancing of the training sets. Some authors mention that the imbalance in such approaches is caused by the class of objective sentences, which usually has a larger number of samples [5].
© Springer International Publishing Switzerland 2016 J. Silva et al. (Eds.): PROPOR 2016, LNAI 9727, pp. 86–94, 2016. DOI: 10.1007/978-3-319-41552-9_8
In this paper, we compare two traditional approaches to subjectivity classification: one based on lexicons and one based on machine learning algorithms [10]. We tested the SentiLex and WordAffectBR lexicons, and the Sequential Minimal Optimization (SMO) and Naive Bayes algorithms, to determine the subjectivity of the tweets in the Computer-BR corpus, which we built for this study. This corpus is composed of messages in Portuguese about the technology area. We categorized the tweets according to their sentiment orientations (polarities). We considered as subjective those sentences with positive or negative polarity, and as objective the remaining ones. In the approach using machine learning, we tested a new method for feature selection: the Compr
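The labeling rule stated above (positive or negative polarity means subjective, anything else means objective) can be illustrated with a minimal lexicon-based sketch. This is not the authors' implementation: the tiny `TOY_LEXICON`, the function names, and the choice to sum word polarities are all assumptions made for illustration, and the entries are not taken from SentiLex or WordAffectBR.

```python
import re

# Hypothetical toy polarity lexicon: +1 positive, -1 negative.
# These entries are illustrative only, NOT real SentiLex/WordAffectBR data.
TOY_LEXICON = {"bom": 1, "ótimo": 1, "ruim": -1, "péssimo": -1, "lento": -1}


def tokenize(text):
    """Lowercase the text and split it into word tokens (accents are kept)."""
    return re.findall(r"\w+", text.lower())


def polarity(text, lexicon=TOY_LEXICON):
    """Return 1, -1, or 0 from the sign of the summed word polarities.

    Summing is an assumed aggregation rule; words missing from the
    lexicon contribute 0.
    """
    score = sum(lexicon.get(token, 0) for token in tokenize(text))
    return (score > 0) - (score < 0)


def is_subjective(text, lexicon=TOY_LEXICON):
    """A tweet is subjective if its polarity is positive or negative."""
    return polarity(text, lexicon) != 0


if __name__ == "__main__":
    for tweet in ["O notebook novo é ótimo", "Comprei um notebook ontem"]:
        label = "subjective" if is_subjective(tweet) else "objective"
        print(f"{tweet!r}: {label}")
```

Under this assumed aggregation, a tweet whose positive and negative words cancel out is labeled objective; a variant that flags any tweet containing a polar word would be equally easy to write, and the source text does not say which behavior the authors used.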