Comparing Approaches to Subjectivity Classification: A Study on Portuguese Tweets


Abstract. In this paper, we compare lexicon-based and machine learning-based approaches to determine the subjectivity of tweets in Portuguese. We tested the SentiLex and WordAffectBR lexicons, and the Sequential Minimal Optimization (SMO) and Naive Bayes algorithms, for this task. In our study, we used the Computer-BR corpus, which contains messages about the technology area. We obtained our best results using the Comprehensive Measurement Feature Selection (CMFS) method with SMO as the classifier. We achieved considerable accuracy when we included the polarities of words in the vector space model of tweets.

Keywords: Subjectivity classification · Sentiment analysis · Natural language processing
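The CMFS criterion mentioned in the abstract can be sketched as follows. As it is commonly defined, CMFS scores a term t for a class c as P(t | c) · P(c | t), so a term ranks high when it is both frequent within a class and concentrated in that class. This is a minimal sketch under our own assumptions: add-one smoothing of both probabilities, the maximum over classes as the global score, pre-tokenized tweets, and the hypothetical helper names `cmfs_scores` and `select_top_k`. It is not the authors' implementation or configuration.

```python
from collections import Counter
from typing import Dict, List, Sequence


def cmfs_scores(docs: Sequence[List[str]], labels: Sequence[str]) -> Dict[str, float]:
    """Score every term with CMFS(t, c) = P(t|c) * P(c|t), keeping the maximum over classes.

    Assumption (ours): both probabilities use add-one smoothing, and the
    per-class scores are combined with max().
    """
    classes = sorted(set(labels))
    # tf[c][t]: number of occurrences of term t in the tweets of class c
    tf: Dict[str, Counter] = {c: Counter() for c in classes}
    for tokens, label in zip(docs, labels):
        tf[label].update(tokens)

    vocab = set().union(*tf.values())  # all distinct terms
    n_vocab, n_classes = len(vocab), len(classes)
    class_total = {c: sum(tf[c].values()) for c in classes}  # tokens per class
    term_total: Counter = Counter()  # occurrences of each term across all classes
    for c in classes:
        term_total.update(tf[c])

    scores: Dict[str, float] = {}
    for t in vocab:
        best = 0.0
        for c in classes:
            p_t_given_c = (tf[c][t] + 1) / (class_total[c] + n_vocab)
            p_c_given_t = (tf[c][t] + 1) / (term_total[t] + n_classes)
            best = max(best, p_t_given_c * p_c_given_t)
        scores[t] = best
    return scores


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

For example, with the invented tokenized tweets `[["adorei", "notebook"], ["odiei", "teclado"], ["notebook", "chegou"], ["teclado", "chegou"]]` labeled `subj, subj, obj, obj`, `select_top_k(cmfs_scores(docs, labels), 3)` returns `["chegou", "adorei", "odiei"]`: the class-specific words outrank "notebook" and "teclado", which occur equally often in both classes.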

1 Introduction

In the past decade, people have used the social web to express and share their ‘sentiments’ about products and services. Texts published on social media (e.g., Twitter, Facebook, blogs, and user forums) have become important sources of information for organizations. Analyzing these snippets of text is a way of monitoring the opinions and responses of these organizations' clients [1]. The research area that performs this processing automatically is known as Sentiment Analysis or Opinion Mining. In this area, textual information can be categorized into two main types: facts and opinions. Opinions, unlike facts, describe people’s sentiments, appraisals, or feelings toward entities, events, and their properties. Deciding whether a sentence expresses an opinion or a fact can be treated as a classification problem; this task is called subjectivity classification [2].

Subjectivity classification is a stage that precedes opinion mining. When used, it improves opinion mining performance by preventing the extraction of noisy and irrelevant content [3, 4]. In approaches that use machine learning algorithms for polarity classification, this improvement can be attributed to more balanced training sets: some authors note that the imbalance in such approaches is caused by the class of objective sentences, which usually has a much larger number of samples [5].

© Springer International Publishing Switzerland 2016
J. Silva et al. (Eds.): PROPOR 2016, LNAI 9727, pp. 86–94, 2016.
DOI: 10.1007/978-3-319-41552-9_8
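The class imbalance described above can be illustrated with random undersampling, one common way to balance a training set: examples of the larger class (here, objective tweets) are discarded at random until every class has as many examples as the smallest one. The function name `undersample`, the label strings, and the fixed seed are our assumptions; the text does not state which balancing technique, if any, the cited works applied.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def undersample(samples: Sequence[T], labels: Sequence[str], seed: int = 0) -> Tuple[List[T], List[str]]:
    """Randomly drop examples from larger classes until all classes match the smallest one."""
    rng = random.Random(seed)  # fixed seed so the balanced set is reproducible
    by_class: Dict[str, List[T]] = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    if not by_class:
        return [], []

    n_min = min(len(xs) for xs in by_class.values())
    balanced_x: List[T] = []
    balanced_y: List[str] = []
    for y in sorted(by_class):
        # sample without replacement: each kept example appears once
        balanced_x.extend(rng.sample(by_class[y], n_min))
        balanced_y.extend([y] * n_min)
    return balanced_x, balanced_y
```

Undersampling keeps every minority-class example but discards majority-class data; oversampling the minority class or weighting classes during training are alternatives that keep all the data at the cost of repeated or reweighted examples.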


In this paper, we compare two traditional approaches to subjectivity classification: lexicon-based and machine learning-based [10]. We tested the SentiLex and WordAffectBR lexicons, and the Sequential Minimal Optimization (SMO) and Naive Bayes algorithms, to determine the subjectivity of the tweets in the Computer-BR corpus, which we built for this study. This corpus is composed of Portuguese-language messages about the technology area. We categorized the tweets according to their sentiment orientations (polarities), considering sentences with positive or negative polarity as subjective and the remaining ones as objective. In the machine learning approach, we tested a new method for feature selection: the Compr
