Lifelong Learning for Cross-Domain Vietnamese Sentiment Classification
Abstract. This paper proposes an improvement to lifelong learning for cross-domain sentiment classification. Lifelong learning retains knowledge from past learning tasks to improve learning on a new domain. We discuss how bigram and bag-of-bigram features, integrated into a lifelong learning system, can help improve the performance of sentiment classification on both Vietnamese and English. We also discuss pre-processing techniques designed specifically for our cross-domain Vietnamese dataset. Experimental results show that our method improves on prior systems and demonstrate its potential for cross-domain sentiment classification.
Keywords: Sentiment classification · Lifelong learning · Vietnamese · Supervised learning
1 Introduction
With the rapid growth of e-commerce and the Web, sentiment knowledge has quickly become an asset that adds value to market predictions. Sentiment analysis remains a popular topic for research and for developing sentiment-aware applications [1]. Sentiment classification, a subproblem of sentiment analysis, is the task of classifying whether an evaluative text expresses a positive, negative or neutral sentiment. In this paper, we focus on document-level binary sentiment classification, in which the sentiment is either positive or negative.
In recent years, most studies on sentiment classification have adopted machine learning and statistical approaches [2]. Such approaches rarely perform well on real-life data, which contain opinionated documents from domains other than the one used to train the classifier. To overcome this limitation, lifelong learning [3], transfer learning [4], self-taught learning [5] and other domain adaptation techniques [4] were proposed. All of these methods aim to transfer the knowledge gained from source domains to improve the learning task on the target domain. Chen et al. [3] proposed a novel lifelong learning approach for sentiment classification, based on a Naïve Bayesian framework and stochastic gradient descent. Although this approach can handle cross-domain sentiment classification, it uses the “bag-of-words” model and has difficulty representing relationships between words. For example, the phrase “have to”, which is common in negative text (but much less important in positive text), cannot be exploited with bag-of-words features. This is especially true in isolating languages such as Vietnamese, where white space does not delimit words.
As a resource-poor language, Vietnamese has seen relatively few accomplishments in the field of sentiment classification. To the best of our knowledge, there is no study on Vietnamese cross-domain sentiment classification. There is also no suitable dataset with a reasonable a
© Springer International Publishing Switzerland 2016. H.T. Nguyen and V. Snasel (Eds.): CSoNet 2016, LNCS 9795, pp. 298–308, 2016. DOI: 10.1007/978-3-319-42345-6_26
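The “have to” example from the introduction can be made concrete with a minimal sketch. It is an illustration only, not the authors' implementation: the function names, the whitespace tokenizer, and the two toy reviews are all assumptions introduced here, and the exact definition of the paper's bigram and bag-of-bigram features is not given in this section. The sketch shows that two reviews can share the unigrams “have” and “to” while only one of them contains the bigram (“have”, “to”).

```python
from collections import Counter


def unigram_features(tokens):
    """Bag-of-words: count each token, ignoring word order."""
    return Counter(tokens)


def bigram_features(tokens):
    """Count each pair of adjacent tokens, keeping local word order."""
    return Counter(zip(tokens, tokens[1:]))


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, adequate only
    # for this English toy example.
    return text.lower().split()


# Hypothetical toy reviews, not taken from the paper's dataset.
negative = tokenize("I have to return this phone")
positive = tokenize("I have a great phone to recommend")

# Both reviews contain the unigrams "have" and "to" ...
shared_unigrams = set(unigram_features(negative)) & set(unigram_features(positive))

# ... but the bigram ("have", "to") occurs only in the negative review.
have_to = ("have", "to")
in_negative = have_to in bigram_features(negative)
in_positive = have_to in bigram_features(positive)
```

For Vietnamese, whitespace splitting would yield syllables rather than words, so a word segmentation step would presumably have to replace `tokenize` before features like these are extracted.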