Sentiment Classification Using Supervised Sub-Spacing


Abstract An important application domain for machine learning is sentiment classification. Here, the traditional approach is to represent documents using a Bag-Of-Words (BOW) model, where individual terms are used as features. However, the BOW model is unable to sufficiently model the variation inherent in natural language text. Term-relatedness metrics are commonly used to overcome this limitation by capturing latent semantic concepts or topics in documents. However, representations produced using standard term-relatedness approaches do not take into account the class membership of documents. In this work, we present a novel approach called Supervised Sub-Spacing (S3) for introducing supervision to term-relatedness extraction. S3 works by creating a separate sub-space for each class, within which term relations are extracted such that documents belonging to the same class are made more similar to one another. Recent approaches in sentiment classification have proposed combining machine learning with background knowledge from sentiment lexicons for improved performance. Thus, we present a simple yet effective approach for augmenting S3 with background knowledge from SentiWordNet. Evaluation shows S3 to significantly outperform the state-of-the-art SVM classifier. Results also show that using background knowledge from SentiWordNet significantly improves the performance of S3.

S. Sani (B) · N. Wiratunga · S. Massie · R. Lothian
Robert Gordon University, Aberdeen, Scotland

M. Bramer and M. Petridis (eds.), Research and Development in Intelligent Systems XXX, DOI: 10.1007/978-3-319-02621-3_8, © Springer International Publishing Switzerland 2013


1 Introduction

Sentiment classification has many important applications, e.g. market analysis and product recommendation. A common approach is to apply machine learning algorithms to classify individual texts (generally called documents) into the appropriate sentiment category (e.g. positive or negative) [15]. Doing this, however, requires text documents to be represented using a suitable set of features. A popular approach is the Bag-Of-Words (BOW) model, where documents are represented using individual terms as features. However, the BOW model is unable to cope with variation in natural language vocabulary (e.g. synonymy and polysemy), which often requires semantic indexing approaches [18]. The general idea of semantic indexing is to discover terms that are semantically related and to use this knowledge to identify conceptual similarity even in the presence of vocabulary variation. The result is the generalisation of document representations away from low-level expressions to high-level semantic concepts. Several techniques have been proposed for transforming document representations from the space of individual terms to that of latent semantic concepts. Examples include Latent Semantic Indexing (LSI) which