Constructing Context-Aware Sentiment Lexicons with an Asynchronous Game with a Purpose


Abstract. One of the reasons sentiment lexicons do not reach human-level performance is that they lack the contexts that define the polarities of words. While obtaining this knowledge through machine learning would require huge amounts of data, context is commonsense knowledge for people, so human computation is a better choice. We identify context using a game with a purpose that increases the workers’ engagement in this complex task. With the contextual knowledge we obtain from only a small set of answers, we already halve the sentiment lexicons’ performance gap relative to human performance.

A. Gelbukh (Ed.): CICLing 2014, Part II, LNCS 8404, pp. 32–44, 2014. © Springer-Verlag Berlin Heidelberg 2014

1 Introduction

Sentiment analysis identifies expressions of subjectivity in texts, such as sentiments or emotional states. We consider the sentiment classification task, which determines whether the sentiments expressed in a text are positive or negative. This task requires commonsense knowledge about the polarities of sentiment words.

The relative ease of construction led early researchers in the field toward corpus-based sentiment classification [1–3]. These methods aggregate statistical, syntactic, and semantic relations between words. A significant downside is that the classifiers that result are efficient only on narrow domains. This may be the reason why the competing, lexicon-based approach is currently the backbone of sentiment classification. Several sentiment lexicons [4–6] have been available for a significant period of time. However, multiple lexicons continue to appear [7, 8], showing that a satisfying solution has not yet been found.

The most successful methods perform syntactic preprocessing to extract relevant words, and then consider the resulting set of independent words as features of the text. Sentiment classification is performed on these features, by adding word polarity scores compiled in sentiment lexicons or learned with statistical methods. These models obtain from 60% to 80% accuracy [2, 1]. Better results can sometimes be achieved by training domain-specific classifiers, but only at the expense of narrow coverage. This performance is lower than that of people, who can extract sentiment with 80% to 90% agreement [9], depending on the domain of the texts.

A reason why these classifiers cannot reach human-level performance is that the words’ polarities are influenced by context: a small hotel room is negative, while a small digital camera is positive. By representing texts as independent words, context is ignored. In narrow domains, words mostly occur in a single context, thus high accuracy can be achieved. For broad domains, it is necessary to enrich the feature set with contexts, by including word combinations. However, the complexity of the resulting models would explode, and it would no longer be feasible to acquire them from data. Nevertheless, the polarity of most words has only a few exceptions, so the size of these models could be ma