Orthographic features for emotion classification in Chinese in informal short texts


I-Hsuan Chen1 · Yunfei Long2,3 · Qin Lu4 · Chu-Ren Huang1

Accepted: 26 October 2020 © Springer Nature B.V. 2020

Abstract Informal short texts on the web are rich in emotions, as they often reflect unfiltered, immediate reactions to breaking news events. This emotion density, however, stands in contrast to the poverty of linguistic context and features available for emotion classification. This paper tackles that challenge by proposing orthographic features based on orthographic code mixing and code-switching for both non-ML and ML approaches. Our results show that orthographic features routinely outperform grammatical features for emotion classification for short texts in all approaches, as expected. Orthographic features were also shown to make more significant contributions, especially in terms of precision and in formal texts, when state-of-the-art deep learning algorithms are applied. This result confirms the effectiveness of the orthographic change feature for the task of emotion classification. These results are argued to be applicable to all languages because of the common code-shifting in languages with non-Latin orthographies and the use of non-letter symbols in all languages.

1 Chinese and Bilingual Studies, Hong Kong Polytechnic University, Hung Hom, China

2 Department of Computing, Hong Kong Polytechnic University, Hung Hom, China

3 School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK

4 Department of Computing, Hong Kong Polytechnic University, Hung Hom, China


Keywords Orthography · Emotion classification · Orthographic code mixing · Code-switching · Short text · Orthographic features · Morpho-syntactic features

1 Introduction

Emotion classification is one of the most challenging and widely applicable Natural Language Processing (NLP) tasks. Past studies have taken advantage of a wide variety of linguistic features, such as semantic, syntactic, and cognitive properties (Barbosa and Feng 2010; Balamurali et al. 2011; Joshi et al. 2014; Mishra et al. 2017; Liu and Lei 2012; Nakov et al. 2013). Such studies typically rely on linguistic cues in context, such as emotion words, syntactic and pragmatic contexts, or semantic orientation, to model the task of emotion and sentiment classification. In this task, emotion and sentiment classes are typically assigned to a short paragraph or a sentence rather than a word, with emotion- and/or sentiment-annotated corpora serving as training data. Early applications focused on product review classification (Rudra et al. 2016; Hartmann et al. 2018; Joshi et al. 2016). In recent years, the focus of emotion and sentiment analysis has shifted to social media such as microblogs and tweets (Bollen et al. 2011; Ghosh et al. 2015; Rosenthal et al. 2015). Such data consists of short texts and sentence fragments in