Combining weighted category-aware contextual information in convolutional neural networks for text classification


Xin Wu1 · Yi Cai1 · Qing Li2 · Jingyun Xu1 · Ho-fung Leung3

Received: 26 March 2019 / Revised: 15 July 2019 / Accepted: 28 October 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Convolutional neural networks (CNNs) are widely used in many natural language processing tasks, where convolutional filters capture useful semantic features of a text. However, a convolutional filter with a small window size lacks the ability to capture contextual information, while simply increasing the window size may cause data sparsity and a very large number of parameters. To capture contextual information, we propose using a weighted sum operation to obtain contextual word representations. We present one implicit weighting method and two explicit category-aware weighting methods to assign weights to the contextual information. Experimental results on five text classification datasets show the effectiveness of our proposed methods.

Keywords Convolutional neural networks · Text classification · Contextual information · Word representation
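The two ideas in the abstract, a weighted sum over neighbouring words and the parameter cost of wider filters, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed weight vector, the zero-padding at text boundaries, and the bias-per-filter parameter count are all assumptions made for illustration, and the weights here are simply given rather than produced by the implicit or category-aware methods the paper proposes.

```python
def contextual_representation(embeddings, weights, radius):
    """Return one context-aware vector per word position.

    embeddings: list of equal-length float lists, one per word.
    weights:    list of 2 * radius + 1 floats, one per window offset.
                Hypothetical fixed values; the paper assigns these with
                its own weighting methods, which are not reproduced here.
    """
    n = len(embeddings)
    dim = len(embeddings[0])
    out = []
    for i in range(n):
        vec = [0.0] * dim
        for k, offset in enumerate(range(-radius, radius + 1)):
            j = i + offset
            # Assumption: positions outside the text contribute nothing.
            if 0 <= j < n:
                for d in range(dim):
                    vec[d] += weights[k] * embeddings[j][d]
        out.append(vec)
    return out


def conv_filter_parameters(window, dim, num_filters):
    """Count parameters of a 1-D convolution layer over word embeddings,
    assuming one weight per (offset, dimension) pair and one bias per filter."""
    return window * dim * num_filters + num_filters


# Toy 2-dimensional embeddings for a three-word text.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ctx = contextual_representation(sentence, [0.25, 0.5, 0.25], radius=1)
print(ctx)

# Widening the filter from 3 to 7 words grows the layer linearly.
print(conv_filter_parameters(3, 300, 100), conv_filter_parameters(7, 300, 100))
```

Under these assumptions the weighted sum adds only one weight per window offset, whereas each extra word of filter width adds `dim * num_filters` convolution weights.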

World Wide Web

This article belongs to the Topical Collection: Special Issue on Web Information Systems Engineering 2018
Guest Editors: Hakim Hacid, Wojciech Cellary, Hua Wang and Yanchun Zhang

Yi Cai [email protected]

1 School of Software Engineering, South China University of Technology, Guangzhou, China
2 Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
3 Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China

1 Introduction Text classification is an essential task in many natural language processing (NLP) applications, such as Web search, sentiment analysis, and information filtering [1]. Traditional text classification methods mainly focus on human-designed features and different types of machine learning algorithms [43]. The most widely used feature is the bag-of-words (BoW) feature, which represents a word as a high-dimensional, sparse vector with only one non-zero value, and therefore represents the semantics and syntax of texts poorly. More complex features, such as POS tagging and tree kernels [31], are designed to capture more semantic features. Classifiers such as support vector machines and logistic regression can be used for classification with these features [43]. However, designing such handcrafted features is time-consuming because it requires extensive feature engineering.

Recently, deep neural networks have demonstrated their effectiveness at automatic feature extraction for text classification. The neural networks commonly used for text classification include Recursive Neural Networks (RecursiveNNs), Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs) and more complex network architectures. One influential work was proposed by Kim [12], in which a single convolution layer is employed to capture semantic features. Despi