Text document classification using fuzzy rough set based on robust nearest neighbor (FRS-RNN)
FOCUS
Bichitrananda Behera1 · G. Kumaravelan1
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract The fuzzy rough set (FRS) is a powerful mathematical tool for dealing with uncertain data, with many applications in feature selection, dimensionality reduction and classification. The fuzzy rough set based on robust nearest neighbor (FRS-RNN) is an important classifier that has been successfully applied to real-valued datasets. The literature shows no prior attempt to apply FRS-RNN to text document classification. Generally, the document classification process consists of two crucial phases, namely feature extraction and classifier model construction. TF-IDF and convolutional neural network (CNN)-based techniques are the main approaches to efficient feature extraction. The CNN provides the best feature engineering by effectively preprocessing the documents and representing them with pre-trained word embeddings. In this paper, we propose a modified CNN structure for both text document classification and feature extraction. Both FRS and FRS-RNN are then implemented for text document classification on the benchmark datasets 20 Newsgroups and Reuters-21578, using both TF-IDF and modified CNN-based feature extraction. The classification performance of FRS, CNN and FRS-RNN is evaluated and compared using the well-defined metrics accuracy, precision, recall and F1-measure. Finally, the classification performance of FRS-RNN is compared with state-of-the-art traditional classification models such as SVM, KNN, Naïve Bayes, DNN, CNN and RNN, and with some recently developed classification models. The experimental results, followed by an empirical evaluation, show that the proposed FRS-RNN outperforms all the aforementioned classification models.
Keywords Document classification · Convolutional neural network · Fuzzy rough set · Text mining
Communicated by Kannan.
Bichitrananda Behera [email protected] · G. Kumaravelan [email protected]
1 Department of Computer Science, Pondicherry University, Pondicherry, India
1 Introduction
Text document classification is a forefront research area of natural language processing, owing to the rapid growth of large-scale text documents from science, engineering, medicine, business and social media. The process involves two significant steps: text feature extraction and text document classification. Text feature extraction first extracts text features from the raw documents using text preprocessing and then converts those features into feature vectors suitable for classification. Text document classification, in turn, assigns class labels to the documents of the test dataset based on a classifier model developed from the documents of the training dataset using the supervised machine learni
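The two steps described above, feature extraction followed by classification, can be illustrated with a minimal Python sketch. It is not the FRS-RNN classifier proposed in the paper: a plain 1-nearest-neighbor rule with cosine similarity stands in for the classifier, the TF-IDF weighting uses one common smoothed variant, and the toy documents, labels and function names are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; real preprocessing would do more.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency learned from the training documents.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(doc, idf):
    # Sparse, L2-normalised TF-IDF vector; terms unseen in training are ignored.
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def predict(doc, train_vecs, train_labels, idf):
    # Stand-in classifier: label of the most similar training document (1-NN).
    query = tfidf_vector(doc, idf)
    sims = [cosine(query, v) for v in train_vecs]
    return train_labels[sims.index(max(sims))]


# Hypothetical toy corpus, not taken from 20 Newsgroups or Reuters-21578.
train_docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the new graphics card renders faster",
    "the processor and memory were upgraded",
]
train_labels = ["sport", "sport", "hardware", "hardware"]

idf = fit_idf(train_docs)
train_vecs = [tfidf_vector(d, idf) for d in train_docs]

test_docs = ["a goal decided the match", "faster memory for the card"]
test_labels = ["sport", "hardware"]
predictions = [predict(d, train_vecs, train_labels, idf) for d in test_docs]
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
print(predictions, accuracy)
```

The IDF weights are learned from the training documents only and reused for the test documents, which mirrors the train/test separation described in the text.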