Text document classification using fuzzy rough set based on robust nearest neighbor (FRS-RNN)

FOCUS

Bichitrananda Behera¹ · G. Kumaravelan¹

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract The fuzzy rough set (FRS) is a powerful mathematical tool for dealing with uncertain data, with many applications in feature selection, dimensionality reduction and classification. The fuzzy rough set based on robust nearest neighbor (FRS-RNN) is an important classifier that has been successfully applied to real-valued datasets. The literature shows no prior attempt to apply FRS-RNN to text document classification. Generally, the document classification process consists of two crucial phases, namely feature extraction and classifier model construction. TF-IDF and convolutional neural network (CNN)-based techniques are the main approaches used for efficient feature extraction. The CNN provides the best feature engineering by effectively preprocessing the documents for better representation using pre-trained word embeddings. In this paper, we propose a modified CNN structure for both text document classification and feature extraction. Both FRS and FRS-RNN are then implemented for text document classification on the benchmark datasets 20 Newsgroups and Reuters-21578, using both TF-IDF and modified CNN-based feature extraction techniques. The classification performance of FRS, CNN and FRS-RNN is evaluated and compared using well-defined metrics: accuracy, precision, recall and F1-measure. Finally, the classification performance of FRS-RNN is compared with state-of-the-art traditional classification models such as SVM, KNN, Naïve Bayes, DNN, CNN and RNN, and with some recently developed classification models. The experimental results, followed by empirical evaluation, show that the proposed FRS-RNN outperforms all the aforementioned classification models.

Keywords Document classification · Convolutional neural network · Fuzzy rough set · Text mining

Communicated by Kannan.

Bichitrananda Behera [email protected]
G. Kumaravelan [email protected]

¹ Department of Computer Science, Pondicherry University, Pondicherry, India

1 Introduction

Text document classification is a forefront research area of natural language processing owing to the rapid growth of large-scale text documents from science, engineering, medicine, business and social media. The process involves two significant steps: text feature extraction and text document classification. Text feature extraction first extracts text features from the raw documents using text preprocessing and then converts the features into feature vectors that are suitable for text classification. Text document classification, on the other hand, assigns class labels to the text documents of the test dataset based on the classifier model developed from the text documents of the training dataset using the supervised machine learni
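The two steps described above, feature extraction followed by classification, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: it uses a smoothed TF-IDF weighting, cosine similarity as the fuzzy relation, and the plain fuzzy-rough nearest neighbor rule (average of the lower and upper approximations over crisp classes) rather than the robust FRS-RNN variant. The toy documents, labels and all function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Minimal preprocessing (assumption): lowercase and split on whitespace.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency computed from the training documents.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(doc, idf):
    # Sparse, L2-normalised TF-IDF vector; terms unseen in training are ignored.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(u, v):
    # Dot product of two unit-length sparse vectors; lies in [0, 1] for TF-IDF.
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def frnn_predict(query, train_vecs, train_labels):
    # Plain fuzzy-rough nearest neighbor scoring with crisp classes (assumption):
    #   lower(C) = min over training docs NOT in C of (1 - similarity)
    #   upper(C) = max over training docs in C of similarity
    scores = {}
    for label in set(train_labels):
        pairs = list(zip(train_vecs, train_labels))
        lower = min((1.0 - cosine(query, x) for x, y in pairs if y != label), default=1.0)
        upper = max((cosine(query, x) for x, y in pairs if y == label), default=0.0)
        scores[label] = (lower + upper) / 2.0
    return max(scores, key=scores.get)


# Hypothetical toy corpus, for illustration only.
train_docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the stock market fell sharply",
    "investors sold shares as the market dropped",
]
train_labels = ["sport", "sport", "finance", "finance"]

idf = fit_idf(train_docs)
train_vecs = [tfidf_vector(d, idf) for d in train_docs]


def classify(text):
    return frnn_predict(tfidf_vector(text, idf), train_vecs, train_labels)


print(classify("the market rallied as investors bought shares"))
print(classify("a late goal won the match"))
```

A robust variant would replace the strict `min` and `max` with statistics that are less sensitive to a single noisy training document; the exact form used by FRS-RNN is not given in this excerpt, so it is left out here.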
