Feature selection method using improved CHI Square on Arabic text classifiers: analysis and application


Hadeel N. Alshaer 1 & Mohammed A. Otair 1 & Laith Abualigah 1 & Mohammad Alshinwan 1 & Ahmad M. Khasawneh 1

Received: 6 June 2020 / Revised: 31 August 2020 / Accepted: 13 October 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Text classification is the task of assigning texts to predefined groups according to their content. Over the past few years, the volume of information available on the Internet has grown across many fields, making text classification one of the most important, yet challenging, tasks. Text classification is employed in numerous applications and for different objectives. The widespread use of the Internet, particularly in the Arab world, together with the massive number of documents and pages published in Arabic, has raised the need for suitable tools that classify these pages and documents by their main categories. The aim of this paper is to study the effect of the improved CHI (ImpCHI) Square on the performance of six well-known classifiers: Random Forest, Decision Tree, Naïve Bayes, Naïve Bayes Multinomial, Bayes Net, and Artificial Neural Networks. The proposed techniques are important for improving the classification of Arabic documents and can be regarded as a promising basis for the text classification stage, because they contribute to classifying texts into predefined categories. The combined method takes advantage of more than one technique, which can produce better final results. The dataset employed in this paper includes 9055 Arabic documents collected from various Arabic resources. Based on their content, these documents were divided into twelve categories. Four performance evaluation criteria were used: F-measure, recall, precision, and model build time. The experimental results show that ImpCHI square gives better classification results than the normal CHI square method with all studied classifiers, in terms of all performance criteria used.

Keywords Text classification algorithms . Bayes net . Naïve Bayes . Random Forest . Decision tree . Artificial neural networks . CHI Square
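For orientation, the normal CHI square baseline mentioned in the abstract is commonly computed per term and category from a 2x2 document-count table. The sketch below illustrates that standard statistic only; it is not the ImpCHI variant, whose definition is not given in this excerpt. The function names and the toy English tokens are hypothetical, chosen purely for illustration.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Standard chi-square statistic for one term and one category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Term or category is absent (or universal); treat as uninformative.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def chi_square_scores(docs, labels, category):
    """Score every term against `category`.

    docs: list of token lists (assumed already tokenized and normalized)
    labels: list of category labels, aligned with docs
    """
    n_docs = len(docs)
    n_cat = sum(1 for y in labels if y == category)
    df_total = Counter()   # document frequency over all documents
    df_in_cat = Counter()  # document frequency inside the category
    for tokens, y in zip(docs, labels):
        for term in set(tokens):
            df_total[term] += 1
            if y == category:
                df_in_cat[term] += 1
    scores = {}
    for term, df in df_total.items():
        a = df_in_cat[term]
        b = df - a
        c = n_cat - a
        d = n_docs - n_cat - b
        scores[term] = chi_square(a, b, c, d)
    return scores


def top_k_terms(scores, k):
    """Keep the k highest-scoring terms as the selected features."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy corpus (hypothetical data, not from the paper's dataset).
docs = [["goal", "match"], ["goal", "team"], ["bank", "loan"], ["bank", "market"]]
labels = ["sport", "sport", "economy", "economy"]
scores = chi_square_scores(docs, labels, "sport")
selected = top_k_terms(scores, 2)
```

Note that the statistic is symmetric: a term that appears only outside the category (here "bank") scores as high as one that appears only inside it (here "goal"), so a selector built this way keeps strongly negative indicators as well as positive ones.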

* Laith Abualigah [email protected]
Extended author information available on the last page of the article

Multimedia Tools and Applications

1 Introduction

Information Retrieval (IR) is a field of computer science that has grown in importance as the volume of information increases. This information often needs to be arranged and classified so that it can be easily retrieved. Text classification (TC) is a process that has become important in various fields, especially on the Internet. Text mining is the analysis of natural language text with the aim of extracting useful information from textual data. In addition, text mining helps organizations extract valuable insights from document content.
