A parallel hybrid krill herd algorithm for feature selection


ORIGINAL ARTICLE

Laith Abualigah¹ · Bisan Alsalibi² · Mohammad Shehab³ · Mohammad Alshinwan¹ · Ahmad M. Khasawneh¹ · Hamzeh Alabool⁴

Received: 12 December 2019 / Accepted: 18 September 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract

In this paper, a novel feature selection method is introduced to tackle the problem of high-dimensional features in text clustering. Text clustering is a prevailing direction in large-scale text mining: documents are grouped into cohesive clusters using carefully selected informative features. Swarm-based optimization techniques have been widely used to select relevant text features and have shown promising results on datasets of various sizes. However, the performance of traditional optimization algorithms tends to degrade sharply on large-scale datasets. A novel parallel membrane-inspired framework is proposed to enhance the performance of the krill herd algorithm combined with the swap mutation strategy (MHKHA): the krill herd algorithm is hybridized with the swap mutation strategy and incorporated within the parallel membrane framework. Finally, the k-means technique is applied to the features selected by the krill herd algorithm to cluster the documents. Seven benchmark datasets with various characteristics are used. The results show that the proposed MHKHA produces superior results compared with other optimization methods. This paper offers the text mining community an alternative method built on cohesive and informative features.

Keywords Feature selection · Document clustering · Parallel membrane computing · Krill herd algorithm · Local search · Optimization problem
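The swap mutation step named in the abstract can be illustrated with a minimal sketch. It assumes that a candidate solution is encoded as a binary mask over the features (1 = keep, 0 = drop) and that swap mutation exchanges the values at two randomly chosen positions; the abstract specifies neither the encoding nor the exact operator, so both are assumptions. The function names `swap_mutation` and `select_features` and the sample weights are hypothetical and chosen only for illustration.

```python
import random


def swap_mutation(mask, rng):
    """Return a copy of `mask` with the values at two random positions exchanged.

    Assumption: solutions are binary feature masks; this is an illustrative
    operator, not necessarily the one used in MHKHA.
    """
    child = list(mask)
    i, j = rng.sample(range(len(child)), 2)  # two distinct positions
    child[i], child[j] = child[j], child[i]
    return child


def select_features(doc_vector, mask):
    """Keep only the feature weights whose mask entry is 1."""
    return [w for w, keep in zip(doc_vector, mask) if keep]


rng = random.Random(0)                 # seeded for repeatability
mask = [1, 0, 1, 1, 0, 0]              # hypothetical solution: 3 of 6 features kept
child = swap_mutation(mask, rng)       # neighbouring solution

doc = [0.2, 0.0, 0.7, 0.1, 0.4, 0.3]   # hypothetical feature weights of one document
reduced = select_features(doc, child)  # document restricted to the selected features
```

Because a swap only exchanges two entries, the number of selected features stays the same under this operator; a search that should also change the subset size would need a different move, such as a bit flip.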

* Laith Abualigah [email protected]

1 Faculty of Computer Sciences and Informatics, Amman Arab University, Amman 11953, Jordan
2 School of Computer Sciences, Universiti Sains Malaysia, Pinang, Malaysia
3 Computer Science Department, Aqaba University of Technology, Aqaba, Jordan
4 College of Computing and Informatics, Saudi Electronic University, Abha, Saudi Arabia

1 Introduction

The rapid evolution of Internet technology and the explosive growth of online text content have raised the challenge of dealing with the dynamic size of text documents (TD) [1]. Text clustering (TC) is among the most promising unsupervised learning techniques for grouping large text collections into a set of clusters of similar documents [2, 3]. Documents that share a high level of homogeneity are grouped in one cluster, while different clusters exhibit a high level of heterogeneity of information (documents) [4].

A widely used representation, the vector space model (VSM), is applied across text mining tasks such as text classification, summarization, clustering, and feature selection [5, 6]. VSM describes each document as a vector of features with their weighting scores. In this model, a weighting score is assigned to each feature according to its
