A parallel hybrid krill herd algorithm for feature selection
ORIGINAL ARTICLE
Laith Abualigah¹ · Bisan Alsalibi² · Mohammad Shehab³ · Mohammad Alshinwan¹ · Ahmad M. Khasawneh¹ · Hamzeh Alabool⁴
Received: 12 December 2019 / Accepted: 18 September 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract In this paper, a novel feature selection method is introduced to tackle the problem of high-dimensional features in text clustering. Text clustering is a prevailing direction in big text mining: documents are grouped into cohesive clusters by using carefully selected informative features. Swarm-based optimization techniques have been widely used to select relevant text features and have shown promising results on datasets of various sizes. However, the performance of traditional optimization algorithms tends to degrade markedly on large-scale datasets. To address this, a novel parallel membrane-inspired framework is proposed to enhance the performance of the krill herd algorithm combined with the swap mutation strategy (MHKHA), in which the krill herd algorithm is hybridized with the swap mutation strategy and incorporated within the parallel membrane framework. Finally, the k-means technique is applied to cluster the documents using the features selected by the krill herd algorithm. Seven benchmark datasets with various characteristics are used. The results revealed that the proposed MHKHA outperformed the other optimization methods it was compared against. This paper offers the text mining community an alternative method for obtaining cohesive and informative features.
Keywords Feature selection · Document clustering · Parallel membrane computing · Krill herd algorithm · Local search · Optimization problem
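The abstract names a swap mutation strategy but this excerpt does not define the operator or the solution encoding. The following is a minimal sketch under stated assumptions, not the authors' operator: a candidate solution is assumed to be a 0/1 list over the features (1 = feature kept), and swap mutation is taken to exchange the values at one selected and one unselected position, so the number of selected features stays fixed while the selection moves. The function name `swap_mutation` is hypothetical.

```python
import random
from typing import List, Optional


def swap_mutation(mask: List[int], rng: Optional[random.Random] = None) -> List[int]:
    """Hypothetical helper: swap one selected and one unselected feature.

    Assumption: `mask` is a 0/1 list where 1 means the feature is kept.
    Swapping a 1 with a 0 changes which features are selected while
    keeping the number of selected features unchanged.
    """
    rng = rng or random.Random()
    ones = [i for i, bit in enumerate(mask) if bit == 1]
    zeros = [i for i, bit in enumerate(mask) if bit == 0]
    mutated = list(mask)  # never modify the parent solution in place
    if not ones or not zeros:
        # Nothing to swap: either every feature or no feature is selected.
        return mutated
    i, j = rng.choice(ones), rng.choice(zeros)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated


if __name__ == "__main__":
    parent = [1, 0, 1, 0, 0, 1]
    child = swap_mutation(parent, random.Random(0))
    print(parent, "->", child)
```

Restricting the swap to a 1/0 pair guarantees that every call produces a different solution; a plain "swap any two positions" variant would be a no-op whenever both positions hold the same bit.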
1 Introduction
The rapid evolution of Internet technology and the explosive growth of online text content have raised the challenge of handling text document (TD) collections whose size keeps changing [1]. Text clustering (TC) is among the most promising unsupervised learning techniques for partitioning large text collections into subsets of similar documents (clusters) [2, 3]. Documents that share a high level of homogeneity are grouped into the same cluster, while different clusters exhibit a high level of heterogeneity [4].
* Corresponding author: Laith Abualigah
1 Faculty of Computer Sciences and Informatics, Amman Arab University, Amman 11953, Jordan
2 School of Computer Sciences, Universiti Sains Malaysia, Pinang, Malaysia
3 Computer Science Department, Aqaba University of Technology, Aqaba, Jordan
4 College of Computing and Informatics, Saudi Electronic University, Abha, Saudi Arabia
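The grouping criterion stated in the introduction (high homogeneity within a cluster, high heterogeneity across clusters) is what the k-means step mentioned in the abstract targets. The following is a minimal sketch of plain Lloyd's k-means run only on the feature columns kept by a 0/1 mask. Assumptions not taken from the paper: documents are already numeric vectors, distance is squared Euclidean, the first k documents serve as initial centroids, and the names `select_columns`, `squared_distance` and `kmeans` are hypothetical.

```python
from typing import List, Sequence, Tuple


def select_columns(docs: Sequence[Sequence[float]], mask: Sequence[int]) -> List[List[float]]:
    """Hypothetical helper: keep only the features whose mask bit is 1."""
    return [[x for x, keep in zip(doc, mask) if keep] for doc in docs]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd's k-means; the first k points are the initial centroids (assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each document joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no document changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member documents.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Toy data (illustrative only): the third feature is deliberately noisy.
    docs = [[1.0, 0.0, 5.0], [0.9, 0.1, -3.0], [0.0, 1.0, 4.0], [0.1, 0.9, -2.0]]
    print(kmeans(select_columns(docs, [1, 1, 0]), k=2)[0])
    print(kmeans(select_columns(docs, [1, 1, 1]), k=2)[0])
```

On this toy data, keeping only the first two features groups the documents as `[0, 0, 1, 1]`, whereas keeping all three lets the noisy column drive the split (`[0, 1, 0, 1]`). This is the kind of effect feature selection aims at, although real text data is far higher dimensional.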
A dominant representation, the vector space model (VSM), is widely used in text mining tasks such as text classification, summarization, clustering, and feature selection [5, 6]. VSM represents each document as a vector of features together with their weighting scores. In this model, a weighting score is assigned to each feature according to its