An evaluation of k-means as a local search operator in hybrid memetic group search optimization for data clustering
Luciano D. S. Pacifico¹ • Teresa B. Ludermir²
Accepted: 17 September 2020 © Springer Nature B.V. 2020
Abstract Cluster analysis is an important field in pattern recognition and machine learning. It attempts to distribute a set of data patterns into groups, considering only the inner properties of those data. One of the most popular techniques for data clustering is the K-Means algorithm, due to its simplicity and ease of implementation. However, K-Means is strongly dependent on the initial point of the search, which may lead to suboptimal (locally optimal) solutions. In the past few decades, Evolutionary Algorithms (EAs), such as Group Search Optimization (GSO), have been adapted to the context of cluster analysis, given their global search capabilities and flexibility in dealing with hard optimization problems. However, given their stochastic nature, EAs may be slower to converge than traditional clustering models (such as K-Means). In this work, three hybrid memetic approaches combining K-Means and GSO are presented, named FMKGSO, MKGSO and TMKGSO, in such a way that the global search capabilities of GSO are combined with the fast local search performance of K-Means. The degree of influence of K-Means on the behavior of the GSO method is evaluated by a set of experiments considering both real-world problems and synthetic data sets, using five clustering metrics to assess how good and robust the proposed hybrid memetic models are.
Keywords Data clustering · Evolutionary algorithms · Group search optimization · K-means
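The dependence of K-Means on its starting point can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function name `k_means`, the toy one-dimensional data set and the two sets of initial centroids are assumptions chosen for illustration, and the sum of squared errors (SSE) is used here only as a convenient quality measure.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points: List[Point], init: List[Point], max_iter: int = 100):
    """Lloyd-style K-Means started from the given initial centroids.

    Returns (centroids, labels, sse). Hypothetical helper for illustration.
    """
    centroids = list(init)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    sse = sum(squared_distance(p, centroids[lab])
              for p, lab in zip(points, labels))
    return centroids, labels, sse


# Toy data (assumed): three well-separated pairs on a line.
data = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]

# One starting centroid per true group.
_, _, sse_good = k_means(data, [(0.0,), (10.0,), (20.0,)])

# Two starting centroids inside the first group: the search stalls
# in a local optimum that merges the other two groups.
_, labels_bad, sse_bad = k_means(data, [(0.0,), (1.0,), (15.0,)])

print(sse_good, sse_bad)
```

Both runs converge, but the second start leaves two centroids inside one group and a much larger SSE. A global search procedure such as GSO is intended to help escape this kind of local optimum.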
1 Introduction
In the past few decades, the amount of data produced daily by electronic devices, such as smartphones, tablets, computers, cars, GPS, smart TVs, Internet of Things applications, and so on, has increased exponentially, in such a way that automatic and scalable computational systems are increasingly required. To extract useful information from the large data sets such systems are based on, it is impossible to rely on human analysis alone, since the need for precise and reliable information in a short period of time has become mandatory (Naldi and Campello 2014).

Luciano D. S. Pacifico [email protected]
Teresa B. Ludermir [email protected]
¹ Departamento de Computação (DC), Universidade Federal Rural de Pernambuco (UFRPE), Recife, PE, Brazil
² Centro de Informática (CIn), Universidade Federal de Pernambuco (UFPE), Recife, PE, Brazil

Data clustering is one of the most important and primitive activities in pattern recognition, and an important mechanism for exploratory data analysis. Clustering is characterized by an unsupervised attempt to categorize a set of data patterns into clusters, in such a way that observations belonging to the same cluster are more closely related (according to their feature set) than observations from different clusters. In clustering, no prior knowledge about the data set at hand is requir