Automatic clustering algorithms: a systematic review and bibliometric analysis of relevant literature


REVIEW

Absalom E. Ezugwu1 • Amit K. Shukla2 • Moyinoluwa B. Agbaje1 • Olaide N. Oyelade3 • Adán José-García4 • Jeffery O. Agushaka1



Received: 5 August 2020 / Accepted: 24 September 2020 © Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract
Cluster analysis is an essential tool in data mining. Many clustering algorithms have been proposed and implemented, and most of them can find good-quality clustering results. However, most traditional clustering algorithms, such as K-means, K-medoids, and Chameleon, still require the number of clusters to be specified a priori and may struggle with problems in which the number of clusters is unknown. This missing information can impose additional computational burdens or requirements on these algorithms. In real-world clustering problems, the number of clusters in the data cannot easily be identified in advance, so determining the optimal number of clusters for a dataset of high density and dimensionality is a difficult task. Sophisticated automatic clustering techniques are therefore indispensable because of their flexibility and effectiveness. This paper presents a systematic taxonomical overview and bibliometric analysis of the trends and progress in nature-inspired metaheuristic clustering approaches, from the early attempts in the 1990s to today's novel solutions. Finally, the paper covers key issues in formulating metaheuristic algorithms for the clustering problem, together with major application areas.
Keywords: Clustering algorithm · Automatic clustering · Taxonomy · Metaheuristic · Bibliometric analysis
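The abstract's central point is that K-means has to be told the number of clusters k in advance. The Python sketch below illustrates that dependency and the simplest workaround: run K-means for every candidate k and keep the k that maximizes the silhouette width, a common cluster validity index. It is an exhaustive-search baseline, not one of the nature-inspired metaheuristics surveyed in this review. The helper names (`farthest_first`, `kmeans`, `silhouette`, `choose_k`), the candidate range 2..k_max, the deterministic farthest-first seeding (used instead of random or k-means++ initialization so the result is reproducible), and the toy data are all hypothetical choices made for illustration.

```python
import math
import statistics


def farthest_first(points, k):
    """Deterministic farthest-point seeding (assumed stand-in for random/k-means++ init)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point whose nearest existing centroid is farthest away.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means: note that k must be supplied by the caller."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette width; higher means tighter, better-separated clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster, so rank it last
    total = 0.0
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:
            continue  # a singleton cluster contributes s(i) = 0 by convention
        a = statistics.fmean(own)  # mean distance within p's own cluster
        b = min(  # mean distance to the nearest other cluster
            statistics.fmean(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            for c in clusters
            if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, k_max):
    """Hypothetical helper: try k = 2..k_max and keep the k with the best silhouette."""
    best_k, best_score = None, -math.inf
    for k in range(2, k_max + 1):
        score = silhouette(points, kmeans(points, k))
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Hypothetical toy data: three well-separated 3x3 grids centred at (0,0), (20,0), (0,20).
offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
data = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (20, 0), (0, 20)] for dx, dy in offsets]

best_k, score = choose_k(data, k_max=6)
print(best_k, round(score, 3))  # best_k is 3 for this toy data
```

Each candidate k costs a full K-means run plus a naive silhouette evaluation whose cost grows quadratically with the number of points, which is one concrete form of the additional computational burden that an unknown number of clusters imposes.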

Corresponding author: Absalom E. Ezugwu ([email protected])

1 School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, KwaZulu-Natal, South Africa
2 South Asian University, New Delhi 110021, India
3 Department of Computer Science, Ahmadu Bello University, Zaria, Nigeria
4 College of Engineering, Mathematics and Physical Sciences, University of Exeter, Exeter, UK

1 Introduction

Over the years, data collection, by whatever process and in whatever form, has become essential for extracting meaningful information in all domains, especially research, computer science, natural science, and engineering. However, without proper analysis, the acquired data remain meaningless. Because most of these data come in arbitrary forms, grouping them can be difficult when there is no prior knowledge of the features of the data objects. When no such grouping or labeling is available, organizing the data becomes an unsupervised learning problem. Data clustering analysis, especially in data mining, has long played a vital role in organizing and classifying such data appropriately. Simply put, data clustering or clustering an