Chameleon algorithm based on mutual k-nearest neighbors

  • PDF / 2,216,815 Bytes
  • 14 Pages / 595.276 x 790.866 pts Page_size
  • 46 Downloads / 182 Views

DOWNLOAD

REPORT


ORIGINAL PAPER

Chameleon algorithm based on mutual k-nearest neighbors Yuru Zhang 1 & Shifei Ding 1,2 & Lijuan Wang 1,3 & Yanru Wang 1 & Ling Ding 1 Accepted: 2 September 2020 # Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Clustering is a typical unsupervised data analysis method, which divides a given data set without label information into multiple clusters. The data on each cluster has a great deal of association, which can be used as the preprocessing stage of other algorithms or for further association analysis. Therefore, clustering plays an important role in a wide range of fields. Chameleon is a clustering algorithm that combines the relative interconnectivity and relative closeness to find clusters of arbitrary shape with high quality. However, the graph-partitioning technology hMETIS algorithm used in the algorithm is difficult to operate and easy to cause uncertainty of results. In addition, the final number of clusters need to be specified by user as a parameter to stop merging, which is difficult to determine without prior information. Aiming at these shortcomings, Chameleon algorithm based on mutual k-nearest neighbors (MChameleon) is proposed. Firstly, the idea of mutual k-nearest neighbors is introduced to directly generate sub-clusters, which omits the process of partitioning graph. Then, the concept of MC modularity is introduced, which is used to objectively identify the final clustering results. By experiments on artificial data sets and UCI data sets, we compared MChameleon with the original Chameleon algorithm, the improved AChameleon algorithm and the classic K-Means, DBSCAN, BIRCH algorithm in accuracy. Experimental results on data sets show that Chameleon algorithm based on mutual k-nearest neighbors has great advantages and is feasible. Keywords Clustering analysis . Mutual k-nearest neighbors . Chameleon algorithm . Modularity

1 Introduction In today’s rapidly developing era of computer technology, a large number of complex types of data emerge. In order to discover the hidden information from these data, data mining technologies have been studied in statistics, databases, machine learning and other aspects [1]. Cluster analysis is an important means in data mining. From the perspective of data mining, its task is to analyze the characteristics of data sets and provide technical methods for probing meaningful and practical worth knowledge [2, 3]. The main idea of the clustering algorithm is to divide a given population into groups or

* Shifei Ding [email protected] 1

School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 21116, Jiangsu, China

2

Mine Digitization Engineering Research Center of Ministry of Education of the People’s Republic of China, Xuzhou 221116, China

3

School of Information and Electrical Engineering, Xuzhou College of Industrial Technology, Xuzhou 221400, China

clusters according to the similarity between the objects. On the one hand, the relevant objects is divided into one group as much a