An effective multi-level synchronization clustering method based on a linear weighted Vicsek model

Xinquan Chen 1 · Yirou Qiu 2

1 School of Computer & Information, Anhui Polytechnic University, Wuhu 241000, China
2 School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China

* Correspondence: Xinquan Chen, [email protected]; Yirou Qiu, [email protected]

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s10489-020-01767-4) contains supplementary material, which is available to authorized users.

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

To overcome the shortcoming that general clustering methods cannot process big data in main memory, this paper presents an effective multi-level synchronization clustering (MLSynC) method that uses a "divide and collect" framework and a linear weighted Vicsek model. We also introduce two concrete implementations of the MLSynC method: a two-level framework algorithm and a recursive algorithm. The MLSynC method follows a different process from the SynC, ESynC, and SSynC algorithms. Theoretical analysis shows that the time complexity of the MLSynC method is lower than that of SSynC. Simulations and experiments on several kinds of data sets validate that the MLSynC method not only achieves a better local synchronization effect but also needs fewer iterations and less time than the SynC algorithm. Moreover, we observe that the MLSynC method needs less time than ESynC and SSynC, and achieves almost the same local synchronization effect as ESynC and SSynC if the partition of the data set is proper. Further comparison experiments with some classical clustering algorithms demonstrate the clustering effect of the MLSynC method.

Keywords Divide and collect · Kuramoto model · Shrinking synchronization clustering · Linear weighted Vicsek model · Near neighbor point set
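To make the "divide and collect" idea concrete, the sketch below is a minimal two-level illustration, not the paper's algorithm. It assumes a simple ε-near-neighbor synchronization update with uniform weights (the paper's linear weighted Vicsek model may weight neighbors differently), a naive equal-size partition of the data, and a rounding heuristic for merging converged points into local synchronization points. Function names such as two_level_mlsync and parameters such as eps and n_parts are illustrative only.

```python
import numpy as np

def sync_step(points, eps):
    # One synchronization step: every point moves to the mean of itself and
    # its eps-near neighbors. Uniform neighbor weights are an assumption;
    # the paper's linear weighted Vicsek model may weight neighbors differently.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    mask = dist <= eps                       # each point counts as its own neighbor
    return mask @ points / mask.sum(axis=1, keepdims=True)

def synchronize(points, eps, max_iter=100, tol=1e-6):
    # Iterate the update until the points stop moving (local synchronization).
    for _ in range(max_iter):
        moved = sync_step(points, eps)
        if np.max(np.linalg.norm(moved - points, axis=1)) < tol:
            return moved
        points = moved
    return points

def two_level_mlsync(data, eps, n_parts=4):
    # "Divide": split the data into parts and synchronize each part locally.
    parts = np.array_split(data, n_parts)    # naive equal-size split; illustrative only
    local_reps = [np.unique(np.round(synchronize(p, eps), decimals=4), axis=0)
                  for p in parts]            # merge converged points into local representatives
    # "Collect": synchronize the collected local synchronization points again.
    return synchronize(np.vstack(local_reps), eps)

# Example: 1,000 random 2-D points; eps chosen by eye, purely for illustration.
final_points = two_level_mlsync(np.random.rand(1000, 2), eps=0.08)
```

The sketch shows only the two-level case; the paper's recursive implementation applies the same divide-and-collect step recursively rather than once.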

1 Introduction

Clustering is an unsupervised learning method that tries to find obvious distribution structures and patterns in unlabeled data sets by maximizing the similarity of objects within a cluster and minimizing the similarity of objects in different clusters [1]. Clustering has been used in many areas such as machine learning, pattern recognition, image processing, marketing and customer analysis, agriculture, security and crime detection, information retrieval, and bioinformatics. Clustering is often an important step in the process of data analysis.

Clustering algorithms have been studied for decades. Hundreds of clustering algorithms exist by now, but none of them is all-purpose; almost all clustering algorithms have flaws. Some clustering algorithms are suitable for data of certain types, and others are suitable for data with special distribution structures. Many real data sets have complex distributions, diverse types, great volume, noise, or isolated points, so there is a continuous demand for research on different kinds of clustering methods. To obtain better clustering results i