Clustering Algorithm of Density Difference Optimized by Mixed Teaching and Learning


ORIGINAL RESEARCH

Hailong Chen1 · Miaomiao Ge1 · Yutong Xue1

Received: 9 February 2020 / Accepted: 27 April 2020
© The Author(s) 2020

Abstract

The density peak clustering (DPC) algorithm finds cluster centers by computing the local density and distance of each data point from the pairwise distances between points and a manually set cutoff distance (dc). The relationship between data points is generally measured simply by Euclidean distance. However, when the density distribution of a data set is uneven, so that high-density and low-density points coexist, and the dc value is set manually and arbitrarily, the clustering results of the DPC algorithm are seriously affected. For this reason, a clustering algorithm that combines the teaching-learning-based optimization algorithm with the density gap (NSTLBO-DGDPC) is proposed. First, to account for the influence of data point attributes and neighborhoods, the density difference distance is introduced to replace the Euclidean distance of the original algorithm. Second, because manual selection of cluster centers may produce incorrect clustering results, the standard deviation of the high-density distance is used to determine the cluster centers. Finally, the teaching-learning-based optimization (TLBO) algorithm is used to find the optimal dc value. To keep the algorithm from falling into a local optimum, a niche selection strategy is introduced to discard similar values once the population density reaches a certain threshold, and a nonlinear decreasing strategy is then used to update the students in the teaching stage and the learning stage, which yields the optimal dc solution. In this paper, the accuracy and convergence of the improved TLBO algorithm (NSTLBO) are verified on ten benchmark functions. Simulation experiments show that the NSTLBO algorithm has better performance. The proposed clustering algorithm, which integrates the teaching-learning-based optimization algorithm and the density gap, is validated on eight synthetic data sets and eight real data sets. The simulation results show that the algorithm has better clustering quality and effect.

Keywords  Clustering · Density gap · Niche · Nonlinear decreasing strategy · TLBO
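
As an illustration of the local density and distance computation that the abstract attributes to DPC, the following Python sketch computes both quantities in the style of the original DPC algorithm, using plain Euclidean distance and a hard cutoff. It is not the NSTLBO-DGDPC method proposed in this paper: the function name dpc_rho_delta, the use of NumPy, and the handling of density ties are assumptions made for illustration only.

```python
import numpy as np


def dpc_rho_delta(points, dc):
    """Return (rho, delta) for each point, in the style of the original DPC.

    rho[i]   : number of other points closer to point i than the cutoff dc.
    delta[i] : distance from point i to the nearest point of higher density;
               for points with no denser point, the largest distance from
               point i to any other point (hypothetical tie handling).
    Assumes dc > 0 and Euclidean distance.
    """
    points = np.asarray(points, dtype=float)

    # Pairwise Euclidean distance matrix, shape (n, n).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Cutoff kernel: count neighbours within dc, excluding the point itself
    # (the diagonal is 0 < dc, hence the "- 1").
    rho = (dist < dc).sum(axis=1) - 1

    n = len(points)
    delta = np.empty(n)
    for i in range(n):
        denser = rho > rho[i]
        if denser.any():
            # Nearest point with strictly higher density.
            delta[i] = dist[i, denser].min()
        else:
            # No denser point exists: use the maximum distance instead.
            delta[i] = dist[i].max()
    return rho, delta


if __name__ == "__main__":
    sample = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [9, 9]]
    rho, delta = dpc_rho_delta(sample, dc=0.5)
    print(rho)
    print(delta)
```

Points with both a large rho and a large delta are the candidate cluster centers. The sketch also makes the sensitivity to dc visible: changing dc changes every rho value, and therefore the ranking from which centers are chosen, which is the dependence that the proposed method aims to remove by optimizing dc.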

* Hailong Chen [email protected]

1 Department of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China

Introduction

With the advent of the era of big data, it is becoming more and more important to extract valuable knowledge and information from massive data. Cluster analysis is a multivariate statistical method that partitions objects into groups: according to the degree of similarity between objects, the data are divided into several groups so that similar objects are combined into one set. Clustering [1] is a process in which each data point in a data set is assigned to one of several centers that share the same features. That is, the process of dividing a