A Clustering Algorithm for Triangular Fuzzy Normal Random Variables

Ye Li¹ • Yiyan Chen² • Qun Li³

Received: 16 April 2019 / Revised: 29 June 2020 / Accepted: 1 August 2020
© Taiwan Fuzzy Systems Association 2020

Ye Li, Yiyan Chen and Qun Li contributed equally to this work and should be considered co-first authors.

Correspondence: Yiyan Chen, [email protected]

¹ University of Chinese Academy of Social Sciences (Graduate School), Beijing 102488, China
² School of Management and Economics, Beijing Institute of Technology, Beijing 100081, China
³ Institute of Quantitative & Technical Economics, Chinese Academy of Social Sciences, Beijing 100732, China

Abstract Most clustering algorithms cannot handle samples that carry uncertain information. Drawing on fuzzy set theory and probability theory, we first define the fuzzy-probability binary measure space and triangular fuzzy normal random variables. We then build on the strengths of the k-means algorithm (simple principle, few parameters, fast convergence, good clustering effect and good scalability) to propose a clustering algorithm for samples containing multiple triangular fuzzy normal random variables, which we call the TFNRV-k-means algorithm. The algorithm uses our proposed Euclidean random comprehensive absolute distance (ERCAD) as its distance measure. Under the fuzzy measure, the lower bound, the principal value and the upper bound of the triangular fuzzy normal random variables are each updated by taking means, and the cluster centers are updated until they no longer change. We then analyze the time complexity of the proposed algorithm and test it on different sample sets through random simulation experiments. The highest clustering accuracy obtained is 99.00% and the maximum Kappa coefficient is 0.9850, from which we conclude that the TFNRV-k-means clustering algorithm clusters well. Finally, we summarize the article, list the advantages and disadvantages of the TFNRV-k-means clustering algorithm, and propose corresponding improvements, which provide ideas for further research on TFNRV-k-means.

Keywords Fuzzy-probability binary measure space · Triangular fuzzy normal random variables · Euclidean random comprehensive absolute distance · TFNRV-k-means clustering algorithm
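
The iteration outlined in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. Each sample is assumed to be stored as an array of (lower bound, principal value, upper bound) triples, one triple per variable. ERCAD is not defined in this part of the article, so a plain Euclidean distance over the triples stands in for it. The function name `tfn_kmeans` and its parameters are hypothetical.

```python
import numpy as np


def tfn_kmeans(samples, k, max_iter=100, seed=0):
    """k-means-style clustering of triangular fuzzy triples (illustrative only).

    samples: array of shape (n, d, 3); the last axis holds the
             (lower bound, principal value, upper bound) of each variable.
    Returns (labels, centers) with centers of shape (k, d, 3).
    """
    samples = np.asarray(samples, dtype=float)
    n = samples.shape[0]
    rng = np.random.default_rng(seed)

    # Initialise the centers with k distinct samples chosen at random.
    centers = samples[rng.choice(n, size=k, replace=False)].copy()
    labels = np.zeros(n, dtype=int)

    for _ in range(max_iter):
        # ASSUMPTION: plain Euclidean distance over all triples is used
        # here as a placeholder for ERCAD, which this excerpt does not define.
        diffs = samples[:, None, :, :] - centers[None, :, :, :]  # (n, k, d, 3)
        dists = np.sqrt((diffs ** 2).sum(axis=(2, 3)))           # (n, k)
        labels = dists.argmin(axis=1)

        # Update lower bound, principal value and upper bound separately
        # by taking the mean over the members of each cluster.
        new_centers = centers.copy()
        for j in range(k):
            members = samples[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)

        # Stop once the centers no longer change.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    return labels, centers


# Six one-variable samples forming two well-separated groups.
data = np.array([
    [[0.0, 1.0, 2.0]], [[1.0, 2.0, 3.0]], [[0.5, 1.5, 2.5]],
    [[100.0, 101.0, 102.0]], [[101.0, 102.0, 103.0]], [[100.5, 101.5, 102.5]],
])
labels, centers = tfn_kmeans(data, k=2)
```

Because each of the three components is averaged separately, every center keeps the ordering lower bound ≤ principal value ≤ upper bound whenever all samples do.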

1 Introduction

Cluster analysis is an important data mining method that has been widely used in fields such as pattern recognition, image analysis, market research, customer relationship management and web document classification [1]. At present, there is no generally accepted definition of clustering in academia. Everitt [2] gives the following definition: entities within a cluster are similar, and entities in different clusters are dissimilar; a cluster is an aggregation of points in the data space such that the distance between any two points in the same cluster is less than the distance between any point in the cluster and any point outside it. A cluster can be described as a connected domain of a