Intra-cluster Similarity Index Based on Fuzzy Rough Sets for Fuzzy C-Means Algorithm



Abstract. Cluster validity indices have been used to evaluate the quality of fuzzy partitions. In this paper, we propose a new index, which uses concepts of Fuzzy Rough sets to evaluate the average intra-cluster similarity of fuzzy clusters produced by the fuzzy c-means algorithm. Experimental results show that, compared with several well-known cluster validity indices, the proposed index yields more desirable estimates of the cluster number.

Keywords: Fuzzy c-means algorithm, Fuzzy Rough sets, Intra-cluster similarity, Cluster validity index.

1 Introduction

Cluster analysis, which aims to reveal the structure in a given set of data (patterns), can be viewed as the problem of dividing the data set into a few compact subsets. The fuzzy c-means (FCM) algorithm [1] for cluster analysis has been the dominant approach in both theoretical and practical applications of fuzzy techniques for the last two decades. The aim of FCM is to partition a given set of data points (patterns) $X = \{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^p$ into $c$ clusters represented as fuzzy sets $F_1, F_2, \ldots, F_c$. The FCM objective function has the form

$$J_m(U, V) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} \, \|x_j - v_i\|^2, \qquad (1)$$

where $v_i$ is the centroid of the fuzzy cluster $F_i$, $\|\cdot\|$ is a certain distance function, the exponent $m > 1$ is a fuzzifier, $u_{ij} = F_i(x_j)$ is the membership value of $x_j$ belonging to $F_i$, satisfying $\sum_{i=1}^{c} u_{ij} = 1$ ($j = 1, 2, \ldots, n$) and $0 < \sum_{j=1}^{n} u_{ij} < n$ ($i = 1, 2, \ldots, c$), $U = [u_{ij}]$ is the partition matrix, and $V = \{v_1, v_2, \ldots, v_c\}$ is the set of all cluster centroids. FCM iteratively updates $U$ and $V$ to minimize $J_m(U, V)$ until a certain termination criterion is satisfied. A fuzzy partition produced by FCM is denoted as $(U, V)$.

If $c$ is not known a priori, a cluster validity index must be used to evaluate the quality of the fuzzy partitions obtained for different values of $c$ in order to find the optimal cluster number. In the most cited indices, e.g. the Xie-Beni index [2] and the Fukuyama-Sugeno index [3], the intra-cluster similarity of a fuzzy partition is estimated by using distances between data points and cluster centroids. But this approach is not effective for large values of $c$, because $\lim_{c \to n} \|x_j - v_i\|^2 = 0$ (see [4,5]). To overcome this shortcoming, the Kwon index [5] was proposed, and another kind of index has been proposed in recent years [6,7]. This kind of index considers only the inter-cluster proximity, which is evaluated by the membership values of each data point belonging to all fuzzy clusters, whereas the distance function is not taken into account.

G. Wang et al. (Eds.): RSKT 2008, LNAI 5009, pp. 316–323, 2008. © Springer-Verlag Berlin Heidelberg 2008

In this paper, we propose a new method to assess the intra-cluster similarity of a fuzzy cluster by using the concepts of Fuzzy Rough sets. The intra-cluster similarity index of a fuzzy partition obtained from FCM is then defined as the average intra-cluster similarity of all fuzzy clusters. Experimental results indicate that the