Miscellanea

In this chapter we briefly note various studies on and around fuzzy c-means clustering that are not discussed elsewhere in this book.


5.1 More on Similarity and Dissimilarity Measures

In Section 4.2 we discussed the use of a similarity measure in fuzzy c-means. In the next section we mention some other methods of fuzzy clustering in which the Euclidean distance, or any other specific definition of a dissimilarity measure, is unnecessary. Rather, a measure $D(x_k, x_\ell)$ can be arbitrary so long as it can be interpreted as the dissimilarity between two objects. Moreover, a function $D(x, y)$ of the variables $(x, y)$ is not required; only its value $D_{k\ell} = D(x_k, x_\ell)$ on each pair of objects in $X$ is needed. In other words, the algorithm works on the $N \times N$ matrix $[D_{k\ell}]$ instead of a binary relation $D(x, y)$ on $\mathbf{R}^p$.

Let us turn to the definition of dissimilarity measures. Suppose we are given a metric $D_0(x, y)$ that satisfies the three metric axioms, including the triangular inequality. Whatever type of metric $D_0(x, y)$ is, we define

$$D_1(x, y) = \frac{D_0(x, y)}{1 + D_0(x, y)}.$$

We easily find that $D_1(x, y)$ is also a metric. To show this, notice that

(i) $D_1(x, y) \ge 0$ and $D_1(x, y) = 0 \iff x = y$, from the corresponding property of $D_0(x, y)$;
(ii) $D_1(x, y) = D_1(y, x)$, from the symmetry of $D_0(x, y)$;
(iii) the triangular inequality $D_1(x, y) + D_1(y, z) - D_1(x, z) \ge 0$ follows by a straightforward calculation from $D_0(x, y) + D_0(y, z) - D_0(x, z) \ge 0$.

Note also that

$$0 \le D_1(x, y) \le 1, \quad \forall x, y.$$
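The calculation behind (iii) is not written out in the text. One possible route is sketched here; the shorthand $a$, $b$, $c$ and the function $\varphi$ are introduced only for this sketch and are not the authors' notation.

```latex
% Shorthand (assumed for this sketch only):
%   a = D_0(x,y), \quad b = D_0(y,z), \quad c = D_0(x,z), \quad c \le a + b,
%   \varphi(t) = \frac{t}{1+t} = 1 - \frac{1}{1+t}, \quad t \ge 0.
% \varphi is increasing on [0, \infty), so c \le a + b gives
\begin{align*}
D_1(x,z) = \varphi(c) &\le \varphi(a+b)
   = \frac{a}{1+a+b} + \frac{b}{1+a+b} \\
  &\le \frac{a}{1+a} + \frac{b}{1+b}
   = \varphi(a) + \varphi(b)
   = D_1(x,y) + D_1(y,z).
\end{align*}
```

The second inequality only enlarges each fraction by shrinking its denominator, which is valid because $a, b \ge 0$.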

This shows that, given a metric, we can construct another metric that is bounded in the unit interval.

S. Miyamoto et al.: Algorithms for Fuzzy Clustering, STUDFUZZ 229, pp. 99–117, 2008. © Springer-Verlag Berlin Heidelberg 2008, springerlink.com
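The construction of a bounded metric from a given one, and the idea of working on an N × N dissimilarity matrix rather than on a function, can be illustrated with a minimal Python sketch. The function names, the random test data, and the choice of the Euclidean distance as $D_0$ are assumptions made for illustration; they do not come from the text.

```python
import numpy as np

def dissimilarity_matrix(X, d0):
    """Return the N x N matrix [D_kl] with D_kl = d0(x_k, x_l)."""
    n = len(X)
    D = np.zeros((n, n))
    for k in range(n):
        for l in range(k + 1, n):
            D[k, l] = D[l, k] = d0(X[k], X[l])
    return D

def bound_to_unit_interval(D0):
    """Apply D1 = D0 / (1 + D0) entrywise to a matrix of D0 values."""
    return D0 / (1.0 + D0)

def triangle_inequality_holds(D, tol=1e-12):
    """Check D[k, l] + D[l, m] >= D[k, m] for every triple (k, l, m)."""
    lhs = D[:, :, None] + D[None, :, :]  # lhs[k, l, m] = D[k, l] + D[l, m]
    rhs = D[:, None, :]                  # broadcasts D[k, m] over l
    return bool(np.all(lhs - rhs >= -tol))

# Illustrative data (assumed): 15 random points in R^3, Euclidean D0.
rng = np.random.default_rng(0)
X = rng.normal(size=(15, 3))
D0 = dissimilarity_matrix(X, lambda x, y: np.linalg.norm(x - y))
D1 = bound_to_unit_interval(D0)

print("triangular inequality holds for D1:", triangle_inequality_holds(D1))
print("largest entry of D1:", D1.max())
```

A numerical check on one data set is only a sanity test, not a proof. Note also that every downstream step here needs only the matrices `D0` and `D1`, never the points themselves.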


Another nontrivial metric, which is a generalization of the Jaccard coefficient [1], is given as follows [99]:

$$D_J(x, y) = \frac{\sum_{j=1}^{p} |x^j - y^j|}{\sum_{j=1}^{p} \max\{x^j, y^j\}}, \quad \forall x \ge 0,\ y \ge 0.$$

The proof of the triangular inequality is given in [99] and is omitted here. Furthermore, we note that the triangular inequality is unnecessary for a dissimilarity measure in clustering. We thus observe that there are many more possible measures of dissimilarity for clustering.

Apart from the squared Euclidean distance and the $L_1$ distance, the calculation of cluster centers that minimize an objective function of the fuzzy c-means type using such a general dissimilarity is difficult. For such a general dissimilarity we have two approaches:

1. Use K-medoid clustering [80].
2. Use relational clustering, including agglomerative hierarchical clustering [1, 35, 99].

We do not discuss K-medoid clustering here, as we cannot yet say definitely how and why this method is useful; the second approach of relational clustering is mentioned below.
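Returning to the generalized Jaccard measure $D_J$ defined above, the following minimal Python sketch shows how it might be computed. It assumes that $x^j$ and $y^j$ denote the $j$-th components of the nonnegative vectors $x$ and $y$. The function name and the value returned when both vectors are zero (where the formula reads 0/0) are conventions chosen for this sketch, not taken from the text or from [99].

```python
import numpy as np

def generalized_jaccard(x, y):
    """D_J(x, y) = sum_j |x^j - y^j| / sum_j max(x^j, y^j) for x, y >= 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(x < 0) or np.any(y < 0):
        raise ValueError("D_J is defined for nonnegative vectors only")
    denom = np.maximum(x, y).sum()
    if denom == 0.0:
        # Both vectors are zero and the formula is 0/0.
        # Returning 0 is a convention assumed for this sketch.
        return 0.0
    return np.abs(x - y).sum() / denom

# On 0/1 indicator vectors the measure reduces to 1 - |A ∩ B| / |A ∪ B|.
a = [1, 1, 0, 0]  # A = {1, 2}
b = [1, 0, 1, 0]  # B = {1, 3}
print(generalized_jaccard(a, b))
```

For indicator vectors of sets, the numerator counts the elements in exactly one of the two sets and the denominator counts the elements of the union. This is the sense in which $D_J$ generalizes the Jaccard coefficient.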

5.2 Other Methods of Fuzzy Clustering

Although most research on fuzzy clustering concentrates on fuzzy c-means and its variations, there are still other methods that use the concept of fuzziness in clustering. The fuzzy equivalence relation obtai