Basic Methods for c-Means Clustering


Perhaps the best-known method for nonhierarchical cluster analysis is the method of K-means [95], which is also called crisp c-means in this book. c-means clustering has been cited and employed so frequently because of both its usefulness and its potential, and this chapter emphasizes the latter. That is, the idea of c-means clustering can give rise to various other methods for the same or a similar purpose: classifying a data set without an external criterion, which is called unsupervised classification or, more simply, data clustering. Clustering is thus a technique for generating subsets of data, called clusters, such that each cluster is dense in the sense that distances within it are small, whereas different clusters are well separated in the sense that two objects from different clusters are distant. This vague statement is made precise in the formulation of each method.

On the other hand, we have the fundamental idea of using fuzzy sets for clustering. The reason for employing fuzzy clustering is the same as above: not only its usefulness but also its potential to produce various other algorithms, and we emphasize the latter. The fuzzy approach to clustering is capable of producing many methods and algorithms, although fuzzy systems for the present purpose do not have a particularly profound mathematical structure. The fuzzy approach has this capability because of its inherent ability to link different methodologies, including statistical models, machine learning, and various other heuristics. Therefore we must describe not only methods of fuzzy clustering but also their connections to other methods.

In this chapter we first describe several basic methods of c-means clustering, which will later be generalized, modified, or extended into a larger family of related methods.

2.1 A Note on Terminology

Before describing methods of clustering, we briefly review terminology used here.

S. Miyamoto et al.: Algorithms for Fuzzy Clustering, STUDFUZZ 229, pp. 9–42, 2008. © Springer-Verlag Berlin Heidelberg 2008 springerlink.com

First, a set of objects or individuals to be clustered is given. An object set is denoted by $X = \{x_1, \ldots, x_N\}$ in which $x_k$ ($k = 1, 2, \ldots, N$) is an object. With a few exceptions, $x_1, \ldots, x_N$ are vectors of real $p$-dimensional space $\mathbf{R}^p$. A generic element $x \in \mathbf{R}^p$ is the vector with real components $x^1, \ldots, x^p$; we write $x = (x^1, \ldots, x^p) \in \mathbf{R}^p$.

Two basic concepts used for clustering are dissimilarity and cluster center. As noted before, clustering of data is done by evaluating nearness of data. This means that objects are placed in a topological space, and the nearness is measured by using a dissimilarity between two objects to be clustered. A dissimilarity between an arbitrary pair $x, x' \in X$ is denoted by $D(x, x')$, which takes a real value. This quantity is symmetric with respect to the two arguments:

$$D(x, x') = D(x', x), \qquad \forall x, x' \in X. \tag{2.1}$$
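As a concrete illustration of the symmetry condition (2.1), the following minimal sketch checks it numerically on a small object set. The squared Euclidean distance is used here only as an assumed example of a dissimilarity; the function names `squared_euclidean` and `is_symmetric` and the four sample points are hypothetical and do not come from the text.

```python
from itertools import combinations
from math import isclose


def squared_euclidean(x, y):
    """Assumed example dissimilarity: D(x, y) = sum_j (x^j - y^j)^2 on R^p."""
    if len(x) != len(y):
        raise ValueError("both objects must have the same dimension p")
    return sum((a - b) ** 2 for a, b in zip(x, y))


def is_symmetric(D, objects):
    """Check D(x, x') = D(x', x) for every pair of distinct objects."""
    return all(isclose(D(x, y), D(y, x)) for x, y in combinations(objects, 2))


# Hypothetical object set X with N = 4 objects in R^2.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]

symmetric = is_symmetric(squared_euclidean, X)
d_first_last = squared_euclidean(X[0], X[3])
```

Passing the dissimilarity as an argument keeps the check independent of any particular choice of $D$, so the same function can be reused with other dissimilarity measures.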

Since a dissimilarity measure qu