Fuzzy Clustering of High Dimensional Data with Noise and Outliers

Abstract Clustering high dimensional data is a challenging problem for fuzzy clustering algorithms because of the so-called concentration of distance phenomenon. Most fuzzy clustering algorithms fail on high dimensional data, producing cluster prototypes close to the center of gravity of the data set. The presence of noise and outliers in the data is an additional problem for clustering algorithms because they might affect the computation of cluster centers. In this paper, we analyze and compare different promising fuzzy clustering algorithms in order to examine their ability to correctly determine cluster centers on high dimensional data with noise and outliers. We analyze the performance of the clustering algorithms for different initializations of cluster centers: the original means of the clusters and random data points in the data space.

Keywords Fuzzy clustering · C-means models · High dimensional data · Noise · Possibilistic clustering

L. Himmelspach (B) · S. Conrad
Institute of Computer Science, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
e-mail: [email protected]
S. Conrad
e-mail: [email protected]

© Springer Nature Switzerland AG 2019
J. J. Merelo et al. (eds.), Computational Intelligence, Studies in Computational Intelligence 792, https://doi.org/10.1007/978-3-319-99283-9_11

1 Introduction

Clustering algorithms are used in many fields, such as bioinformatics, image processing, and text mining. Data sets in these applications usually contain a large number of features. Therefore, there is a need for clustering algorithms that can handle high dimensional data. The hard k-means algorithm [1] is still the most widely used algorithm for clustering high dimensional data, although it is comparatively unstable and sensitive to initialization. It is also unable to distinguish data items belonging to clusters from noise and outliers, which is a further weakness because noise and outliers might influence the computation of cluster centers, leading to inaccurate clustering results. In the case of low dimensional data, the fuzzy c-means algorithm (FCM) [2, 3], which assigns data items to clusters with membership degrees, might be a better choice because it is more stable and less sensitive to initialization [4]. The possibilistic fuzzy c-means algorithm (PFCM) [5] partitions data items in the presence of noise and outliers. However, when FCM is applied to high dimensional data, it tends to produce cluster centers close to the center of gravity of the entire data set [6, 7].

In this work, we analyze different fuzzy clustering methods that are suitable for clustering high dimensional data. The first approach is the attribute weighting fuzzy clustering algorithm [8], which uses a new attribute weighting function to determine the attributes that are important for each individual cluster. This method was recommended in [7] for fuzzy clustering of high dimensional data. The second approach is the multivariate fuzzy c-means (MFCM) [9] that c
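The tendency of FCM to place cluster centers near the center of gravity of high dimensional data can be illustrated numerically. The following is a minimal sketch, not the experimental setup of this paper: it assumes synthetic data drawn uniformly from the unit hypercube, randomly drawn initial centers, a fuzzifier of m = 2, and the standard FCM update formulas. The function names (`fcm_memberships`, `fcm_center_update`, `experiment`) are hypothetical and chosen only for this illustration.

```python
import numpy as np


def fcm_memberships(X, V, m=2.0):
    """Standard FCM membership update for data X (n x d) and centers V (c x d)."""
    # Squared Euclidean distance between every point and every center, shape (n, c).
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, 1e-12)  # guard against division by zero
    inv = d2 ** (-1.0 / (m - 1.0))
    # Normalize so that the memberships of each point sum to one.
    return inv / inv.sum(axis=1, keepdims=True)


def fcm_center_update(X, U, m=2.0):
    """Standard FCM center update: mean of the data weighted by u^m."""
    W = U ** m
    return (W.T @ X) / W.sum(axis=0)[:, None]


def experiment(dim, n=500, c=3, seed=0):
    """One FCM step on uniform synthetic data (an assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n, dim))
    V = rng.uniform(size=(c, dim))

    # Relative contrast of the distances to one center: (max - min) / min.
    dist = np.sqrt(((X - V[0]) ** 2).sum(axis=1))
    contrast = (dist.max() - dist.min()) / dist.min()

    U = fcm_memberships(X, V)
    # Mean absolute deviation of the memberships from the uniform value 1/c.
    spread = np.abs(U - 1.0 / c).mean()

    # Largest distance of an updated center from the center of gravity,
    # scaled by sqrt(dim) so that different dimensions are comparable.
    V_new = fcm_center_update(X, U)
    offset = np.linalg.norm(V_new - X.mean(axis=0), axis=1).max() / np.sqrt(dim)
    return contrast, spread, offset


for dim in (2, 1000):
    contrast, spread, offset = experiment(dim)
    print(f"dim={dim}: contrast={contrast:.3f}, "
          f"membership spread={spread:.4f}, center offset={offset:.4f}")
```

When distances concentrate, every data item is almost equally far from every center, so the membership degrees approach 1/c. The center update then becomes a nearly unweighted mean of all data items, which is one way to see why the centers drift toward the center of gravity.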
