Robust Fuzzy Clustering via Trimming and Constraints
Abstract
A methodology for robust fuzzy clustering is proposed. Because it is based on probability likelihoods, the methodology can be applied to a wide range of statistical problems. Robustness is achieved by trimming a fixed proportion of the “most outlying” observations, which are determined by the data set itself. Constraints on the clusters’ scatters are also needed to obtain mathematically well-defined problems and to avoid the detection of non-interesting spurious clusters. The main lines for computationally feasible algorithms are provided, and some simple guidelines on how to choose the tuning parameters are briefly outlined. The proposed methodology is illustrated through two applications. The first is aimed at heterogeneous clustering under multivariate normal assumptions, and the second might be useful in fuzzy clusterwise linear regression problems.
F. Dotto · A. Farcomeni
Sapienza University of Rome, Rome, Italy
e-mail: [email protected]
A. Farcomeni
e-mail: [email protected]
L.A. García-Escudero (B) · A. Mayo-Iscar
University of Valladolid, Valladolid, Spain
e-mail: [email protected]
A. Mayo-Iscar
e-mail: [email protected]
© Springer International Publishing Switzerland 2017
M.B. Ferraro et al. (eds.), Soft Methods for Data Science, Advances in Intelligent Systems and Computing 456, DOI 10.1007/978-3-319-42972-4_25

1 Introduction
Hard clustering methods search for meaningful partitions of a data set into k disjoint clusters, so they provide “0–1” membership values of observations to clusters. Fuzzy clustering methods, on the other hand, provide nonnegative membership values which may generate overlapping clusters where every subject is shared among all clusters [2, 28]. It is known that the presence of even a small number of outlying observations can be problematic when applying traditional hard clustering methods. For instance, clearly differentiated clusters can be wrongly joined together, and non-interesting clusters (made up of only a few outlying observations) can be detected. This is also the case when applying many fuzzy clustering techniques. In fact, historically, the fuzzy clustering community was the first to face this robustness issue. This is because outliers may be approximately “equally remote” from all clusters and, thus, they may have similar (but not necessarily small) membership values. References on robustness in hard clustering can be found in [10] and in two recent books [7, 24]. On the other hand, [1, 4] are good reviews on robust fuzzy clustering. These proposals in fuzzy clustering include “noise clustering” [3], the replacement of the Euclidean distance by other discrepancy measures [22, 31], and the use of “possibilistic” clustering [19].

Trimming has a long history as a simple way to provide robustness to statistical procedures. Its application in clustering needs to be done by taking into account the possibility of discarding “bridge points”. A sensible way to per
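
The remark above that outliers roughly “equally remote” from all clusters receive similar, and not small, membership values can be checked numerically. The following is a minimal sketch that assumes the classical fuzzy c-means membership formula with Euclidean distances and fuzzifier m = 2; it is not the trimmed, constrained method proposed in this paper. The function name `fcm_memberships`, the two cluster centers, and the three test points are hypothetical values chosen only for illustration.

```python
import numpy as np

def fcm_memberships(points, centers, m=2.0):
    """Classical fuzzy c-means memberships for fixed centers (fuzzifier m > 1).

    Illustrative assumption: this is the standard (non-robust) formula,
    not the trimmed and constrained approach discussed in the text.
    """
    # Pairwise Euclidean distances, shape (n_points, n_centers).
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)  # guard against division by zero at a center
    # u_ij = 1 / sum_l (d_ij / d_il)^(2 / (m - 1))
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)

# Hypothetical setup: two well-separated centers, two regular points
# (one near each center) and one remote outlier equidistant from both.
centers = np.array([[0.0, 0.0], [10.0, 0.0]])
points = np.array([[0.5, 0.0], [9.0, 1.0], [5.0, 100.0]])

u = fcm_memberships(points, centers)
print(np.round(u, 3))  # each row sums to 1; last row is the outlier
```

In this setup the two regular points get memberships close to 1 for their nearest center, whereas the remote outlier gets a membership of 0.5 in each cluster. Because memberships are forced to sum to one, the outlier weighs on both clusters, which is one way to see why proposals such as noise clustering or trimming are of interest.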