Multiple Consensuses Clustering by Iterative Merging/Splitting of Clustering Patterns
Abstract. The existence of many clustering algorithms with variable performance on each dataset makes the clustering task difficult. Consensus clustering tries to solve this problem by combining the partitions generated by different algorithms to build a new solution that is more stable and achieves better results. In this work, we propose a new consensus method that, unlike others, gives more insight into the relations between the different partitions in the clustering ensemble by using the frequent closed itemsets technique, usually used for association rule discovery. Instead of generating one consensus, our method generates multiple consensuses by varying the number of base clusterings, and links these solutions in a hierarchical representation that eases the selection of the best clustering. This hierarchical view also provides an analysis tool, for example to discover strong clusters or outlier instances.

Keywords: Unsupervised learning · Clustering · Consensus clustering · Ensemble clustering · Frequent closed itemsets
1 Introduction
Although the last decades witnessed the development of many clustering algorithms, getting a “good” quality partitioning remains a difficult task. This problem has many dimensions. One of them is that the results of clustering algorithms are data-dependent: an algorithm may achieve good results on some datasets but not on others, because each algorithm is designed to discover a specific clustering structure in the dataset. Another aspect of the problem is the effect of algorithm parameters on the results, since changing the settings may produce different partitionings in terms of the number and size of clusters. Defining what a “correct” (or a “good”) clustering should be also contributes to the problem, despite the existence of many validation measures, whether internal or external.¹ External validation measures are not always applicable, because class labels are usually not provided, especially for large datasets. Moreover, Färber et al. [6] state that using such measures, usually applied to synthetic datasets, may not be sound for real datasets because the classes may contain
¹ More details about validation measures can be found in [4], [12] and [19].
© Springer International Publishing Switzerland 2016
P. Perner (Ed.): MLDM 2016, LNAI 9729, pp. 790–804, 2016. DOI: 10.1007/978-3-319-41920-6_60
internal structures that the available attributes may not allow to retrieve, or the classes may contain anomalies. On the other hand, internal validation measures may overrate the results of a clustering algorithm that targets the same underlying structure model as the one targeted by the measure. Faced with many available clustering algorithms with variable outcomes, researchers have recently focused on the possibility of combining multiple clusterings, called base clusterings, to build a new consensus solution that can be better than what each single base clustering could achieve. Such a process is called consensus clustering.
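To make the idea concrete, the following sketch (not the authors' implementation; the helper name, item encoding, and toy labelings are invented for illustration) encodes each instance's cluster memberships across several base clusterings as a transaction, then mines frequent closed itemsets from those transactions, the technique the proposed method builds on. Instances sharing a closed pattern across many base clusterings are candidates for a consensus cluster.

```python
from itertools import combinations


def closed_itemsets(transactions, min_support):
    """Brute-force frequent closed itemset mining, for small toy data only.

    Every closed itemset equals the intersection of the transactions that
    contain it, so enumerating intersections of transaction groups yields
    exactly the closed itemsets (the dict deduplicates repeated ones).
    """
    closed = {}
    n = len(transactions)
    for r in range(1, n + 1):
        for group in combinations(range(n), r):
            items = frozenset.intersection(*(transactions[i] for i in group))
            if not items:
                continue
            support = sum(1 for t in transactions if items <= t)
            if support >= min_support:
                closed[items] = support
    return closed


# Three hypothetical base clusterings of six instances (one label per instance).
base_clusterings = [
    [0, 0, 0, 1, 1, 1],  # partition P0
    [0, 0, 1, 1, 1, 1],  # partition P1
    [0, 0, 0, 1, 1, 2],  # partition P2
]

# One transaction per instance: the item "Pj:Ck" means
# "partition j placed this instance in cluster k".
transactions = [
    frozenset(f"P{j}:C{labels[i]}" for j, labels in enumerate(base_clusterings))
    for i in range(6)
]

patterns = closed_itemsets(transactions, min_support=2)
for items, support in sorted(patterns.items(), key=lambda p: -len(p[0])):
    members = [i for i, t in enumerate(transactions) if items <= t]
    print(sorted(items), "support:", support, "instances:", members)
```

In this toy run, the longest patterns (those on which all three partitions agree) pick out the strong clusters {0, 1} and {3, 4}, while shorter patterns capture weaker agreement involving instances 2 and 5; varying how many base clusterings a pattern must span is, roughly, what produces the multiple consensuses discussed in this paper.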