Multiple Consensuses Clustering by Iterative Merging/Splitting of Clustering Patterns
Abstract. The existence of many clustering algorithms with variable performance on each dataset makes the clustering task difficult. Consensus clustering tries to solve this problem by combining the partitions generated by different algorithms to build a new solution that is more stable and achieves better results. In this work, we propose a new consensus method that, unlike others, gives more insight into the relations between the different partitions in the clustering ensemble by using the frequent closed itemsets technique, usually used for association rule discovery. Instead of generating one consensus, our method generates multiple consensuses by varying the number of base clusterings, and links these solutions in a hierarchical representation that eases the selection of the best clustering. This hierarchical view also provides an analysis tool, for example to discover strong clusters or outlier instances.

Keywords: Unsupervised learning · Clustering · Consensus clustering · Ensemble clustering · Frequent closed itemsets
1 Introduction
Although the last decades witnessed the development of many clustering algorithms, getting a “good” quality partitioning remains a difficult task. This problem has many dimensions. One of them is that the results of clustering algorithms are data-dependent: an algorithm may achieve good results on some datasets but not on others, because each algorithm is designed to discover a specific clustering structure in the dataset. Another aspect of the problem is the effect of algorithm parameters on the results, since changing the settings may produce different partitionings in terms of the number and size of clusters. Defining what a “correct” (or a “good”) clustering should be also contributes to the problem, despite the existence of many validation measures, whether internal or external.¹ External validation measures are not always applicable, because class labels are usually not provided, especially for large datasets. Moreover, Färber et al. [6] state that using such measures, usually applied to synthetic datasets, may not be sound for real datasets because the classes may contain
¹ More details about validation measures can be found in [4], [12] and [19].
© Springer International Publishing Switzerland 2016
P. Perner (Ed.): MLDM 2016, LNAI 9729, pp. 790–804, 2016. DOI: 10.1007/978-3-319-41920-6_60
internal structures that the available attributes may not allow to retrieve, or the classes may contain anomalies. On the other hand, internal validation measures may overrate the results of a clustering algorithm that targets the same underlying structure model as the one targeted by the measure. Faced with many available clustering algorithms with variable outcomes, researchers have recently focused on the possibility of combining multiple clusterings, called base clusterings, to build a new consensus solution that can be better than what each single base clustering could achieve. Such a process is called consensus clustering.
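To make the idea concrete, the following sketch (not the authors' implementation; the helper name, item encoding, and toy labelings are invented for illustration) encodes each instance's cluster memberships across several base clusterings as a transaction, then mines frequent closed itemsets from those transactions, the technique the proposed method builds on. Instances sharing a closed pattern across many base clusterings are candidates for a consensus cluster.

```python
from itertools import combinations


def closed_itemsets(transactions, min_support):
    """Brute-force frequent closed itemset mining, for small toy data only.

    Every closed itemset equals the intersection of the transactions that
    contain it, so enumerating intersections of transaction groups yields
    exactly the closed itemsets (the dict deduplicates repeated ones).
    """
    closed = {}
    n = len(transactions)
    for r in range(1, n + 1):
        for group in combinations(range(n), r):
            items = frozenset.intersection(*(transactions[i] for i in group))
            if not items:
                continue
            support = sum(1 for t in transactions if items <= t)
            if support >= min_support:
                closed[items] = support
    return closed


# Three hypothetical base clusterings of six instances (one label per instance).
base_clusterings = [
    [0, 0, 0, 1, 1, 1],  # partition P0
    [0, 0, 1, 1, 1, 1],  # partition P1
    [0, 0, 0, 1, 1, 2],  # partition P2
]

# One transaction per instance: the item "Pj:Ck" means
# "partition j placed this instance in cluster k".
transactions = [
    frozenset(f"P{j}:C{labels[i]}" for j, labels in enumerate(base_clusterings))
    for i in range(6)
]

patterns = closed_itemsets(transactions, min_support=2)
for items, support in sorted(patterns.items(), key=lambda p: -len(p[0])):
    members = [i for i, t in enumerate(transactions) if items <= t]
    print(sorted(items), "support:", support, "instances:", members)
```

In this toy run, the longest patterns (those on which all three partitions agree) pick out the strong clusters {0, 1} and {3, 4}, while shorter patterns capture weaker agreement involving instances 2 and 5; varying how many base clusterings a pattern must span is, roughly, what produces the multiple consensuses discussed in this paper.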