A minimum spanning tree based partitioning and merging technique for clustering heterogeneous data sets
- PDF / 2,120,079 Bytes
- 20 Pages / 439.642 x 666.49 pts Page_size
- 57 Downloads / 210 Views
A minimum spanning tree based partitioning and merging technique for clustering heterogeneous data sets Gaurav Mishra1 · Sraban Kumar Mohanty1 Received: 6 October 2019 / Revised: 27 March 2020 / Accepted: 29 March 2020 / © Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract Clustering being an unsupervised learning technique, has been used extensively for knowledge discovery due to its less dependency on domain knowledge. Many clustering techniques were proposed in the literature to recognize the cluster of different characteristics. Most of them become inadequate either due to their dependency on user-defined parameters or when they are applied on multi-scale datasets. Hybrid clustering techniques have been proposed to take the advantage of both Partitional and Hierarchical techniques by first partitioning the dataset into several dense sub-clusters and merging them into actual clusters. However, the universality of the partition and merging criteria are not sufficient to capture many characteristics of the clusters. Minimum spanning tree (MST) has been used extensively for clustering because it preserves the intrinsic nature of the dataset even after the sparsification of the graph. In this paper, we propose a parameter-free, minimum spanning tree based efficient hybrid clustering algorithm to cluster the multi-scale datasets. In the first phase, we construct a MST of the dataset to capture the neighborhood information of data points and employ box-plot, an outlier detection technique on MST edge weights for effectively selecting the inconsistent edges to partition the data points into several dense sub-clusters. In the second phase, we propose a novel merging criterion to find the genuine clusters by iteratively merging only the pairs of adjacent sub-clusters. The merging technique involves both dis-connectivity and intra-similarity using the topology of two adjacent pairs which helps to identify the arbitrary shape and varying density clusters. Experiment results on various synthetic and real world datasets demonstrate the superior performance of the proposed technique over other popular clustering algorithms. Keywords Partitioning and merging approach · Minimum spanning tree based clustering · Box-plot method · Clustering multi-scale datasets Gaurav Mishra
[email protected] Sraban Kumar Mohanty [email protected] 1
Department of Computer Science and Engineering, PDPM Indian Institute of Information Technology, Design and Manufacturing, Jabalpur, India
Journal of Intelligent Information Systems
1 Introduction Clustering is an unsupervised learning technique that is used extensively to discover the underlying patterns of data objects. Several clustering techniques were proposed to effectively recognize the clusters of different characteristics(Hartigan and Wong 1979; Jain and Dubes 1988; Ester et al. 1996; Koga et al. 2007; Schlitter et al. 2014; Limwattanapibool and Arch-int 2017; Kavitha and Tamilarasan 2019; Otoo et al. 2001). The popular approaches like partitional, hie
Data Loading...