Two phase cluster validation approach towards measuring cluster quality in unstructured and structured numerical datasets



ORIGINAL RESEARCH

S. Sreedhar Kumar¹ · Syed Thouheed Ahmed¹ · P. Vigneshwaran² · H. Sandeep³ · H. Manjunath Singh¹

Received: 10 November 2019 / Accepted: 17 August 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract This paper presents an improved cluster validation scheme, called two phase cluster validation (TPCV), which uses a probability measure to estimate the inter closeness and inter separation among the clusters produced by unsupervised clustering schemes, so that cluster quality can be validated without prior identification. In the first phase, TPCV computes the representative centroid of each cluster in the cluster set as the mean of its members, and then estimates the probability of inter closeness between each cluster and the other clusters in the set from these centroids. In the second phase, it calculates the probability of separation among the clusters by applying a distance measure to the cluster centroids. Experimental results show that the TPCV scheme is a simple and effective way to estimate cluster quality by measuring the probability of closeness and separation between the clusters in the result of an unsupervised clustering scheme.

Keywords Distance metric · Probability of inter closeness · Probability of inter separation · Representative cluster centroid · Two phase cluster validation · Unsupervised clustering
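The two phases summarized in the abstract can be illustrated with a minimal Python sketch. The centroid-by-mean step and the use of a distance measure between centroids follow the abstract. The function names, the choice of Euclidean distance, and the normalization by the largest centroid distance are assumptions made here purely for illustration; they are not the paper's probability formulas.

```python
from math import dist          # Euclidean distance, Python 3.8+
from statistics import fmean


def centroid(points):
    """Representative centroid: per-dimension mean of a cluster's points."""
    return tuple(fmean(axis) for axis in zip(*points))


def centroid_distances(clusters):
    """Pairwise Euclidean distances between the cluster centroids."""
    cents = [centroid(c) for c in clusters]
    return [[dist(a, b) for b in cents] for a in cents]


def separation_scores(clusters):
    """Hypothetical separation score in [0, 1] (an assumption, not the
    paper's formula): each centroid distance divided by the largest one."""
    d = centroid_distances(clusters)
    d_max = max(max(row) for row in d)
    if d_max == 0:  # all centroids coincide
        return [[0.0 for _ in row] for row in d]
    return [[v / d_max for v in row] for row in d]


def closeness_scores(clusters):
    """Hypothetical closeness score: the complement of the separation score."""
    return [[1.0 - v for v in row] for row in separation_scores(clusters)]


# Three small two-dimensional clusters, made up for this example.
clusters = [
    [(0, 0), (0, 2)],
    [(2, 5), (4, 5)],
    [(0, 10), (0, 12)],
]
separation = separation_scores(clusters)
closeness = closeness_scores(clusters)
```

Under this toy normalization, a pair of clusters whose centroids are far apart scores close to 1 for separation and close to 0 for closeness. Any score used in practice should follow the definitions given in the body of the paper.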

* S. Sreedhar Kumar [email protected]

¹ Dr. T. Thimmaiah Institute of Technology, VTU, KGF, Karnataka, India
² Jain (Deemed To Be) University, Bengaluru, Karnataka, India
³ K S School of Engineering and Management, Bengaluru, India

1 Introduction

Cluster validation is a type of quality measure that evaluates the correlation and divergence between the clusters produced by a clustering algorithm. Unsupervised clustering techniques have no predetermined patterns, so it is difficult to find a suitable metric for deciding whether the identified cluster relationships are acceptable. There are three traditional families of cluster validation techniques, namely external criteria, internal criteria and relative criteria (Rui et al. 2009; Theodoridis and Koutroubas 1999; Kantandzic 2011). External and internal criteria evaluate the result of a clustering algorithm using statistical and hypothesis-based methods (Jain and Dubes 1998). External criteria evaluate the clusters by comparing the obtained clustering result with a prespecified clustering result for the dataset. In contrast, internal criteria validate the clustering result using proximity metrics, without any a priori information. Relative criteria evaluate the quality of the clustering result based on comparing the