ORIGINAL RESEARCH
Partitioning and hierarchical based clustering: a comparative empirical assessment on internal and external indices, accuracy, and time
Syed Imtiyaz Hassan1 • Afreen Samad1 • Omair Ahmad1 • Afshar Alam1
Received: 20 February 2019 / Accepted: 20 November 2019
© Bharati Vidyapeeth’s Institute of Computer Applications and Management 2019
Abstract
Clustering is an unsupervised data mining technique in which exploration is done with little knowledge of the data classes. Its aim is to uncover hidden information in the data for effective decision-making. Although many clustering algorithms have already been implemented, clustering remains an active topic of research in data mining. Researchers attempt to explore, compare, evaluate, and improve the available clustering algorithms for specialized situations and contexts. The purpose of all these efforts is to refine the algorithms and propose improved versions after statistical evaluation with different metrics. The present research is an attempt to analyze empirically the partitioning based and hierarchical based clustering algorithms by conducting extensive experiments. The effectiveness of both algorithms has been measured through external and internal validity indices and Pearson’s correlation distance function using anatomized experiments. The evaluation parameters taken into consideration are, for internal indices: the Silhouette index, the Davies-Bouldin validity index, and the Calinski-Harabasz index; and for external indices: the Jaccard index, the Rand index, Entropy, and Normalized Mutual Information. The other parameters of evaluation are accuracy and time of execution. Based on the experiments, it may be concluded that the K-means algorithm produces more promising results than the hierarchical algorithm, except in accuracy.
Keywords Data mining • Data science • Machine learning • Clustering algorithm • K-means • Hierarchical algorithm • Validation indices • Pearson’s correlation distance
1 Department of Computer Science and Engineering, School of Engineering Sciences and Technology, Jamia Hamdard (Deemed to be University), New Delhi, India
1 Introduction
Clustering has emerged as one of the most powerful meta-learning techniques. It is a tool that aids the data examination process. Some data sets do not come with natural groupings, so they must undergo clustering to determine the groups [1]. For example, in sentiment analysis of students, one approach is to cluster their opinions into the categories where they fit [2]. Clustering techniques inspect the similarities between data items and group similar items into the same cluster [3, 4]. A good clustering technique should have two properties: (1) homogeneity: similarity among the data within the same cluster should be high, and (2) heterogeneity: dissimilarity among the data of different clusters should be high [5].
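The two properties above can be made concrete with the Silhouette index, one of the internal indices named in the abstract. The following is a minimal sketch in plain Python, not the implementation used in the paper. It assumes Euclidean distance (the paper itself uses Pearson's correlation distance), and the function name `silhouette_score` and the toy data are illustrative assumptions.

```python
from math import dist          # Euclidean distance, Python 3.8+
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance assumed)."""
    # Group the points by their cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the same cluster
        # (small a means high homogeneity); dist(p, p) contributes 0 to the sum.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        # (large b means high heterogeneity).
        b = min(
            mean(dist(p, q) for q in other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Hypothetical toy data: two tight, well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good_labels = [0, 0, 1, 1]   # each pair in its own cluster
bad_labels = [0, 1, 0, 1]    # pairs split across clusters

print(silhouette_score(points, good_labels))
print(silhouette_score(points, bad_labels))
```

The score lies between -1 and 1. On this toy data, the labelling that keeps each pair together scores close to 1, while the labelling that splits the pairs scores below 0, because its within-cluster distances exceed its between-cluster distances.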