Partitioning and hierarchical based clustering: a comparative empirical assessment on internal and external indices, acc


ORIGINAL RESEARCH

Partitioning and hierarchical based clustering: a comparative empirical assessment on internal and external indices, accuracy, and time

Syed Imtiyaz Hassan¹ • Afreen Samad¹ • Omair Ahmad¹ • Afshar Alam¹

Received: 20 February 2019 / Accepted: 20 November 2019
© Bharati Vidyapeeth’s Institute of Computer Applications and Management 2019

Abstract Clustering is an unsupervised data mining technique in which exploration is carried out with little knowledge of the data classes. Its aim is to uncover hidden information in the data to support effective decision-making. Although many clustering algorithms have been implemented to date, clustering remains an active research topic in data mining. Researchers attempt to explore, compare, evaluate, and improve the available clustering algorithms for specialized situations and contexts. The purpose of these efforts is to refine the algorithms and propose improved versions after statistical evaluation with different metrics. The present research empirically analyzes partitioning based and hierarchical based clustering algorithms through extensive experiments. The effectiveness of both algorithms has been measured through external and internal validity indices and Pearson’s correlation distance function using anatomized experiments. The evaluation parameters considered are, for internal indices: the Silhouette index, the Davies-Bouldin validity index, and the Calinski-Harabasz index; and for external indices: the Jaccard index, the Rand index, entropy, and normalized mutual information. The other evaluation parameters are accuracy and execution time. Based on the experiments, it may be concluded that the K-means algorithm produces more promising results than the hierarchical algorithm, except in accuracy.

Keywords Data mining • Data science • Machine learning • Clustering algorithm • K-means • Hierarchical algorithm • Validation indices • Pearson’s correlation distance

¹ Department of Computer Science and Engineering, School of Engineering Sciences and Technology, Jamia Hamdard (Deemed to be University), New Delhi, India

1 Introduction

Clustering has emerged as one of the most powerful meta-learning techniques. It is a tool that aids the data examination process. Some data sets do not come with explicit groupings, so they must undergo clustering to determine the groups [1]. For example, in sentiment analysis of students, one approach is to cluster their opinions into the categories where they fit [2]. Clustering techniques inspect the similarities between data items and group similar items into the same cluster [3, 4]. A good clustering technique should have two properties: (1) homogeneity: similarity among the data within the same cluster should be high; and (2) heterogeneity: dissimilarity among the data of different clusters should be high [5].
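The homogeneity and heterogeneity properties can be checked numerically on a toy example. The following is a minimal sketch, not the authors' experimental code: it assumes that the Pearson's correlation distance named in the abstract takes the common form 1 − r, and the function names, data points, and labels are hypothetical values chosen for illustration. A homogeneous, well-separated clustering should show a small mean within-cluster distance and a large mean between-cluster distance.

```python
from itertools import combinations
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - r, where r is the Pearson correlation of x and y.

    Assumption: this is one common definition of "Pearson's correlation
    distance"; the paper's exact formula is not given in this excerpt.
    Vectors with zero variance are not handled (division by zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)


def mean_intra_inter(points, labels):
    """Average pairwise distance within clusters and between clusters."""
    intra, inter = [], []
    for i, j in combinations(range(len(points)), 2):
        d = pearson_distance(points[i], points[j])
        # Same label: contributes to homogeneity; different: heterogeneity.
        (intra if labels[i] == labels[j] else inter).append(d)
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Hypothetical toy data: three rising profiles and three falling profiles.
points = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.1, 6.2, 7.9],
    [1.5, 2.4, 3.6, 4.4],
    [4.0, 3.0, 2.0, 1.0],
    [8.1, 5.9, 4.2, 2.0],
    [4.5, 3.4, 2.6, 1.4],
]
labels = [0, 0, 0, 1, 1, 1]  # hypothetical cluster assignment

intra, inter = mean_intra_inter(points, labels)
print(f"mean within-cluster distance:  {intra:.3f}")
print(f"mean between-cluster distance: {inter:.3f}")
```

With this assignment the within-cluster distances stay close to 0 and the between-cluster distances close to 2, the two extremes of the 1 − r scale. Internal indices such as the Silhouette index summarize this same contrast in a single score.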
