Improved Clustering for Categorical Data with Genetic Algorithm


Abstract Clustering is one of the most significant unsupervised learning techniques, where the aim is to partition a data set into homogeneous groups called clusters. Many real-world data sets contain categorical values, yet many clustering algorithms work only on numeric values, which limits their use in data mining. The k-modes algorithm is very effective for partitioning categorical data sets, but it can stop at a locally optimal solution because the result depends on the initial cluster centres. The proposed algorithm uses a genetic algorithm (GA) to optimize the k-modes clustering algorithm. The idea is that candidate solutions which pick noise points as cluster centres incur a high clustering cost, so they are not carried over to the next generation, and the search therefore does not get stuck in suboptimal solutions. The superiority of the proposed algorithm is demonstrated on several real-life data sets in terms of accuracy; the results show that it is efficient and yields encouraging results, especially for large data sets.

Keywords Clustering · Categorical data · Genetic algorithm · k-modes algorithm
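The central idea in the abstract is to score each candidate set of cluster centres (modes) by its k-modes clustering cost, so that a GA can favour low-cost chromosomes and discard those built around noise. The following is a minimal Python sketch of such a fitness computation using simple-matching dissimilarity; the function names and toy data are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence

def matching_dissimilarity(x: Sequence[str], mode: Sequence[str]) -> int:
    """Simple-matching dissimilarity: number of attributes on which x and mode differ."""
    return sum(1 for a, b in zip(x, mode) if a != b)

def kmodes_cost(data: List[Sequence[str]], modes: List[Sequence[str]]) -> int:
    """Total cost of assigning every object to its nearest mode.

    A GA can use the negative (or reciprocal) of this cost as the fitness of a
    chromosome that encodes candidate modes: noisy or badly placed modes give a
    high cost and are therefore unlikely to survive selection.
    """
    return sum(min(matching_dissimilarity(x, m) for m in modes) for x in data)

# Toy categorical data set (hypothetical example).
data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "round"),
    ("blue", "large", "square"),
]

good_modes = [("red", "small", "round"), ("blue", "large", "round")]
bad_modes = [("red", "small", "round"), ("green", "tiny", "flat")]  # second mode acts like noise

print(kmodes_cost(data, good_modes))  # low cost -> high GA fitness
print(kmodes_cost(data, bad_modes))   # high cost -> low GA fitness
```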

1 Introduction

The ever-growing data in almost all fields can contribute significantly to future decision-making by extracting hidden but potentially useful information embedded in the data. Looking more closely at the clustering problem, many clustering methods require the designer to provide the number and names of clusters as input. Unfortunately, the designer usually has no idea about the inherent structure of huge data sets. In addition, the clustering result is sensitive to the selection of the initial cluster centres; this sensitivity may make the algorithm converge to a local optimum. So the most challenging and difficult task is the determination of the number and names of clusters in a data set, which is a basic input parameter for most clustering algorithms.
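To make the sensitivity to initial centres concrete, the sketch below runs a bare-bones k-modes loop from several random seeds on the same toy categorical data; it is a simplified illustration under assumed data and function names, not the algorithm evaluated in this paper.

```python
import random
from collections import Counter

def dissim(x, mode):
    # Simple-matching dissimilarity between two categorical objects.
    return sum(a != b for a, b in zip(x, mode))

def kmodes(data, k, seed, max_iter=20):
    """Bare-bones k-modes: random initial modes, assign objects, update modes, repeat.

    Returns the final clustering cost; different seeds may converge to different
    local optima because the result depends on the initial modes.
    """
    rng = random.Random(seed)
    modes = rng.sample(data, k)
    for _ in range(max_iter):
        # Assignment step: each object goes to its nearest mode.
        clusters = [[] for _ in range(k)]
        for x in data:
            j = min(range(k), key=lambda i: dissim(x, modes[i]))
            clusters[j].append(x)
        # Update step: the new mode takes the most frequent value of each attribute.
        new_modes = []
        for j, cluster in enumerate(clusters):
            if not cluster:                 # keep the old mode for an empty cluster
                new_modes.append(modes[j])
                continue
            new_modes.append(tuple(
                Counter(col).most_common(1)[0][0] for col in zip(*cluster)
            ))
        if new_modes == modes:              # converged
            break
        modes = new_modes
    return sum(min(dissim(x, m) for m in modes) for x in data)

# Hypothetical categorical data with two natural groups plus an outlier.
data = [
    ("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
    ("b", "z", "r"), ("b", "z", "s"), ("c", "w", "t"),
]

for seed in (0, 1, 2):
    print(seed, kmodes(data, k=2, seed=seed))  # final costs can differ across seeds
```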

Clustering [1–3] is an important unsupervised classification technique which groups the data objects in a database in such a way that objects with similar patterns reside in the same cluster and objects in different clusters are dissimilar in the same sense [4, 5]. Clustering has been applied effectively to a variety of engineering and scientific applications such as bio-informatics, astronomy, medical imaging, remote sensing, physics, etc. The data matrix and the dissimilarity matrix are the two basic data structures used for clustering; if the data is not in one of these formats, it needs to be preprocessed into a suitable one [6]. Clustering algorithms are generally classified into two categories: hierarchical and partitioning. A hierarchical clustering algorithm builds a hierarchy of partitions, with one partition at each level. This paper