An entropy-based initialization method of K-means clustering on the optimal number of clusters


ORIGINAL ARTICLE

Kuntal Chowdhury1 • Debasis Chaudhuri2 • Arup Kumar Pal1

Received: 12 February 2020 / Accepted: 26 October 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract

Clustering is an unsupervised learning approach that groups similar features using a specific mathematical criterion, known as the objective function. K-means is one of the most widely used partitional clustering algorithms, and its performance depends on the initial seed points and on the value of K. In this paper, we address both of these parameters together. We define an entropy-based objective function for the initialization process, which performs better than other existing initialization methods of K-means clustering. We also design an algorithm that calculates the correct number of clusters of a dataset using cluster validity indexes. The proposed entropy-based initialization algorithm is applied to different 2D and 3D datasets and compared with other existing initialization methods.

Keywords Clustering • Cluster validity indexes • Unsupervised • K-means

1 Department of CSE, Indian Institute of Technology (Indian School of Mines) [IIT(ISM)], Dhanbad, Jharkhand, India

2 Deputy General Manager, DRDO Integration Centre, Panagarh, West Bengal, India

1 Introduction

Clustering is a form of unsupervised learning in which the given data are grouped into classes according to a criterion function [18]. It is also a favored technique for assigning given data to similar classes depending on specific features. From a machine learning perspective, the clusters correspond to hidden patterns, and the search for them is unsupervised learning. Algorithms and methods for cluster analysis provide core techniques for numerous applications, such as information retrieval, text mining [5], and weblog analysis [39]. The choice of the number of clusters and the initial positions of the seed points are the essential factors for partitional clustering algorithms to produce good-quality clusters. The K-means algorithm can be applied to any large dataset given a prior value of K [17]. Literature surveys reveal different methods for automatically detecting the optimal value of K [10, 32, 40]. Another important application of optimality in clustering is in wireless sensor networks, where it increases the energy efficiency of data transmission and helps prolong the network lifetime [19].
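The dependence of K-means on the initial seed points can be illustrated with a small sketch. The code below is a plain Lloyd-style K-means iteration, not the entropy-based method proposed in this paper. The function names (`squared_distance`, `kmeans`) and the four-point toy dataset `POINTS` are assumptions introduced only for illustration.

```python
# Minimal sketch of standard (Lloyd-style) K-means in pure Python.
# Illustrative only: this is NOT the entropy-based initialization of the paper.
# POINTS and both seed choices are hypothetical toy data.

def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, seeds, max_iter=100):
    """Run K-means from the given seed points; return (centers, sse)."""
    centers = [tuple(s) for s in seeds]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(c) / len(members) for c in zip(*members))
            if members else centers[j]
            for j, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # Sum of squared errors: each point against its nearest final center.
    sse = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return centers, sse


# Four corners of a 4 x 1 rectangle, clustered with K = 2.
POINTS = [(0, 0), (0, 1), (4, 0), (4, 1)]

_, sse_a = kmeans(POINTS, seeds=[(0, 0), (4, 0)])  # one seed per short side
_, sse_b = kmeans(POINTS, seeds=[(0, 0), (0, 1)])  # both seeds on the same side

print(sse_a)  # 1.0
print(sse_b)  # 16.0
```

With the same data and the same K, the second seeding converges to a stable but much worse partition, which is the kind of local optimum that initialization methods aim to avoid.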

1.1 Similar literature on initialization algorithms

This section describes the different works on initial seed selection for the K-means algorithm. To achieve the global optimum results
