An application of sine cosine algorithm-based fuzzy possibilistic c-ordered means algorithm to cluster analysis


METHODOLOGIES AND APPLICATION

R. J. Kuo1 • Jun-Yu Lin3 • Thi Phuong Quyen Nguyen2

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract Advances in information technology have made data collection much easier. Clustering is an important technique for exploring data structure and is used in many fields, such as customer segmentation, image recognition, and social science. However, real-world datasets often contain noise or outliers, which can seriously degrade clustering performance. In addition, clustering results are sensitive to the initial centroids and the algorithm parameters. To reduce the influence of outliers on clustering results, this study combines the advantages of possibilistic fuzzy c-means and fuzzy c-ordered means to propose a fuzzy possibilistic c-ordered means (FPCOM) algorithm. To address the determination of parameters and initial centroids, this study combines a sine cosine algorithm (SCA) with FPCOM to improve the clustering results. The proposed algorithm is named SCA-FPCOM. Ten benchmark datasets collected from the UCI Machine Learning Repository were used to validate the proposed algorithm in terms of the adjusted Rand index and the silhouette coefficient. According to the experimental results, the SCA-FPCOM algorithm obtains better results than the other algorithms compared.

Keywords Clustering analysis • Sine cosine algorithm • Outliers • Fuzzy c-means algorithm • Possibilistic fuzzy c-means algorithm • Fuzzy c-ordered means algorithm
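As a rough illustration of the kind of search the abstract attributes to the SCA, the sketch below applies the standard sine cosine position update to a toy objective. It is not the SCA-FPCOM procedure itself: the function name `sca_minimize`, the `sphere` objective, the population size, the iteration count, and the constant `a = 2` are all assumptions chosen for illustration. In SCA-FPCOM, each agent would presumably encode the FPCOM parameters and initial centroids, and the objective would be a clustering criterion.

```python
import math
import random


def sca_minimize(objective, dim, bounds, n_agents=20, n_iters=200, a=2.0, seed=0):
    """Minimize `objective` with a basic sine cosine update (illustrative sketch)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial population of candidate solutions.
    agents = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_agents)]
    best = min(agents, key=objective)[:]
    best_fit = objective(best)

    for t in range(n_iters):
        # r1 shrinks linearly from a to 0, moving from exploration to exploitation.
        r1 = a - t * a / n_iters
        for x in agents:
            for j in range(dim):
                r2 = rng.uniform(0.0, 2.0 * math.pi)  # phase of the sine/cosine term
                r3 = rng.uniform(0.0, 2.0)            # random weight on the best position
                r4 = rng.random()                     # switches between sine and cosine
                wave = math.sin(r2) if r4 < 0.5 else math.cos(r2)
                x[j] += r1 * wave * abs(r3 * best[j] - x[j])
                x[j] = min(max(x[j], lo), hi)         # keep the agent inside the bounds
            fit = objective(x)
            if fit < best_fit:
                best, best_fit = x[:], fit
    return best, best_fit


def sphere(x):
    """Toy objective (assumption): sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    solution, value = sca_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
    print(solution, value)
```

The best position found so far serves as the destination point for every agent, so the returned fitness can never be worse than that of the best initial agent.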

Communicated by V. Loia.

✉ R. J. Kuo: [email protected]
Jun-Yu Lin: [email protected]
Thi Phuong Quyen Nguyen: [email protected]

1 Department of Industrial Management, National Taiwan University of Science and Technology, No. 43, Section 4, Kee-Lung Road, Taipei 106, Taiwan
2 Faculty of Project Management, The University of Danang, University of Science and Technology, 54 Nguyen Luong Bang, Danang, Vietnam
3 Powertech Technology Inc., No. 12, Aly. 13, Ln. 211, Linsen E. Rd., East Dist., Chiayi 600, Taiwan

1 Introduction

Clustering techniques in data mining extract the relationships among data points from a large amount of data, so that similar data points are grouped into the same cluster. The purpose of clustering is therefore to achieve high homogeneity within each cluster and significant differences between clusters. Clustering is an unsupervised learning method because it is usually performed without any available information about the membership of data items in predetermined classes. Cluster analysis has been applied in many areas, such as pattern recognition (Zhao 2019; Wen 2019), image processing (Farhang 2017), information retrieval (Djenouri et al. 2018; Nicholls and Bright 2019), text mining (Allahyari et al. 2017), and computer graphics (Garces et al. 2012). There are v