Biogeography-Based Optimization for Cluster Analysis



Abstract To perform cluster analysis more accurately and reliably, we propose a new approach based on the biogeography-based optimization (BBO) algorithm. First, we reformulated clustering as an optimization problem based on the variance ratio criterion (VARAC). Then, BBO was used to search for the solution that maximizes the VARAC. The experimental data consist of 400 points in four groups, with three distinct degrees of overlap: nonoverlapping, partially overlapping, and severely overlapping. BBO was compared with three state-of-the-art approaches, and every algorithm was run 20 times. In this experiment, the results show that BBO attained the maximum VARAC values. We conclude that BBO is superior and extremely fast for cluster analysis.

Keywords Biogeography-based optimization ⋅ Genetic algorithm ⋅ Cluster analysis
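The objective named in the abstract can be illustrated with a small sketch. This section does not give the paper's exact formulation, so the code below assumes the standard definition of the variance ratio criterion (also known as the Calinski-Harabasz index): the between-cluster sum of squares divided by k - 1, over the within-cluster sum of squares divided by n - k. The function name `varac` and its helpers are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def _mean(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def varac(points: List[Point], labels: Sequence[int]) -> float:
    """Variance ratio criterion of a hard partition (assumed standard form).

    Larger values indicate compact, well-separated clusters.
    """
    n = len(points)
    clusters: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("VARAC needs 2 <= k < n")

    overall = _mean(points)
    between = 0.0  # between-cluster sum of squares
    within = 0.0   # within-cluster sum of squares
    for members in clusters.values():
        centroid = _mean(members)
        between += len(members) * _sq_dist(centroid, overall)
        within += sum(_sq_dist(p, centroid) for p in members)

    if within == 0.0:
        return float("inf")  # every point coincides with its centroid
    return (between / (k - 1)) / (within / (n - k))


# Toy data (made up for illustration): two tight pairs far apart.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = varac(pts, [0, 0, 1, 1])  # groups nearby points together
bad = varac(pts, [0, 1, 0, 1])   # mixes the two pairs
print(good, bad)
```

Under this assumed definition, a search method such as BBO would treat each candidate partition as a solution and prefer those with a larger criterion value; the sensible grouping of the toy points scores far higher than the mixed one.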

X. Wu ⋅ H. Wang ⋅ Z. Lu ⋅ S. Wang (✉) ⋅ Y. Zhang (✉)
School of Computer Science and Technology, Nanjing Normal University, Nanjing, Jiangsu 210023, China
e-mail: [email protected]
Y. Zhang e-mail: [email protected]

X. Wu
Key Laboratory of Statistical Information Technology & Data Mining, State Statistics Bureau, Chengdu, Sichuan 610225, China

X. Wu
School of Computer and Information Engineering, Henan Normal University, Xinxiang, Henan 453000, China

Z. Chen
School of Electronic Information, Shanghai Dianji University, Shanghai 200240, China

Z. Chen
Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China

© Springer Nature Singapore Pte Ltd. 2017
S.K. Bhatia et al. (eds.), Advances in Computer and Computational Sciences, Advances in Intelligent Systems and Computing 553, DOI 10.1007/978-981-10-3770-2_1


X. Wu et al.

1 Introduction

Cluster analysis is the task of grouping a set of objects so that objects in the same cluster are more similar to one another than to objects in other clusters [1]. It is a form of unsupervised learning and a common technique for statistical data analysis in many areas, including breeding value [2], food quality monitoring [3], gene engineering [4], pediatric immunization distress [5], chronic rhinosinusitis [6], community analysis [7], etc. Various algorithms have been proposed for cluster analysis. They can be broadly classified into four categories: centroid-based clustering, distribution-based clustering, density-based clustering, and connectivity-based clustering. In this research, we focus on centroid-based methods. Two representative algorithms of this type are fuzzy c-means clustering (FCM) [8] and k-means clustering [9]. These are iterative methods and are affected by many factors; for example, if the initial partition is not determined properly