Multiple clustering and selecting algorithms with combining strategy for selective clustering ensemble


FOUNDATIONS

Tinghuai Ma¹ · Te Yu¹ · Xiuge Wu¹ · Jie Cao² · Alia Al-Abdulkarim³ · Abdullah Al-Dhelaan³ · Mohammed Al-Dhelaan³

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract Clustering ensemble can overcome the instability of clustering and improve clustering performance. With the rapid development of clustering ensemble, we find that not all clustering solutions contribute effectively to the final result. In this paper, we focus on the selection strategy in selective clustering ensemble. We propose a multiple clustering and selecting approach (MCAS), which is based on different original clustering solutions. Furthermore, we present two combining strategies, direct combining and clustering combining, to combine the solutions selected by MCAS. Compared with traditional selective clustering ensemble algorithms and single clustering and selecting algorithms, these combining strategies yield a more refined subset of solutions. Experimental results on UCI machine learning datasets show that the algorithm that uses multiple clustering and selecting algorithms with a combining strategy performs well on most datasets and outperforms most selective clustering ensemble algorithms.

Keywords Selective clustering ensemble · Clustering solution · Multiple clustering and selecting algorithms · Combining strategy

Communicated by A. Di Nola.

Tinghuai Ma [email protected]
Te Yu [email protected]
Alia Al-Abdulkarim [email protected]
Abdullah Al-Dhelaan [email protected]
Mohammed Al-Dhelaan [email protected]

1 School of Computer, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China

2 School of Economics and Management, Nanjing University of Information Science and Technology, Nanjing 210044, China

3 Computer Science Department, College of Computer and Information Science, King Saud University, Riyadh 11362, Saudi Arabia

1 Introduction

Clustering is one of the most important tools in data mining. The major goal of clustering is to seek a grouping that makes the intra-group similarity large but the inter-group similarity small. However, applying different methods, or the same method with different parameters, to the same dataset yields different results. A basic challenge in clustering is therefore choosing a suitable algorithm for a given dataset. Strehl and Ghosh (2003) proposed the clustering ensemble, which combines independent clustering results rather than searching for the best one. Clustering ensemble, also known as clustering aggregation or consensus clustering, is characterized by high robustness, stability, novelty, scalability and parallelism (Yu et al. 2014; Lv et al. 2016; Jia et al. 2011; Ma et al. 2018). In addition, clustering ensemble has advantages in privacy protection and knowledge reuse. Because it only needs access to the clustering solutions rather than the original data, it protects the privacy of the original data (Akbari et al. 2015). Clustering ensemble uses