Fuzzy-Based Kernelized Clustering Algorithms for Handling Big Data Using Apache Spark
Abstract In this paper, we propose a novel Kernelized Scalable Random Sampling with Iterative Optimization Fuzzy c-Means (KSRSIO-FCM) algorithm and a Kernelized Scalable Literal Fuzzy c-Means (KSLFCM) algorithm for clustering within a big data framework. Kernelized clustering algorithms handle nonlinearly separable problems by applying a Radial Basis Function (RBF) kernel, which maps the input data space nonlinearly into a high-dimensional feature space. Experiments are performed on the well-known IRIS dataset to show the effectiveness of the proposed KSRSIO-FCM in comparison with KSLFCM. The experimental results show that KSRSIO-FCM achieves a significant improvement in F-score, Adjusted Rand Index (ARI), and Normalized Mutual Information (NMI) for Big Data. Implemented on Apache Spark, KSRSIO-FCM shows better potential for Big Data clustering.
P. Jha (B) · A. Tiwari · N. Nagendra · M. Mounika
Department of Computer Science and Engineering, Indian Institute of Technology Indore, Indore, India

N. Bharill
Department of Computer Science and Engineering, Mahindra University, Ecole Centrale School of Engineering, Hyderabad, India

M. Ratnaparkhe
Biotechnology, ICAR-Indian Institute of Soybean Research Indore, Indore, India

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
S. M. Nigdeli et al. (eds.), Proceedings of 6th International Conference on Harmony Search, Soft Computing and Applications, Advances in Intelligent Systems and Computing 1275, https://doi.org/10.1007/978-981-15-8603-3_37
1 Introduction
Clustering is an unsupervised learning technique that groups similar data objects into subsets, thereby revealing patterns in a dataset [1]. It is applied in areas such as data mining, machine learning, information retrieval, and cybersecurity. Its major applications include pattern recognition, data mining, classification, image segmentation, data analysis, and modeling [2–4]. Clustering methods are broadly divided into two categories: hierarchical and partitioning clustering [5]. Hierarchical clustering finds clusters by recursively partitioning the data in either a top-down or a bottom-up fashion, whereas partitioning clustering divides a dataset into a number of disjoint clusters. Partitioning clustering is in turn either hard or fuzzy [6]. In hard clustering, each data sample is assigned to exactly one cluster, while in fuzzy clustering, a data sample can belong to more than one cluster with varying degrees of membership. Presently, there are several fuzzy clustering algorithms, such as Fuzzy c-Means (FCM) [7] and kernelized Fuzzy c-Means (KFCM) [8], that are used fo
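
To illustrate the distinction between hard and fuzzy assignment, the following minimal Python sketch computes the membership degrees of one sample with respect to a set of cluster centers, using a distance induced by an RBF kernel. It is not the KSRSIO-FCM or KSLFCM implementation from this paper. The function names, the kernel form exp(-||x - v||^2 / (2 sigma^2)), the fuzzifier m = 2, and the sample values are all assumptions chosen for illustration, and the membership formula is the commonly used kernelized FCM update rather than anything stated in the text above.

```python
import math


def rbf_kernel(x, v, sigma=1.0):
    """RBF kernel value between sample x and center v (assumed kernel form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_fuzzy_memberships(x, centers, m=2.0, sigma=1.0, eps=1e-12):
    """Membership degrees of x in each cluster; the degrees sum to 1.

    For an RBF kernel, the squared feature-space distance between x and v
    is 2 * (1 - K(x, v)). The constant factor 2 cancels in the ratio below.
    """
    # eps guards against division by zero when x coincides with a center.
    dists = [max(1.0 - rbf_kernel(x, v, sigma), eps) for v in centers]
    weights = [d ** (-1.0 / (m - 1.0)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


def hard_assignment(memberships):
    """Hard clustering view: index of the single cluster with top membership."""
    return max(range(len(memberships)), key=memberships.__getitem__)


if __name__ == "__main__":
    centers = [(0.0, 0.0), (4.0, 0.0)]  # hypothetical cluster centers
    sample = (1.0, 0.0)                 # hypothetical data sample
    u = kernel_fuzzy_memberships(sample, centers)
    print("fuzzy memberships:", u)
    print("hard assignment:", hard_assignment(u))
```

In this sketch the fuzzy result keeps a nonzero degree of membership in every cluster, whereas the hard assignment discards that information and reports only the index of the closest center.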