Fuzzy-Based Kernelized Clustering Algorithms for Handling Big Data Using Apache Spark


Abstract In this paper, we propose two clustering algorithms for the big data framework: a novel Kernelized Scalable Random Sampling with Iterative Optimization Fuzzy c-Means (KSRSIO-FCM) and a Kernelized Scalable Literal Fuzzy c-Means (KSLFCM). The development of kernelized clustering algorithms has made it possible to handle nonlinearly separable problems by applying a kernel Radial Basis Function (RBF), which maps the input data space nonlinearly into a high-dimensional feature space. The experimental results show that KSRSIO-FCM achieves a significant improvement over KSLFCM in terms of F-score, Adjusted Rand Index (ARI), and Normalized Mutual Information (NMI) for Big Data. Experiments are performed on the well-known Iris dataset to demonstrate the effectiveness of the proposed KSRSIO-FCM in comparison with KSLFCM. Implemented on Apache Spark, KSRSIO-FCM shows better potential for Big Data clustering.
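
The abstract reports results in terms of ARI and NMI, two external indices that compare a predicted clustering with ground-truth labels (for example, the three Iris species). As a minimal sketch, the pure-Python code below computes both from a contingency table. The function names are hypothetical, and NMI is normalized here by the arithmetic mean of the two label entropies. That is only one of several normalizations in use, and the chapter does not state which one the authors chose. The F-score is omitted because clustering F-scores are defined in several incompatible ways.

```python
from collections import Counter
from math import comb, log


def contingency(labels_true, labels_pred):
    """Count co-occurrences of (true class, predicted cluster) pairs."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have equal length")
    return Counter(zip(labels_true, labels_pred))


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index computed from pair counts (hypothetical helper name)."""
    n = len(labels_true)
    total = comb(n, 2)
    if total == 0:  # fewer than two samples: no pairs to compare
        return 1.0
    table = contingency(labels_true, labels_pred)
    sum_cells = sum(comb(c, 2) for c in table.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / total
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster on both sides
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(labels_true, labels_pred):
    """NMI = I(U; V) / mean(H(U), H(V)), using arithmetic-mean normalization."""
    n = len(labels_true)
    table = contingency(labels_true, labels_pred)
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    mi = sum(
        (c / n) * log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in table.items()
    )
    denom = (entropy(labels_true) + entropy(labels_pred)) / 2
    if denom == 0:  # both labelings put everything in a single cluster
        return 1.0
    return mi / denom
```

For instance, `adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])` evaluates to 4/7 (about 0.571). Both indices give a perfect score for any relabeling of a correct clustering, because they depend only on how samples are grouped and not on the cluster IDs themselves.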

P. Jha (✉) · A. Tiwari · N. Nagendra · M. Mounika
Department of Computer Science and Engineering, Indian Institute of Technology Indore, Indore, India
e-mail: phd1801201006@iiti.ac.in

A. Tiwari
e-mail: artiwari@iiti.ac.in

N. Nagendra
e-mail: nehanagendra02@gmail.com

M. Mounika
e-mail: mounikamukkamalla16@gmail.com

N. Bharill
Department of Computer Science and Engineering, Mahindra University, Ecole Centrale School of Engineering, Hyderabad, India
e-mail: neha.bharill@mechyd.ac.in

M. Ratnaparkhe
Biotechnology, ICAR-Indian Institute of Soybean Research Indore, Indore, India
e-mail: ratnaparkhe.milind@gmail.com

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
S. M. Nigdeli et al. (eds.), Proceedings of 6th International Conference on Harmony Search, Soft Computing and Applications, Advances in Intelligent Systems and Computing 1275, https://doi.org/10.1007/978-981-15-8603-3_37

P. Jha et al.

1 Introduction

Clustering is an unsupervised learning technique that groups similar data objects into subsets, which helps uncover patterns in a dataset [1]. It is applied in data mining, machine learning, information retrieval, and cybersecurity. Typical applications of clustering algorithms include pattern recognition, data mining, classification, image segmentation, data analysis, and modeling [2–4]. Clustering methods are broadly divided into two categories: hierarchical and partitioning clustering [5]. Hierarchical clustering builds clusters recursively, either top-down by splitting the data or bottom-up by merging it, whereas partitioning clustering divides a dataset into a number of disjoint clusters. Partitioning clustering methods are either hard or fuzzy [6]. In hard clustering, each data sample is assigned to exactly one cluster, whereas in fuzzy clustering, a data sample can belong to more than one cluster with varying degrees of membership. Presently, there are several fuzzy clustering algorithms, such as Fuzzy c-Means (FCM) [7] and kernelized Fuzzy c-Means (KFCM) [8], that are used fo
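
The fuzzy memberships and kernelization described in this introduction can be made concrete with a minimal sketch of one common KFCM formulation. It assumes a Gaussian RBF kernel K(x, v) = exp(-||x - v||^2 / sigma^2) and keeps cluster prototypes in the input space. For this kernel the squared feature-space distance is ||phi(x) - phi(v)||^2 = 2(1 - K(x, v)), so the Euclidean distance used in FCM is replaced by 1 - K(x, v) in both update rules. The kernel width `sigma`, the fuzzifier `m`, the `init` option, and all function names are illustrative assumptions and not the authors' code. The sketch also omits the random sampling, iterative optimization, and Spark distribution that KSRSIO-FCM and KSLFCM build on top of KFCM.

```python
import math
import random


def rbf(x, v, sigma):
    """Gaussian RBF kernel K(x, v) = exp(-||x - v||^2 / sigma^2)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq / sigma ** 2)


def memberships(data, protos, m, sigma):
    """Fuzzy memberships using the kernel-induced distance 1 - K(x, v)."""
    power = 1.0 / (m - 1.0)
    u = []
    for x in data:
        # Clamp the distance so a point sitting on a prototype does not divide by zero.
        inv = [(1.0 / max(1.0 - rbf(x, v, sigma), 1e-12)) ** power for v in protos]
        total = sum(inv)
        u.append([w / total for w in inv])
    return u


def kfcm(data, c, m=2.0, sigma=1.0, max_iter=100, tol=1e-6, init=None, seed=0):
    """Kernelized fuzzy c-means with input-space prototypes (illustrative sketch).

    Returns (prototypes, u), where u[i][j] is the membership of data[i]
    in cluster j and every row of u sums to 1.
    """
    if init is None:
        # Hypothetical default: start from c randomly chosen data points.
        init = random.Random(seed).sample(list(data), c)
    protos = [list(map(float, p)) for p in init]
    for _ in range(max_iter):
        u = memberships(data, protos, m, sigma)
        shift = 0.0
        new_protos = []
        for j, v in enumerate(protos):
            # Prototype update: each point is weighted by u_ij^m * K(x_i, v_j).
            weights = [u[i][j] ** m * rbf(x, v, sigma) for i, x in enumerate(data)]
            wsum = sum(weights)
            if wsum == 0.0:  # no point exerts any pull on this prototype; keep it
                new_protos.append(v)
                continue
            new_v = [sum(w * x[d] for w, x in zip(weights, data)) / wsum
                     for d in range(len(v))]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_v, v)))
            new_protos.append(new_v)
        protos = new_protos
        if shift < tol:  # stop once no prototype coordinate moves more than tol
            break
    return protos, memberships(data, protos, m, sigma)
```

Each row of the returned membership matrix sums to 1, which reflects the fuzzy assignment described above. Taking the largest membership in each row recovers a hard clustering. For two well-separated groups of points, one near (0, 0) and one near (5, 5), started from `init=[[0, 0], [5, 5]]`, the prototypes should settle near the two group means. Because the RBF kernel decays quickly, a prototype far from every point receives almost no pull. For that reason the choice of initialization and `sigma` matters, and the sketch exposes `init`.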
