K-DBSCAN: An improved DBSCAN algorithm for big data


Nahid Gholizadeh · Hamid Saadatfar · Nooshin Hanafi

Accepted: 16 November 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Big data storage and processing are among the most pressing challenges today. Among data mining algorithms, DBSCAN is a widely used clustering method, and one of its most important drawbacks is its low execution speed. This study aims to accelerate DBSCAN so that it can process big datasets in an acceptable amount of time. To this end, the data are first partitioned into groups with the K-means++ algorithm, and DBSCAN is then run on each group separately. This reduces the computational burden of DBSCAN and significantly increases clustering speed. Finally, border clusters are merged if necessary. In the experiments, the proposed algorithm greatly reduced DBSCAN execution time (by 98% in the best case) with no significant change in the qualitative evaluation criteria for clustering.

Keywords Data mining · Clustering · Big data · DBSCAN algorithm · K-means++ algorithm
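The three stages named in the abstract (K-means++ grouping, DBSCAN within each group, merging of border clusters) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `kmeans_pp`, `dbscan`, and `k_dbscan`, the toy data, and the parameter values are all hypothetical, and the merge rule used here (join two clusters from different groups whenever any pair of their points lies within `eps`) is an assumed simplification, since the abstract does not state the exact merging criterion. The sketch favors readability over speed; its brute-force distance loops would not deliver the reported speed-up.

```python
import math
import random


def kmeans_pp(points, k, iters=20, seed=0):
    """Group points using k-means++ seeding followed by Lloyd iterations."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Next center is sampled with probability proportional to squared distance.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2)[0])
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, g in zip(points, labels) if g == j]
            if members:  # an empty group keeps its previous center
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def dbscan(points, eps, min_pts):
    """Plain brute-force DBSCAN; returns one label per point, -1 marks noise."""
    neighbors = [[j for j, q in enumerate(points) if math.dist(p, q) <= eps]
                 for p in points]
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # only core points expand the cluster
                queue.extend(neighbors[j])
        cluster += 1
    return labels


def k_dbscan(points, k, eps, min_pts):
    """Sketch of the pipeline: group, cluster each group, merge across groups."""
    groups = kmeans_pp(points, k)
    labels = [-1] * len(points)
    next_id = 0
    for g in range(k):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        local = dbscan([points[i] for i in idx], eps, min_pts)
        for i, lab in zip(idx, local):
            if lab != -1:
                labels[i] = next_id + lab  # make cluster ids globally unique
        next_id += max(local, default=-1) + 1

    parent = list(range(next_id))  # union-find over cluster ids

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # Assumed merge rule: clusters from different groups with points within eps.
    for i, p in enumerate(points):
        for j in range(i + 1, len(points)):
            if (groups[i] != groups[j] and labels[i] != -1 and labels[j] != -1
                    and math.dist(p, points[j]) <= eps):
                parent[find(labels[i])] = find(labels[j])
    return [find(lab) if lab != -1 else -1 for lab in labels]


# Toy data: two dense, well-separated strips of points on a line.
points = [(0.5 * i, 0.0) for i in range(20)] + [(100 + 0.5 * i, 0.0) for i in range(10)]
labels = k_dbscan(points, k=3, eps=0.6, min_pts=3)
print(len({lab for lab in labels if lab != -1}), "clusters found")
```

With `k=3` on two natural clusters, the grouping step has to cut one strip in two, so the merge step is what restores it to a single cluster. This is the situation the abstract refers to when it says border clusters are merged if necessary.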

1 Introduction

The age of big data has led to the development and application of technologies and methods for using large amounts of data to support decision-making and knowledge discovery [1]. Large amounts of data have made researchers and industries reconsider computational solutions for big data analysis. For instance, great emphasis has been placed on designing new, computationally more efficient algorithms for analyzing data from Twitter, Google, Facebook, and Wikipedia [2]. This enormous amount of data can be very useful to individuals and companies; however, analysis and retrieval operations can become too time-consuming because of the high computational cost of data processing. A common category of data analysis methods is data mining, the identification of useful, reliable, simple, and understandable patterns that turn raw data into useful information [3]. One data mining technique is clustering. Data clustering is an important area of unsupervised learning in which data are divided into groups based on their similarities, from an informed view of the entire dataset [4]. Clustering is used in a wide range of areas such as vehicle re-identification [5], image denoising [6], time-series processing [7], and Web-ba

Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s11227-020-03524-3) contains supplementary material, which is available to authorised users.

Nahid Gholizadeh, Hamid Saadatfar (corresponding author), Nooshin Hanafi: University of Birjand, Birjand, South Khorasan, Iran
