A survey of density based clustering algorithms

Panthadeep BHATTACHARJEE, Pinaki MITRA

Department of Computer Science and Engineering, Indian Institute of Technology, Guwahati 781039, India

© Higher Education Press 2020

Abstract Density based clustering algorithms (DBCLAs) rely on the notion of density to identify clusters of arbitrary shapes and sizes, even when the clusters have varying densities. Existing surveys on DBCLAs cover only a selected set of algorithms and fail to provide extensive information about the variety of DBCLAs proposed to date, including a taxonomy of the algorithms. In this paper we present a comprehensive survey of the DBCLAs proposed over the last two decades, along with their classification. We classify the DBCLAs under four categories (density definition, parameter sensitivity, execution mode and nature of data) and further divide each category into several classes. In addition, we compare the DBCLAs through their common features and through variations in their citation and conceptual dependencies. We identify application areas of DBCLAs in domains such as astronomy, earth sciences, molecular biology, geography and multimedia. Our survey also identifies probable future directions in which density based methods may lead to favorable results.

Keywords clustering, density based clustering, survey, classification, common properties, applications
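To make the notion of density concrete, the following is a minimal sketch in the spirit of DBSCAN, one well-known DBCLA. It is an illustration only, not a method proposed in this survey: the function names, the parameters `eps` (neighborhood radius) and `min_pts` (minimum number of points in a neighborhood for a point to count as dense, or "core"), and the use of Euclidean distance are assumptions chosen for the example. Clusters grow outward from dense points, so their shape is not constrained to be spherical, and points in sparse regions are labeled as noise.

```python
import math


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def density_clusters(points, eps, min_pts):
    """DBSCAN-style sketch: label[i] is a cluster id >= 0, or -1 for noise."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        # points[i] is a core point: start a new cluster and expand it
        cluster_id += 1
        labels[i] = cluster_id
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # former noise reachable from a core point: border point
            if labels[j] is not None:
                continue  # already assigned; do not expand again
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point: keep growing
                seeds.extend(j_neighbors)
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (50, 50)]
    print(density_clusters(pts, eps=1.5, min_pts=3))
```

With `eps=1.5` and `min_pts=3`, the two dense groups of four points become clusters 0 and 1, and the isolated point (50, 50) is labeled -1 (noise). An elongated chain of evenly spaced points is likewise recovered as a single cluster, because density reachability links neighboring dense points regardless of the overall shape.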

1 Introduction

Received February 17, 2019; accepted September 9, 2019
E-mail: [email protected]; [email protected]

Clustering is an unsupervised learning task that groups data objects, or patterns, according to a similarity measure. Such objects may be represented as data points in $\mathbb{R}^d$. Objects belonging to the same cluster are more similar to one another than to objects belonging to a different cluster [1–3]. Cluster analysis aims to summarize the data or to improve the understanding of it, e.g., grouping related documents for browsing, finding protein structures and genes with analogous functions, or compressing data [4]. A large number of clustering techniques have been developed for pattern analysis, grouping, decision making, document retrieval, image segmentation and data mining, yet significant challenges remain in determining clusters correctly. Clustering approaches are broadly classified into partitional, hierarchical and density based methods (see Fig. 1) [1]. Partitional methods produce a single partition of the data rather than a nested clustering structure. Partitional approaches include squared-error methods (e.g., the K-means algorithm), graph-theoretic clustering, mixture-resolving methods (e.g., the EM algorithm) and mode-seeking methods [1]. Hierarchical clustering produces a dendrogram that represents the nested grouping of patterns, e.g., Chameleon [5]. Hierarchical methods follow either an agglomerative or a divisive approach to build the clustering. Density based clustering relies on estimating the density of a region. The objective of DBCLAs is to find clu