Automated Determination of the Input Parameter of DBSCAN Based on Outlier Detection
During the last two decades, DBSCAN (Density-Based Spatial Clustering of Applications with Noise) has been one of the most common clustering algorithms, that is also highly cited in the scientific literature. However, despite its strengths, DBSCAN has a s
- PDF / 2,511,520 Bytes
- 12 Pages / 439.37 x 666.14 pts Page_size
- 80 Downloads / 185 Views
)
Institute for Computer Science and Business Information Systems (ICB), University of Duisburg-Essen, Essen, Germany {zohreh.akbari,rainer.unland}@icb.uni-due.de
Abstract. During the last two decades, DBSCAN (Density-Based Spatial Clus‐ tering of Applications with Noise) has been one of the most common clustering algorithms, that is also highly cited in the scientific literature. However, despite its strengths, DBSCAN has a shortcoming in parameter detection, which is done in interaction with the user, presenting some graphical representation of the data. This paper introduces a simple and effective method for automatically deter‐ mining the input parameter of DBSCAN. The idea is based on a statistical tech‐ nique for outlier detection, namely the empirical rule. This work also suggests a more accurate method for detecting the clusters that lie close to each other. Experimental results in comparison with the old method, together with the time complexity of the algorithm, which is the same as for the old algorithm, indicate that the proposed method is able to automatically determine the input parameter of DBSCAN quite reliably and efficiently. Keywords: Clustering · DBSCAN · Empirical rule · Machine learning · Outlier detection · Parameter determination · Unsupervised learning
1
Introduction
Machine Learning (ML) is one of the core fields of Artificial Intelligence (AI) and is concerned with the question of how to construct computer programs that automatically improve with experience [1]. Depending on the nature of the learning data available to the learning system, machine learning methods are typically classified into three main categories [2, 3]: supervised, unsupervised and reinforcement learning. In supervised learning example inputs and their desired outputs are given and the goal is to learn a general rule that maps these inputs to their desired outputs. In unsupervisaed learning, on the other hand, no labels are given to the learning algorithm, leaving it on its own to find the hidden structure of the data, e.g. to look for the similarities between the data instances (i.e. clustering [4]), or to discover the dependencies between the variables in large databases (i.e. association rule mining [5]). In reinforcement learning the desired input/output pairs are again not presented, however, the algorithm is able to estimate the optimal actions by interacting with a dynamic environment and based on the outcomes
© IFIP International Federation for Information Processing 2016 Published by Springer International Publishing Switzerland 2016. All Rights Reserved L. Iliadis and I. Maglogiannis (Eds.): AIAI 2016, IFIP AICT 475, pp. 280–291, 2016. DOI: 10.1007/978-3-319-44944-9_24
Automated Determination of the Input Parameter
281
of the more recent actions, while ignoring experiences from the past, that were not reinforced recently. This research focuses on the most common unsupervised learning method (i.e. cluster analysis [4, 6]), and more specifically on one of its successful algorithms the Density-Based Spatial Clust
Data Loading...