Automated Determination of the Input Parameter of DBSCAN Based on Outlier Detection

During the last two decades, DBSCAN (Density-Based Spatial Clustering of Applications with Noise) has been one of the most common clustering algorithms, that is also highly cited in the scientific literature. However, despite its strengths, DBSCAN has a s

PDF / 2,511,520 Bytes
12 Pages / 439.37 x 666.14 pts Page_size
80 Downloads / 185 Views

DOWNLOAD

REPORT

)

Institute for Computer Science and Business Information Systems (ICB), University of Duisburg-Essen, Essen, Germany {zohreh.akbari,rainer.unland}@icb.uni-due.de

Abstract. During the last two decades, DBSCAN (Density-Based Spatial Clus‐ tering of Applications with Noise) has been one of the most common clustering algorithms, that is also highly cited in the scientiﬁc literature. However, despite its strengths, DBSCAN has a shortcoming in parameter detection, which is done in interaction with the user, presenting some graphical representation of the data. This paper introduces a simple and eﬀective method for automatically deter‐ mining the input parameter of DBSCAN. The idea is based on a statistical tech‐ nique for outlier detection, namely the empirical rule. This work also suggests a more accurate method for detecting the clusters that lie close to each other. Experimental results in comparison with the old method, together with the time complexity of the algorithm, which is the same as for the old algorithm, indicate that the proposed method is able to automatically determine the input parameter of DBSCAN quite reliably and eﬃciently. Keywords: Clustering · DBSCAN · Empirical rule · Machine learning · Outlier detection · Parameter determination · Unsupervised learning

1

Introduction

Machine Learning (ML) is one of the core ﬁelds of Artiﬁcial Intelligence (AI) and is concerned with the question of how to construct computer programs that automatically improve with experience [1]. Depending on the nature of the learning data available to the learning system, machine learning methods are typically classiﬁed into three main categories [2, 3]: supervised, unsupervised and reinforcement learning. In supervised learning example inputs and their desired outputs are given and the goal is to learn a general rule that maps these inputs to their desired outputs. In unsupervisaed learning, on the other hand, no labels are given to the learning algorithm, leaving it on its own to ﬁnd the hidden structure of the data, e.g. to look for the similarities between the data instances (i.e. clustering [4]), or to discover the dependencies between the variables in large databases (i.e. association rule mining [5]). In reinforcement learning the desired input/output pairs are again not presented, however, the algorithm is able to estimate the optimal actions by interacting with a dynamic environment and based on the outcomes

© IFIP International Federation for Information Processing 2016 Published by Springer International Publishing Switzerland 2016. All Rights Reserved L. Iliadis and I. Maglogiannis (Eds.): AIAI 2016, IFIP AICT 475, pp. 280–291, 2016. DOI: 10.1007/978-3-319-44944-9_24

Automated Determination of the Input Parameter

281

of the more recent actions, while ignoring experiences from the past, that were not reinforced recently. This research focuses on the most common unsupervised learning method (i.e. cluster analysis [4, 6]), and more speciﬁcally on one of its successful algorithms the Density-Based Spatial Clust

Data Loading...

Automated Determination of the Input Parameter of DBSCAN Based on Outlier Detection

Recommend Documents

Outlier Detection

Abstraction-Based Outlier Detection for Image Data

Handcrafted Outlier Detection Revisited

Fair Outlier Detection

Outlier Detection, Spatial

Study on Statistical Outlier Detection and Labelling

An Optimized Approach of Outlier Detection Algorithm for Outlier Attributes on Data Streams

Randomized outlier detection with trees

Lightweight Classifier-Based Outlier Detection Algorithms from Multivariate Data Stream

Outlier Detection for Data Using Density-Based Technique

Correction to: Fair Outlier Detection

UWFP-Outlier: an efficient frequent-pattern-based outlier detection method for uncertain weighted data streams