Ensembles of Heterogeneous Concept Drift Detectors - Experimental Study
1 Department of Systems and Computer Networks, Faculty of Electronics, Wroclaw University of Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wroclaw, Poland {michal.wozniak,pawel.ksieniewicz,krzysztof.walkowiak}@pwr.edu.pl
2 AGH University of Science and Technology, Al. Mickiewicza 30, 30-059 Kraków, Poland [email protected]
Abstract. For contemporary enterprises, the ability to make appropriate business decisions on the basis of knowledge hidden in stored data is a critical success factor. Therefore, decision support software should take into consideration that data usually arrives continuously in the form of a so-called data stream, yet most traditional data analysis methods are not ready to analyze the fast-growing volume of stored records efficiently. Additionally, one should consider a phenomenon appearing in data streams called concept drift: the parameters of the model in use change over time, which can dramatically decrease the quality of the analytical model. This work focuses on the classification task, which is very popular in many practical applications such as fraud detection, network security, and medical diagnosis. We propose a way to detect changes in the data stream using a combined concept drift detection model. The experimental evaluations confirm its good quality, which encourages us to use it in practical applications.
Keywords: Data stream · Concept drift · Pattern classification · Drift detector

1 Introduction
The analysis of huge volumes of fast-arriving data has recently become the focus of intense research, because such methods can give a company a competitive advantage. One useful approach is data stream classification, which is employed to solve problems such as discovering changes in client preferences, spam filtering, fraud detection, and medical diagnosis, to name only a few. However, most traditional classifier design methods do not take into consideration that:
– The statistical dependencies between the observations of given objects and their classifications may change.
– Data can arrive so quickly that labeling all records is impossible.

This section focuses on the first problem, called concept drift [23], which comes in many forms depending on the type of change. The appearance of concept drift may cause a significant deterioration in the accuracy of the classifier in use. Therefore, developing methods that are able to deal effectively with this phenomenon has become an increasingly important issue. In general, the following approaches may be considered to deal with the above problem.
– Rebuilding the model frequently whenever new data becomes available. This is very expensive and infeasible from a practical point of view, especially

© IFIP International Federation for Information Processing 2016. Published by Springer International Publishing Switzerland 2016. All Rights Reserved. K. Saeed and W. Homenda (Eds.): CISIM 2016, LNCS 9842, pp. 538–549, 2016. DOI: 10.1007/978-3-319-45378-1_48