Concept learning using one-class classifiers for implicit drift detection in evolving data streams


Ömer Gözüaçık¹ · Fazli Can¹

© Springer Nature B.V. 2020

Abstract
Data stream mining has become an important research area over the past decade due to the increasing amount of data available today. Sources from various domains generate a near-limitless volume of data in temporal order. Such data are referred to as data streams, and they are generally nonstationary because the characteristics of the data evolve over time. This phenomenon is called concept drift. It is an issue of great importance in the literature, since it makes models obsolete by decreasing their predictive performance. In the presence of concept drift, models must adapt to changes in the data in order to remain robust and effective. Drift detectors are designed to run jointly with classification models, updating them when a significant change in data distribution is observed. In this paper, we present an implicit (unsupervised) algorithm called One-Class Drift Detector (OCDD), which uses a one-class learner with a sliding window to detect concept drift. We perform a comprehensive evaluation of 17 prevalent, mostly recent concept drift detection methods and an adaptive classifier using 13 datasets. The results show that OCDD outperforms the other methods by producing models with better predictive performance on both real-world and synthetic datasets.

Keywords  Concept drift · Data stream · Drift detection · Unlabeled data · Verification latency
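As a rough illustration of the idea summarized in the abstract (a one-class learner combined with a sliding window, using no labels), the following Python sketch may help. It is not the OCDD algorithm itself. The class `CentroidOneClass` is a toy stand-in for a real one-class learner, and the window size, the 95th-percentile radius, the outlier-fraction threshold, and the function name `detect_drifts` are all hypothetical choices made only for this example.

```python
from collections import deque
import math


class CentroidOneClass:
    """Toy one-class model (assumption): inliers lie within a radius of the training mean."""

    def fit(self, samples):
        n, dim = len(samples), len(samples[0])
        self.center = [sum(x[d] for x in samples) / n for d in range(dim)]
        dists = sorted(math.dist(x, self.center) for x in samples)
        # Radius = roughly the 95th-percentile training distance (hypothetical choice).
        self.radius = dists[min(n - 1, int(0.95 * n))]
        return self

    def is_outlier(self, x):
        return math.dist(x, self.center) > self.radius


def detect_drifts(stream, window_size=100, outlier_fraction=0.5):
    """Return stream indices at which drift is signaled, without using any labels."""
    train = []                         # samples collected to learn the current concept
    flags = deque(maxlen=window_size)  # sliding window of outlier decisions
    model, drifts = None, []
    for i, x in enumerate(stream):
        if model is None:
            # Learning phase: gather one full window, then fit the one-class model.
            train.append(x)
            if len(train) == window_size:
                model = CentroidOneClass().fit(train)
            continue
        flags.append(model.is_outlier(x))
        # Signal drift when most of the recent window looks foreign to the learned concept.
        if len(flags) == window_size and sum(flags) / window_size > outlier_fraction:
            drifts.append(i)
            train, model = [], None    # discard the old concept and relearn
            flags.clear()
    return drifts
```

Calling `detect_drifts` on a sequence of equal-length numeric tuples returns the positions at which the detector fired. In this sketch a drift signal simply triggers relearning from the next `window_size` samples; a detector running alongside a classifier would also update that classifier at the same point.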

This study is partially supported by Scientific and Technological Research Council of Turkey (TÜBİTAK) Grant No. 117E870.

* Fazli Can [email protected]
  Ömer Gözüaçık [email protected]

¹ Information Retrieval Group, Computer Engineering Department, Bilkent University, 06800 Ankara, Turkey


1 Introduction
Analyzing streaming data has become an important challenge in data mining as the amount of data being produced has increased over recent years. It is estimated that the data produced are on the order of zettabytes and are growing at around 40% each year (Fan and Bifet 2013). Data streams are data that arrive continuously and in large numbers of samples. They are potential sources of valuable information, provided they can be analyzed at the right time (Wares et al. 2019). Data need to be processed as they arrive, or they are lost due to the limitations of the streaming environment. Previously, data streams were generally studied in financial markets (Krawczyk and Woźniak 2015). However, they are now everywhere due to recent developments in personalized technologies (e.g., IoT), which turn each individual into a data source (Pariser 2011). Various analytical approaches have been developed for solving problems in machine learning. One of them is classification, which follows the idea that data can be generalized (Duda et al. 2012). A predictive function that maps features to labels is learned from training data and later evaluated on test data. The m