Combination of Active and Random Labeling Strategy in the Non-stationary Data Stream Classification

Abstract. A significant problem when building classifiers from data streams is access to the correct labels. Most algorithms assume that this information is available without any restrictions. Unfortunately, this is not possible in practice, because objects can arrive very quickly and labeling all of them is impossible, or we have to pay for the correct label (e.g., to a human expert). Hence, methods based on partially labeled data, including those based on active learning, are becoming increasingly popular; in active learning, the learning algorithm itself decides which objects are worth labeling to improve the quality of the predictive model effectively. In this paper, we propose a new active learning method for data stream classifiers. Its quality has been compared with benchmark solutions on a large number of test streams, and the results show the usefulness of the proposed method, especially when the budget dedicated to labeling incoming objects is low.

Keywords: Data stream classification · Active learning · Concept drift
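
The abstract does not specify how the proposed method selects objects, so the following is only a minimal sketch of the general idea suggested by the title: a labeling budget shared between uncertainty-driven (active) queries and uniformly random queries. The class `BudgetedLabeler` and its parameters `budget`, `random_share` and `threshold` are hypothetical names introduced for illustration; this is not the authors' algorithm. One commonly cited reason for mixing in random queries is that purely uncertainty-based selection may never request labels in regions where the classifier is confident, and can therefore miss concept drift there.

```python
import random


class BudgetedLabeler:
    """Hypothetical sketch (not the paper's method): decide, for each
    arriving object, whether to pay for its true label.

    budget       -- maximum fraction of stream objects that may be labeled
    random_share -- approximate part of the budget spent on random queries
    threshold    -- objects with classifier confidence below this value
                    are treated as uncertain and queried actively
    """

    def __init__(self, budget, random_share=0.2, threshold=0.7, seed=0):
        self.budget = budget
        self.random_share = random_share
        self.threshold = threshold
        self.seen = 0       # objects observed so far
        self.labeled = 0    # labels requested so far
        self.rng = random.Random(seed)

    def query(self, confidence):
        """Return True if the label of the current object should be requested.

        `confidence` is assumed to be the classifier's highest posterior
        probability for the object (a value in [0, 1]).
        """
        self.seen += 1
        # Budget check: the number of labels may never exceed the budget
        # by more than a single label.
        if self.labeled >= self.budget * self.seen:
            return False
        # Random component: a small, confidence-independent chance to query,
        # which keeps sampling regions where the model is (over)confident.
        if self.rng.random() < self.budget * self.random_share:
            self.labeled += 1
            return True
        # Active component: query objects the classifier is unsure about.
        if confidence < self.threshold:
            self.labeled += 1
            return True
        return False
```

In use, `confidence` would be the largest posterior probability that the current classifier assigns to the incoming object, and the true label is requested (and the model updated) only when `query` returns `True`.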

1 Introduction

The design of classifiers for streaming data is the subject of intensive research because, currently, data for most decision tasks arrive continuously [4]. When constructing such systems, we must take several vital issues into account, such as limited memory and computing resources, which mean that not all incoming data can be stored and that each object can be analyzed at most once [3]. Another difficulty in constructing data stream classifiers is concept drift: the probabilistic characteristics of the classification task may change while the model is being used and trained [5]. Therefore, a classifier dedicated to this type of task must not only respect the limits of the available computing and memory resources but also respond correctly to concept drift.

© Springer Nature Switzerland AG 2020. L. Rutkowski et al. (Eds.): ICAISC 2020, LNAI 12415, pp. 576–585, 2020. https://doi.org/10.1007/978-3-030-61401-0_54
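
To make these constraints concrete, here is a minimal sketch of a classifier that processes every object exactly once, keeps memory proportional to the number of classes times the number of features rather than to the stream length, and gradually forgets old data so that it can follow drift. The technique (a nearest-centroid classifier with exponentially weighted class means) and the names `ForgettingCentroidClassifier`, `alpha` and `prequential_accuracy` are illustrative assumptions, not taken from the paper. Evaluation uses the test-then-train (prequential) scheme, in which each object is first classified and only then used for training.

```python
import math


class ForgettingCentroidClassifier:
    """Hypothetical sketch: single-pass nearest-centroid classifier whose
    class means are exponentially weighted, so older objects are gradually
    forgotten and the model can follow concept drift."""

    def __init__(self, alpha=0.05):
        self.alpha = alpha      # forgetting rate in (0, 1]; assumed value
        self.centroids = {}     # class label -> mean vector (list of floats)

    def predict(self, x):
        """Return the label of the nearest centroid, or None before any training."""
        if not self.centroids:
            return None
        return min(self.centroids, key=lambda c: math.dist(x, self.centroids[c]))

    def learn_one(self, x, y):
        """Update the model with a single labeled object; x itself is not stored."""
        if y not in self.centroids:
            self.centroids[y] = list(x)   # first object of a class seeds its centroid
            return
        centroid = self.centroids[y]
        for i, xi in enumerate(x):
            # Exponentially weighted mean: recent objects weigh more.
            centroid[i] += self.alpha * (xi - centroid[i])


def prequential_accuracy(model, stream):
    """Test-then-train evaluation: each object is classified first and only
    then used for training, so every object is processed exactly once."""
    correct = total = 0
    for x, y in stream:
        y_hat = model.predict(x)
        if y_hat is not None:
            correct += int(y_hat == y)
            total += 1
        model.learn_one(x, y)
    return correct / total if total else 0.0
```

The forgetting rate controls the usual trade-off: a larger `alpha` reacts faster to drift but makes the centroids noisier, while a smaller one is more stable but slower to adapt.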

In this work, we also deal with another critical problem in streaming data analysis, namely access to the correct labels of incoming objects. Many methods described in the literature ignore this issue and assume that labels are always available. They overlook the fact that, on the one hand, even if we can label incoming objects, they may arrive so quickly that labeling all of them is impossible, or they may arrive around the clock, which makes labeling logistically difficult. On the other hand, the cost of labeling should also be taken into consideration. Sometimes this cost is negligible, e.g., in weather forecasting (we obtain the label with a delay, but the only cost is that of the observation and of entering it into the system). However, in most cases, such as medical diagnostics, lab
