Drift Detection Using Stream Volatility


1 Department of Computer Science, University of Auckland, Auckland, New Zealand
{dtjh,ykoh,gill}@cs.auckland.ac.nz
2 Huawei Noah’s Ark Lab, Hong Kong, China
[email protected]

Abstract. Current methods that detect concept drifts in the underlying distribution of a data stream look at the distribution difference using statistical measures based on mean and variance. Existing methods are unable to proactively approximate the probability of a concept drift occurring and predict future drift points. We extend the current drift detection design by proposing the use of historical drift trends to estimate the probability of expecting a drift at different points across the stream, which we term the expected drift probability. We offer empirical evidence that, by applying our expected drift probability with the state-of-the-art drift detector ADWIN, we can improve its detection performance by significantly reducing the false positive rate. To the best of our knowledge, this is the first work that investigates this idea. We also show that our overall concept can be easily incorporated into incremental classifiers such as VFDT and demonstrate that the performance of the classifier is further improved.

Keywords: Data stream · Drift detection · Stream volatility

1 Introduction

Mining data that change over time from fast-changing data streams has become a core research problem. Drift detection discovers important distribution changes from labeled classification streams, and many drift detectors have been proposed [1,5,8,10]. A drift is signaled when the monitored classification error deviates from its usual value past a certain detection threshold, calculated from a statistical upper bound [6] or a significance technique [9]. Current drift detectors monitor only some form of mean and variance of the classification errors, and these errors are the only basis for signaling drifts. The detectors do not consider any previous trends in data or drift behaviors. Our proposal incorporates previous drift trends to extend and improve the current drift detection process. In practice there are many scenarios, such as traffic prediction, where incorporating previous data trends can improve the accuracy of the prediction process. For example, consider a user at home using Google Maps to obtain the fastest route to a specific location. The fastest route given by the system will be based on how congested the roads are at the current time (prior to leaving home), but the system is unable to adapt to situations like upcoming peak hour traffic. The user could be directed to take a main road that is not congested at the time of lookup but that later becomes congested due to peak hour traffic while the user is en route. In this example, combining data such as traffic trends throughout the day can help arrive at a better prediction. Similarly, using

© Springer International Publishing Switzerland 2015. A. Appice et al. (Eds.): ECML PKDD 2015, Part I, LNAI 9284, pp. 417–432, 2015. DOI: 10.1007/978-3-319-23528-8_26
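The threshold-based signaling described in the introduction can be illustrated with a minimal sketch. This is not ADWIN or any detector from the cited works: it is a simplified two-window detector that compares the mean error of a recent window with that of an older reference window and signals a drift when the gap exceeds a Hoeffding-style bound. The class name `WindowDriftDetector`, the helper `detect_drifts`, and the parameters `window` and `delta` are assumptions introduced only for illustration.

```python
import math
from collections import deque


class WindowDriftDetector:
    """Toy two-window detector (hypothetical; not ADWIN)."""

    def __init__(self, window=100, delta=0.05):
        self.window = window                    # size of each window (assumed)
        self.delta = delta                      # confidence parameter (assumed)
        self.reference = deque(maxlen=window)   # older errors: the "usual value"
        self.recent = deque(maxlen=window)      # newest errors

    def threshold(self):
        # Hoeffding-style bound on the difference between two window means.
        n0, n1 = len(self.reference), len(self.recent)
        m = 1.0 / (1.0 / n0 + 1.0 / n1)
        return math.sqrt(math.log(4.0 / self.delta) / (2.0 * m))

    def add(self, error):
        """Feed one 0/1 classification error; return True if a drift is signaled."""
        if len(self.recent) == self.window:
            # The value about to leave the recent window joins the reference.
            self.reference.append(self.recent[0])
        self.recent.append(error)
        if len(self.reference) < self.window:
            return False                        # not enough history yet
        recent_mean = sum(self.recent) / len(self.recent)
        reference_mean = sum(self.reference) / len(self.reference)
        if abs(recent_mean - reference_mean) > self.threshold():
            # Restart from scratch after a signaled drift.
            self.reference.clear()
            self.recent.clear()
            return True
        return False


def detect_drifts(errors, window=100, delta=0.05):
    """Return the stream positions (0-based) at which a drift is signaled."""
    detector = WindowDriftDetector(window, delta)
    return [i for i, e in enumerate(errors) if detector.add(e)]
```

Feeding the detector a stream whose error rate jumps partway through should yield a signal shortly after the jump. Like the detectors discussed above, this sketch relies only on the mean of the observed errors and uses no information about earlier drift trends.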
