On the Online Classification of Data Streams Using Weak Estimators


1 Department of Computer Science, Oslo and Akershus University College of Applied Sciences, Oslo, Norway
2 School of Computer Science, Carleton University, Ottawa, Canada

Abstract. In this paper, we propose a novel online classifier for complex data streams which are generated from non-stationary stochastic properties. Instead of using a single training model and counters to keep important data statistics, the introduced online classifier scheme provides a real-time self-adjusting learning model. The learning model utilizes the multiplication-based update algorithm of the Stochastic Learning Weak Estimator (SLWE) at each time instant as a new labeled instance arrives. In this way, the data statistics are updated every time a new element is inserted, without requiring us to rebuild the model when changes occur in the data distributions. Finally, and most importantly, the model operates with the understanding that the correct classes of previously-classified patterns become available at a later juncture, after some time instances, thus requiring us to update the training set and the training model. The results obtained from rigorous empirical analysis on multinomial distributions are remarkable. Indeed, they demonstrate the applicability of our method on synthetic datasets and confirm the advantages of the introduced scheme.

Keywords: Weak estimators · Learning automata · Non-stationary environments · Classification in data streams

1 Introduction

In the past few years, due to the advances in computer hardware technology, large amounts of data from different sources have been generated, collected, and stored permanently. Some of the applications that generate data streams are financial tickers, log records or click-streams in web tracking and personalization, data feeds from sensor applications, and call detail records in telecommunications. Analyzing these huge amounts of data has been one of the most important challenges in the field of Machine Learning (ML) and Pattern Recognition (PR). Traditionally, ML methods are assumed to deal with static data stored in memory, which can be read several times. In contrast, streaming data grows at an unlimited rate and arrives continuously in a single-pass manner, so that it can be read only once. Further, there are space and time restrictions in analyzing streaming data. Consequently, one needs methods that are “automatically adapted” to update the training models based on the information gathered over the past observations whenever a change in the data is detected. Mining streaming data is constrained by limited resources of time and memory. Since the source of data

B.J. Oommen: Chancellor’s Professor; Fellow: IEEE and Fellow: IAPR. This author is also an Adjunct Professor with the University of Agder in Grimstad, Norway.
© Springer International Publishing Switzerland 2016
H. Fujita et al. (Eds.): IEA/AIE 2016, LNAI 9799, pp. 68–79, 2016. DOI: 10.1007/978-3-319-42007-3_7
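
To make the multiplication-based SLWE update named in the abstract more concrete, the following is a minimal Python sketch of one estimator step for a multinomial distribution. It assumes the usual multinomial form of the rule from the weak-estimator literature: the components of the unobserved symbols are multiplied by a constant λ in (0, 1), and the observed symbol receives the released probability mass. The excerpt itself does not spell out the rule, so the function names `slwe_update` and `track`, the parameter name `lam`, and the value 0.9 are all illustrative assumptions rather than the authors' implementation.

```python
from typing import Iterable, List


def slwe_update(p: List[float], observed: int, lam: float = 0.9) -> List[float]:
    """One multiplicative SLWE-style step (assumed multinomial form).

    p        -- current estimate of the symbol probabilities (sums to 1)
    observed -- index of the symbol that has just arrived
    lam      -- multiplicative constant, strictly between 0 and 1 (assumed value)
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    if not 0 <= observed < len(p):
        raise IndexError("observed symbol is outside the alphabet")
    # Components of the symbols that were NOT observed shrink by the factor lam.
    updated = [lam * pj for pj in p]
    # The observed component receives the mass released by all the others,
    # so the estimate still sums to 1 after the step.
    released = (1.0 - lam) * sum(pj for j, pj in enumerate(p) if j != observed)
    updated[observed] = p[observed] + released
    return updated


def track(stream: Iterable[int], num_symbols: int, lam: float = 0.9) -> List[float]:
    """Run the update over a stream, starting from a uniform estimate."""
    p = [1.0 / num_symbols] * num_symbols
    for symbol in stream:
        p = slwe_update(p, symbol, lam)
    return p


if __name__ == "__main__":
    # Hypothetical stream whose source switches abruptly from symbol 0 to symbol 1.
    stream = [0] * 50 + [1] * 50
    print("after the first phase: ", track(stream[:50], num_symbols=2))
    print("after the second phase:", track(stream, num_symbols=2))
```

Because each step only rescales the current estimate, old observations are forgotten geometrically, and the estimate follows the switch in the hypothetical stream without any stored counters or model rebuild. A smaller `lam` adapts faster but gives a noisier estimate.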
