On the Online Classification of Data Streams Using Weak Estimators


1 Department of Computer Science, Oslo and Akershus University College of Applied Sciences, Oslo, Norway
2 School of Computer Science, Carleton University, Ottawa, Canada

Abstract. In this paper, we propose a novel online classifier for complex data streams whose underlying stochastic properties are non-stationary. Instead of using a single training model and counters to keep important data statistics, the introduced online classifier scheme provides a real-time self-adjusting learning model. The learning model utilizes the multiplication-based update algorithm of the Stochastic Learning Weak Estimator (SLWE) at each time instant at which a new labeled instance arrives. In this way, the data statistics are updated every time a new element is inserted, without requiring the model to be rebuilt when the data distributions change. Finally, and most importantly, the model operates with the understanding that the correct classes of previously-classified patterns become available only at a later juncture, after some time instants have elapsed, thus requiring us to update the training set and the training model. The results obtained from a rigorous empirical analysis on multinomial distributions are remarkable. Indeed, they demonstrate the applicability of our method on synthetic datasets, and prove the advantages of the introduced scheme.

Keywords: Weak estimators · Learning automata · Non-stationary environments · Classification in data streams
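The abstract refers to the multiplication-based update of the SLWE without stating it. The following is a minimal sketch of that kind of update, assuming the commonly cited multinomial form in which every probability is scaled by a factor λ and the observed symbol receives the released mass. The names `slwe_update` and `track`, the uniform initial estimate, and the value λ = 0.9 are illustrative assumptions and are not taken from this paper.

```python
from typing import List, Sequence


def slwe_update(probs: Sequence[float], observed: int, lam: float = 0.9) -> List[float]:
    """Apply one multiplicative update to a multinomial probability estimate.

    probs    -- current estimate, one entry per symbol (assumed to sum to 1)
    observed -- index of the symbol that has just arrived
    lam      -- learning parameter, assumed to lie strictly between 0 and 1
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    if not 0 <= observed < len(probs):
        raise IndexError("observed symbol index out of range")

    # Every symbol's estimate is first shrunk by the factor lam.
    updated = [lam * p for p in probs]

    # The observed symbol keeps its old estimate and additionally receives
    # the mass released by all the other symbols, so the total is preserved.
    released = (1.0 - lam) * (sum(probs) - probs[observed])
    updated[observed] = probs[observed] + released
    return updated


def track(stream: Sequence[int], num_symbols: int, lam: float = 0.9) -> List[float]:
    """Run the update once per arriving symbol, starting from a uniform estimate."""
    probs = [1.0 / num_symbols] * num_symbols
    for symbol in stream:
        probs = slwe_update(probs, symbol, lam)
    return probs


if __name__ == "__main__":
    # Hypothetical stream: symbol 0 dominates, then the source switches to symbol 1.
    stream = [0] * 200 + [1] * 200
    print(track(stream, num_symbols=3))
```

Because old observations are discounted geometrically, the estimate follows the switch in the stream without any explicit change detection or model rebuild. A value of λ closer to 1 gives smoother estimates that adapt more slowly.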

1 Introduction

In the past few years, due to the advances in computer hardware technology, large amounts of data have been generated, collected from different sources, and stored permanently. Some of the applications that generate data streams are financial tickers, log records or click-streams in web tracking and personalization, data feeds from sensor applications, and call detail records in telecommunications. Analyzing these huge amounts of data has been one of the most important challenges in the fields of Machine Learning (ML) and Pattern Recognition (PR). Traditionally, ML methods are assumed to deal with static data stored in memory, which can be read several times. In contrast, streaming data grows at an unlimited rate and arrives continuously in a single-pass manner, so that it can be read only once. Further, there are space and time restrictions in analyzing streaming data. Consequently, one needs methods that are “automatically adapted” to update the training models based on the information gathered over the past observations whenever a change in the data is detected. Mining streaming data is constrained by limited resources of time and memory. Since the source of data

B.J. Oommen: Chancellor’s Professor; Fellow: IEEE and Fellow: IAPR. This author is also an Adjunct Professor with the University of Agder in Grimstad, Norway.
© Springer International Publishing Switzerland 2016. H. Fujita et al. (Eds.): IEA/AIE 2016, LNAI 9799, pp. 68–79, 2016. DOI: 10.1007/978-3-319-42007-3_7
