A novel online ensemble approach to handle concept drifting data streams: diversified dynamic weighted majority

ORIGINAL ARTICLE

Parneeta Sidhu • M. P. S. Bhatia

Received: 15 July 2014 / Accepted: 16 January 2015 / © Springer-Verlag Berlin Heidelberg 2015

Abstract We present an online ensemble approach, diversified dynamic weighted majority (DDWM), to classify new data instances that have varying conceptual distributions. Our approach maintains two sets of weighted ensembles that differ in their level of diversity. An expert in either ensemble is updated or removed according to its classification accuracy, and a new expert is added based on the final global prediction of the algorithm and the global prediction of the ensemble for a given data instance. Experimental evaluation on various artificial and real-world datasets shows that DDWM achieves very high accuracy in classifying new data instances, irrespective of the size of the dataset, the type of drift, or the presence of noise. We compare DDWM with other learners in terms of new performance metrics such as the kappa statistic, model cost, evaluation time, and memory requirements. Our approach proved to be highly resource-efficient, achieving very high accuracy even in a resource-constrained environment.

Keywords Concept drift · Ensemble · Diversity · Data stream · Online learning
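The expert bookkeeping described in the abstract (experts are down-weighted or removed according to their accuracy, and a new expert is added when the global prediction is wrong) can be illustrated with a minimal sketch of a single weighted-majority ensemble. This is not the authors' DDWM: it omits the second ensemble and the diversity mechanism, and the names `MajorityClassExpert`, `WeightedMajorityEnsemble`, `run_stream`, the decay factor `beta`, and the removal `threshold` are all assumptions introduced here for illustration only.

```python
from collections import Counter


class MajorityClassExpert:
    """Toy base learner (hypothetical): predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # Default to label 0 before any instance has been observed.
        return self.counts.most_common(1)[0][0] if self.counts else 0

    def learn(self, x, y):
        self.counts[y] += 1


class WeightedMajorityEnsemble:
    """Single weighted ensemble; beta and threshold are illustrative assumptions."""

    def __init__(self, beta=0.5, threshold=0.01):
        self.beta = beta            # multiplicative penalty for a wrong expert
        self.threshold = threshold  # experts below this weight are removed
        self.experts = [MajorityClassExpert()]
        self.weights = [1.0]

    def predict(self, x):
        votes = Counter()
        for expert, weight in zip(self.experts, self.weights):
            votes[expert.predict(x)] += weight
        return votes.most_common(1)[0][0]

    def update(self, x, y):
        """Process one labelled instance; return the global prediction made for it."""
        votes = Counter()
        for i, expert in enumerate(self.experts):
            local = expert.predict(x)
            if local != y:
                self.weights[i] *= self.beta  # penalize the wrong expert
            votes[local] += self.weights[i]
        global_pred = votes.most_common(1)[0][0]

        # Rescale so the best expert has weight 1, then drop weak experts.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]
        keep = [i for i, w in enumerate(self.weights) if w >= self.threshold]
        self.experts = [self.experts[i] for i in keep]
        self.weights = [self.weights[i] for i in keep]

        # A wrong global prediction triggers the creation of a fresh expert.
        if global_pred != y:
            self.experts.append(MajorityClassExpert())
            self.weights.append(1.0)

        for expert in self.experts:
            expert.learn(x, y)
        return global_pred


def run_stream(labels):
    """Feed a label-only stream (features unused by the toy expert); count mistakes."""
    ensemble = WeightedMajorityEnsemble()
    mistakes = sum(ensemble.update(None, y) != y for y in labels)
    return ensemble, mistakes


if __name__ == "__main__":
    # Abrupt drift: the label switches from 0 to 1 halfway through the stream.
    ensemble, mistakes = run_stream([0] * 50 + [1] * 50)
    print(mistakes, len(ensemble.experts), ensemble.predict(None))
```

On this toy stream the outdated expert keeps predicting the old label, so its weight decays until it falls below the threshold and it is dropped, while the expert created at the drift point takes over the vote.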

P. Sidhu (corresponding author) · M. P. S. Bhatia
Division of CoE, Netaji Subhas Institute of Technology, Sec-3 Dwarka, New Delhi 110078, India

1 Introduction

Data stream mining is a very important research area in the machine learning community. It is the process of studying the concept underlying the data, and the variations in that concept, in order to classify new data instances with higher accuracy. Data streams differ from static databases in that they may have varying concepts underlying the data, unlimited size, high speed, and high dimensionality [52]. A data instance in a data stream can be accessed only once, when it arrives; after that, the instance is replaced by a new instance, which may have a different conceptual distribution.

The 'concept' for a data instance refers to the underlying data distribution, expressed as the joint distribution p(x, y) [1], where x represents the n-dimensional feature vector and y represents its class label. The term 'concept drift' refers to a change in the underlying conceptual distribution [6, 7, 15] as new instances arrive, for example in applications such as market-basket analysis [10], computer security, internet data, credit fraud detection, and bioinformatics. In market-basket analysis, a similar concept is seen in customer buying behavior each year during the Christmas festivities. This pattern reoccurs every year (i.e. recurrent drift), resulting in a drift from the customers' buying pattern of the previous month.

A drift present in a dataset is measured by its severity and speed. Severity represents the amount of change caused by a new concept. Speed is the inverse of the time taken for a new concept
