Constructing accuracy and diversity ensemble using Pareto-based multi-objective learning for evolving data streams


ORIGINAL ARTICLE

Yange Sun (1,2) • Honghua Dai (3,4)

Received: 1 April 2020 / Accepted: 24 September 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract

Ensemble learning is one of the most frequently used techniques for handling concept drift, which is the greatest challenge in learning high-performance models from big evolving data streams. In this paper, a Pareto-based multi-objective optimization technique is introduced to learn high-performance base classifiers. Based on this technique, a multi-objective evolutionary ensemble learning scheme, named Pareto-optimal ensemble for a better accuracy and diversity (PAD), is proposed. The approach aims to enhance the generalization ability of the ensemble in evolving data stream environments by balancing the accuracy and diversity of the ensemble members. In addition, an adaptive window change detection mechanism is designed to continuously track different kinds of drift. Extensive experiments show that PAD adapts to dynamically changing environments effectively and efficiently while achieving better performance.

Keywords Data streams · Concept drift · Ensemble learning · Diversity · Classifier selection · Multi-objective optimization
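To make the idea of balancing accuracy and diversity concrete, the following minimal sketch shows Pareto-based selection over two objectives. It is not the PAD algorithm itself: the candidate names, the scores, and the assumption that each classifier has a single scalar diversity value are hypothetical and chosen only for illustration. A candidate is kept if no other candidate is at least as good on both objectives and strictly better on one.

```python
from typing import List, Tuple

# (name, accuracy, diversity); both objectives are to be maximized.
# The values below are made-up illustrative scores, not results from the paper.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[1] >= b[1] and a[2] >= b[2]
    strictly_better = a[1] > b[1] or a[2] > b[2]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that is not dominated by any other candidate."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


candidates: List[Candidate] = [
    ("c1", 0.90, 0.10),
    ("c2", 0.85, 0.30),
    ("c3", 0.80, 0.25),  # dominated by c2
    ("c4", 0.70, 0.40),
    ("c5", 0.88, 0.05),  # dominated by c1
]

front = pareto_front(candidates)
front_names = [name for name, _, _ in front]
print(front_names)
```

In this toy setting the non-dominated candidates form the pool from which ensemble members would be drawn. How diversity is actually measured and how members are chosen from the front are design decisions this sketch does not cover.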

Corresponding author: Yange Sun [email protected]

1 School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, People’s Republic of China
2 Henan Key Lab of Analysis and Applications of Education Big Data, Xinyang Normal University, Xinyang 464000, People’s Republic of China
3 Institute of Intelligent Systems and Renovation, Deakin University, Waurn Ponds, VIC 3216, Australia
4 Cooperative Innovation Center of Internet Healthcare, Zhengzhou University, Zhengzhou, People’s Republic of China

1 Introduction

As the most important form of big data, the data stream embodies the velocity characteristic among the 5Vs of big data, i.e., volume, velocity, variety, veracity and value. Specifically, a data stream is an unbounded, real-time sequence of instances that arrive continuously at a high rate and evolve quickly. Data stream classification has gained increasing attention in big data mining due to its broad range of real-world applications, including credit card fraud detection, spam filtering, intrusion detection and data analysis in Internet of Things (IoT) networks [1–5].

The greatest challenge in data stream classification reported in the literature is concept drift [6–9]. Concept drift means that the target concept of a data stream may change arbitrarily over time. More specifically, concept drift can be characterized as a change over time in the joint distribution P(X, y), where X is the input attribute vector and y is the target class vector. The occurrence of drift can degrade classification performance, because a model built on previous data is no longer suitable for newly arriving data. For example
