Constructing accuracy and diversity ensemble using Pareto-based multi-objective learning for evolving data streams


ORIGINAL ARTICLE

Yange Sun1,2 · Honghua Dai3,4

Received: 1 April 2020 / Accepted: 24 September 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract
Ensemble learning is one of the most frequently used techniques for handling concept drift, which is the greatest challenge in learning high-performance models from big evolving data streams. In this paper, a Pareto-based multi-objective optimization technique is introduced to learn high-performance base classifiers. Based on this technique, a multi-objective evolutionary ensemble learning scheme, named Pareto-optimal ensemble for a better accuracy and diversity (PAD), is proposed. The approach aims to enhance the generalization ability of the ensemble in evolving data stream environments by balancing the accuracy and diversity of ensemble members. In addition, an adaptive window change detection mechanism is designed to continuously track different kinds of drift. Extensive experiments show that PAD adapts to dynamically changing environments effectively and efficiently while achieving better performance.

Keywords Data streams · Concept drift · Ensemble learning · Diversity · Classifier selection · Multi-objective optimization
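To make the idea of balancing accuracy and diversity concrete, the following is a minimal sketch of generic Pareto (non-dominated) selection over two objectives that are both maximized. It is not the PAD algorithm itself, whose details are not given in this part of the paper. The function names `dominates` and `pareto_front` and the candidate scores are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

Score = Tuple[float, float]  # (accuracy, diversity), both to be maximized


def dominates(a: Score, b: Score) -> bool:
    """Return True if a is no worse than b on every objective
    and strictly better on at least one (maximization)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Score]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (accuracy, diversity) scores for five candidate base classifiers.
candidates = [(0.90, 0.10), (0.85, 0.30), (0.80, 0.25), (0.70, 0.50), (0.70, 0.40)]
front = pareto_front(candidates)
print(front)  # [0, 1, 3]
```

Candidates 2 and 4 are dropped because another candidate is at least as good on both objectives. The remaining three represent different trade-offs between accuracy and diversity, which is the kind of set a Pareto-based selection step would choose ensemble members from.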

✉ Yange Sun [email protected]

1 School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, People’s Republic of China
2 Henan Key Lab of Analysis and Applications of Education Big Data, Xinyang Normal University, Xinyang 464000, People’s Republic of China
3 Institute of Intelligent Systems and Renovation, Deakin University, Waurn Ponds, VIC 3216, Australia
4 Cooperative Innovation Center of Internet Healthcare, Zhengzhou University, Zhengzhou, People’s Republic of China

1 Introduction

As the most important form of big data, the data stream embodies the velocity characteristic of the big data 5Vs, i.e., volume, velocity, variety, veracity and value. Specifically, a data stream is an unbounded, real-time sequence of instances that arrive continuously at a high rate and exhibit fast-evolving behavior. Data stream classification has gained increasing attention in big data mining due to its broad range of real-world applications, including credit card fraud detection, spam filtering, intrusion detection and data analysis in Internet of Things (IoT) networks [1–5]. The greatest challenge in data stream classification reported in the literature is concept drift [6–9]. Concept drift means that the target concept of a data stream may evolve arbitrarily over time. More specifically, concept drift can be characterized as a change over time in the joint distribution P(X, y), where X is the input attribute vector and y is the target class. Drift can degrade classification performance, because a model built on previous data is no longer suitable for newly arriving data once the underlying distribution has changed. For example