Two-Stage Game Strategy for Multiclass Imbalanced Data Online Prediction


Haiyang Yu¹ · Chunyi Chen¹ · Huamin Yang¹

Accepted: 21 September 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract For multiclass imbalanced data online prediction, designing a self-adaptive model is a challenging problem. To address this issue, a novel dynamic multi-classification algorithm that uses a two-stage game strategy is put forward. Unlike typical imbalanced classification methods, the proposed approach provides a quantitatively self-updating model that automatically matches the changes in the arriving sample chunks. In the data generation phase, two dynamic ELMs with game theory are used to generate lifelike minority-class samples, which balances the distribution of the different classes. In the model update phase, both the current prediction performance and the cost sensitivity are taken into consideration simultaneously. According to the suffered loss and the shifting imbalance ratio, the proposed method establishes the relationship between the new weight and each individual model, and an aggregate model based on game theory is adopted to calculate the combination weights. These strategies help the algorithm reduce the fitting error on sequence fragments. Also, an altered hidden-layer output matrix can be calculated according to the current fragment, thus building a steady network architecture for the next chunk. Numerical experiments are conducted on eight multiclass UCI datasets. The results demonstrate that the proposed algorithm not only has better generalization performance, but also improves the predictive ability of the ELM method for minority samples.

Keywords Online prediction · Multiclass imbalanced data · Dynamic ELM · Game theory
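To make the notion of a shifting imbalance ratio concrete, the following minimal sketch tracks, for each arriving chunk, the ratio of the largest to the smallest class count together with a set of inverse-frequency class weights. It is an illustration only and not the weighting scheme of the proposed method: the function name `chunk_imbalance`, the toy label stream, and the weight formula (the common "balanced" heuristic, n / (k · count)) are all assumptions introduced here.

```python
from collections import Counter


def chunk_imbalance(labels):
    """Return (imbalance_ratio, class_weights) for one chunk of labels.

    Hypothetical helper: the ratio is largest class count / smallest class
    count, and the weights use the generic inverse-frequency heuristic
    n / (k * count). Neither is taken from the paper.
    """
    counts = Counter(labels)
    ratio = max(counts.values()) / min(counts.values())
    n, k = len(labels), len(counts)
    weights = {cls: n / (k * m) for cls, m in counts.items()}
    return ratio, weights


# Toy stream of two chunks whose class distribution drifts over time.
stream = [
    ["a", "a", "a", "a", "b", "c"],
    ["a", "a", "b", "b", "b", "c"],
]

for t, chunk in enumerate(stream):
    ratio, weights = chunk_imbalance(chunk)
    print(f"chunk {t}: imbalance ratio = {ratio}, weights = {weights}")
```

Because both quantities are recomputed for every chunk, a cost-sensitive learner that consumes them would see the minority classes weighted more heavily whenever they become rarer in the stream.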

Haiyang Yu [email protected]
¹ School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, Jilin, China

1 Introduction There has been increasing interest in the online prediction of imbalanced data in recent years. Imbalanced data is pervasive in applications such as image recognition [1], behavior detection [2], spam filtering [3], and disease diagnosis [4]. Usually, the cost of misclassifying a minority-class sample is greater than that of misclassifying a majority-class one [5]. For example, if we falsely regard healthy people as patients, the result is at most a waste of medical resources. Conversely, if a patient is mistaken for a healthy person, treatment is delayed, which may even lead to disastrous consequences. A model that ignores the cost of the minority class is of little use, even though it may achieve appreciable overall classification performance [6]. Furthermore, unlike the offline process, the online scenario brings shifting data distributions and new sample categories [7], so the learning framework must process the data stream one by one or chunk by chunk. Especially in Big Data settings, nonstationary time series reduce model adaptability [8]. Interactive feature extraction and iterative parameter estimation lead to a relearning process. If t