Genetic programming for high-dimensional imbalanced classification with a new fitness function and program reuse mechani



METHODOLOGIES AND APPLICATION

Genetic programming for high-dimensional imbalanced classification with a new fitness function and program reuse mechanism

Wenbin Pei¹ · Bing Xue¹ · Lin Shang² · Mengjie Zhang¹

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract Genetic programming (GP) has been successfully applied to classification. However, GP may evolve biased classifiers when encountering the problem of class imbalance. These biased classifiers are often not reliable enough to be applied in some real-world applications. High dimensionality makes it more difficult for classifiers to effectively separate the majority class and the minority class. The use of GP to handle the joint effect of high dimensionality and class imbalance has not been heavily investigated. In this paper, we propose a GP approach to high-dimensional imbalanced classification, with the goals of increasing classification performance and saving training time. To achieve these goals, a new fitness function is developed to address the problem of class imbalance, and a strategy is proposed to reuse previous good GP individuals to improve efficiency. The proposed method is examined on ten high-dimensional imbalanced datasets. Experimental results show that, for high-dimensional imbalanced classification, the proposed method generally outperforms other GP methods and traditional classification algorithms that use sampling methods to address class imbalance.

Keywords Genetic programming · Fitness function · Class imbalance · High dimensionality

Communicated by V. Loia.

¹ School of Engineering and Computer Science, Victoria University of Wellington, P.O. Box 600, Wellington 6140, New Zealand

² State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China

1 Introduction

Genetic programming (GP) (Poli et al. 2008) automatically generates computer programs that are often represented as trees. Classification, a common supervised learning task, refers to a procedure that assigns a given instance to its corresponding category or class (Tan et al. 2016). GP has been successfully applied to feature selection and feature construction to address the curse of dimensionality for many classification algorithms in machine learning (Tran et al. 2016). More importantly, GP can directly construct classifiers (Espejo et al. 2010; Luna et al. 2017). However, GP may develop biased classifiers in imbalanced classification if the problem of class imbalance is not well addressed (Bhowan et al. 2012). Class imbalance is a common issue in some domains, such as fraud detection, medical diagnosis, financial analysis of loan policy or bankruptcy, and text classification (Batista et al. 2004; Chawla et al. 2004). When learning from imbalanced data, not only GP methods but also many classification algorithms, e.g. support vector machines (SVMs) and decision tre