Generative adversarial fusion network for class imbalance credit scoring


ORIGINAL ARTICLE

Kai Lei1,2 • Yuexiang Xie1,2 • Shangru Zhong1 • Jingchao Dai1,2 • Min Yang3 • Ying Shen1

Received: 23 March 2019 / Accepted: 28 June 2019
© Springer-Verlag London Ltd., part of Springer Nature 2019

Abstract
Credit scoring on class-imbalanced data, where defaulters are under-represented relative to non-defaulters, is an important but challenging task. In this paper, we propose an imbalanced generative adversarial fusion network (IGAFN) for class-imbalanced credit scoring on multi-source heterogeneous credit data. Concretely, we design a fusion module that integrates heterogeneous credit data from multiple sources into a unified latent feature space. A generative adversarial network (GAN)-based balance module then generates latent representations of new samples for the minority class of the imbalanced datasets. We compare IGAFN against multiple conventional machine learning and deep learning algorithms. Extensive experiments on two real-life datasets show that IGAFN performs significantly better than the compared methods.

Keywords Credit scoring · Class imbalance · Generative adversarial network · Feature fusion
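The abstract names the two modules but does not give their architectures here, so the following is only a rough, hypothetical illustration of the data flow (fuse first, then oversample the minority class in latent space), written in standard-library Python. It assumes a fixed fusion step (concatenating a profile vector with the mean of the behavior-sequence vectors), whereas IGAFN's fusion module is learned. It also swaps in a toy GAN for the balance module: a linear generator G(z) = zW + b and a logistic-regression discriminator, trained with the standard discriminator loss and the non-saturating generator loss via hand-derived gradients, plus a small L2 penalty on the discriminator weights. The names `fuse`, `ToyLatentGAN` and `balance`, the binary-label assumption and every hyperparameter are assumptions, not taken from the paper.

```python
import math
import random


def fuse(profile, behavior):
    """Hypothetical fixed stand-in for IGAFN's learned fusion module:
    concatenate the structured profile vector with the mean of the
    time-ordered behavior vectors (order is ignored by the mean)."""
    dim = len(behavior[0])
    pooled = [sum(step[j] for step in behavior) / len(behavior) for j in range(dim)]
    return list(profile) + pooled


def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class ToyLatentGAN:
    """Linear generator G(z) = zW + b, logistic discriminator D(x) = sigmoid(v.x + c)."""

    def __init__(self, noise_dim, latent_dim, seed=0):
        self.rng = random.Random(seed)
        self.k, self.d = noise_dim, latent_dim
        self.W = [[self.rng.gauss(0, 0.1) for _ in range(latent_dim)] for _ in range(noise_dim)]
        self.b = [0.0] * latent_dim
        self.v = [0.0] * latent_dim
        self.c = 0.0

    def _sample(self, n):
        # Draw z ~ N(0, I) and map it through G; keep z for the generator gradient.
        pairs = []
        for _ in range(n):
            z = [self.rng.gauss(0, 1) for _ in range(self.k)]
            x = [self.b[j] + sum(z[i] * self.W[i][j] for i in range(self.k))
                 for j in range(self.d)]
            pairs.append((z, x))
        return pairs

    def generate(self, n):
        return [x for _, x in self._sample(n)]

    def train_step(self, real, lr=0.05, weight_decay=0.01):
        n = len(real)
        fake = self._sample(n)
        # Discriminator: minimise -mean log D(real) - mean log(1 - D(fake)) + L2 on v.
        gv, gc = [0.0] * self.d, 0.0
        for x, target in [(x, 1.0) for x in real] + [(x, 0.0) for _, x in fake]:
            g = sigmoid(dot(self.v, x) + self.c) - target  # dLoss/dlogit
            gc += g
            for j in range(self.d):
                gv[j] += g * x[j]
        for j in range(self.d):
            self.v[j] -= lr * (gv[j] / n + weight_decay * self.v[j])
        self.c -= lr * gc / n
        # Generator: minimise the non-saturating loss -mean log D(G(z)).
        gW = [[0.0] * self.d for _ in range(self.k)]
        gb = [0.0] * self.d
        for z, x in fake:
            g = sigmoid(dot(self.v, x) + self.c) - 1.0
            for j in range(self.d):
                gx = g * self.v[j]  # dLoss/dx_j
                gb[j] += gx
                for i in range(self.k):
                    gW[i][j] += z[i] * gx
        for j in range(self.d):
            self.b[j] -= lr * gb[j] / n
            for i in range(self.k):
                self.W[i][j] -= lr * gW[i][j] / n


def balance(latents, labels, minority=1, steps=200, seed=0):
    """Binary labels assumed: train the toy GAN on minority-class latents, then
    append generated latents until both classes have the same count."""
    minority_x = [x for x, y in zip(latents, labels) if y == minority]
    if not minority_x:
        raise ValueError("no minority-class samples to learn from")
    deficit = sum(1 for y in labels if y != minority) - len(minority_x)
    gan = ToyLatentGAN(noise_dim=2, latent_dim=len(latents[0]), seed=seed)
    for _ in range(steps):
        gan.train_step(minority_x)
    new_x = gan.generate(max(deficit, 0))
    return latents + new_x, labels + [minority] * len(new_x)
```

Because a linear logistic discriminator can only separate the two distributions through a linear projection, this toy mainly pushes generated latents toward the minority-class mean; it demonstrates the oversampling mechanics rather than a realistic generator, for which nonlinear networks would be needed.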

Corresponding author: Ying Shen, [email protected]

1 Shenzhen Key Lab for Information Centric Networking & Blockchain Technology (ICNLAB), School of Electronics and Computer Engineering (SECE), Peking University, Shenzhen 518055, People’s Republic of China

2 PCL Research Center of Networks and Communications, Peng Cheng Laboratory, Shenzhen, People’s Republic of China

3 Shenzhen Institutes of Advanced Technology (SIAT), University of the Chinese Academy of Sciences, Beijing, People’s Republic of China

1 Introduction

Credit scoring (or credit rating) systems aim to automatically judge whether a credit application should be approved or rejected, in order to decrease credit risk and reduce bad loans. Credit scoring has attracted increasing attention recently due to its broad applications in banks and other financial institutions [12]. Most prior work builds credit scoring classifiers with traditional machine learning approaches such as support vector machines [5, 21, 30], decision trees [2, 32, 38] and logistic regression [10, 20, 22]. Inspired by the success of deep learning in computer vision and natural language processing, several recent studies employ deep learning algorithms such as convolutional neural networks [23] and restricted Boltzmann machines [35] for credit scoring. Although previous methods have made remarkable progress, credit scoring remains challenging in real-life applications for two reasons:

1. Credit scoring data are usually a mixture of structured and semi-structured data, referred to as multi-source heterogeneous data. We can divide credit scoring data into two types: user profile data (e.g., gender, occupation and education) and time-based user behavior data