An Entropy Based Algorithm for Credit Scoring




Abstract. The demand for effective credit scoring models has been rising in recent decades, due to the increase in consumer lending. Their objective is to divide loan applicants into two classes, reliable or unreliable, on the basis of the available information. Linear discriminant analysis is one of the most common techniques used to define these models, although this simple parametric statistical method does not address some problems, the most important of which is the imbalanced class distribution of the data. This imbalance arises because the number of default cases is much smaller than that of non-default ones, a scenario that reduces the effectiveness of machine learning approaches, e.g., neural networks and random forests. The Difference in Maximum Entropy (DME) approach proposed in this paper leads to two interesting results: on the one hand, it evaluates new loan applications in terms of the maximum entropy difference between their features and those of past non-default cases, using only the latter for model training and thus overcoming the imbalanced learning issue; on the other hand, it operates proactively, overcoming the cold-start problem. Our model was evaluated on two real-world datasets characterized by an imbalanced distribution of data, and its performance was compared to that of the best-performing state-of-the-art approach, random forests.

Keywords: Business intelligence · Classification · Credit scoring · Data mining

1 Introduction

The processes considered in this paper typically start with a loan application (hereafter referred to as an instance) and end with the repayment (or non-repayment) of the loan. Although retail lending is one of the most profitable sources of income for financial operators, an increase in loans is directly related to an increase in the number of defaulted cases, i.e., loans that are fully or partially not repaid. In short, credit scoring is used to classify loan applicants, on the basis of the available information, into two classes, reliable or unreliable (or, referring to their instances, accepted or rejected). Considering its capability to reduce monetary losses, it clearly represents an important tool, as stated in [1].

R. Saia and S. Carta
© IFIP International Federation for Information Processing 2016. Published by Springer International Publishing AG 2016. All Rights Reserved. A.M. Tjoa et al. (Eds.): CONFENIS 2016, LNBIP 268, pp. 263–276, 2016. DOI: 10.1007/978-3-319-49944-4_20

More formally, credit scoring techniques can be defined as a group of statistical methods used to infer the probability that an instance leads to a default [2,3]. Since their processes take into account all the factors that contribute to determining the credit risk [4] (i.e., the probability of loss from a debtor's default), they allow financial operators to evaluate this risk. Other advantages of these techniques are the reduction of credit analysis costs, a quick response time in credit decisions, and the possibility to accurately