Cost-sensitive Dictionary Learning for Software Defect Prediction

Liang Niu¹ · Jianwu Wan¹,² · Hongyuan Wang¹ · Kaiwei Zhou¹

Accepted: 16 September 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract In recent years, software defect prediction has been recognized as a cost-sensitive learning problem. To deal with the unequal losses resulting from different types of misclassification, several cost-sensitive dictionary learning methods have recently been proposed. These methods typically define misclassification costs to measure the unequal losses and then minimize a cost-sensitive reconstruction loss by embedding the cost information into the reconstruction function of dictionary learning. Although they achieve promising performance, their cost-sensitive reconstruction functions are not well designed. In addition, insufficient attention has been paid to the coding coefficients, which can also help reduce the reconstruction loss. To address these issues, this paper proposes a new cost-sensitive reconstruction loss function and introduces an additional cost-sensitive discrimination regularization on the coding coefficients. The two terms are jointly optimized in a unified cost-sensitive dictionary learning framework. By doing so, we can achieve the minimum reconstruction loss and thus obtain a more cost-sensitive dictionary for encoding the features of test data. We conducted extensive experiments on twenty-five software projects from four benchmark datasets: NASA, AEEEM, ReLink and Jureczko. The results, compared with those of ten state-of-the-art software defect prediction methods, demonstrate the effectiveness of the learned cost-sensitive dictionary for software defect prediction.

Keywords Software defect prediction · Cost-sensitive · Dictionary learning · Discrimination
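The abstract does not state the paper's loss functions, so the following is not the authors' method. It is a minimal sketch of the general idea of combining dictionary reconstruction with unequal misclassification costs, under several assumptions: one hypothetical sub-dictionary per class, plain least-squares coding (swapped in for sparse coding), a softmax over reconstruction residuals to obtain pseudo-posteriors, and a hypothetical 5:1 ratio between the cost of missing a defective module and the cost of a false alarm. Under these assumptions, a test module is labeled by minimizing the expected misclassification cost rather than the residual alone.

```python
import numpy as np

# HYPOTHETICAL cost matrix COST[true, predicted]; label 0 = non-defective, 1 = defective.
# Assumption: missing a defective module costs 5x as much as a false alarm.
COST = np.array([[0.0, 1.0],
                 [5.0, 0.0]])


def class_residuals(x, sub_dicts):
    """Residual ||x - D_c a_c|| of x over each class-specific sub-dictionary D_c.

    The coding coefficients a_c are computed by plain least squares here
    (an assumption; sparse or regularized coding is more typical).
    """
    residuals = []
    for D in sub_dicts:
        a, *_ = np.linalg.lstsq(D, x, rcond=None)  # coding coefficients over D
        residuals.append(np.linalg.norm(x - D @ a))
    return np.array(residuals)


def cost_sensitive_predict(x, sub_dicts, cost=COST, temperature=1.0):
    """Return the label with the lowest expected misclassification cost."""
    r = class_residuals(x, sub_dicts)
    # Map residuals to pseudo-posteriors with a softmax: smaller residual -> larger P(i|x).
    z = -r / temperature
    p = np.exp(z - z.max())
    p /= p.sum()
    # Expected cost of predicting label j: sum_i P(i|x) * cost[i, j].
    expected_cost = p @ cost
    return int(np.argmin(expected_cost)), p, expected_cost


if __name__ == "__main__":
    D0 = np.array([[1.0], [0.0], [0.0]])  # toy atom for non-defective modules
    D1 = np.array([[0.0], [1.0], [0.0]])  # toy atom for defective modules
    x = np.array([1.0, 0.9, 0.0])
    r = class_residuals(x, [D0, D1])
    label, p, expected = cost_sensitive_predict(x, [D0, D1])
    print("residuals:", r, "min-residual label:", int(np.argmin(r)), "cost-sensitive label:", label)
```

In the toy example the residuals are about 0.9 and 1.0, so a plain minimum-residual rule labels the module non-defective (0), while the cost-sensitive rule labels it defective (1) because the assumed cost of a missed defect outweighs the small residual gap.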

This work was supported in part by National Natural Science Foundation of China under Grants 61502058, 61572085 and 61976028.

Corresponding author: Jianwu Wan

1 School of Information Science and Engineering, Changzhou University, Changzhou 213164, Jiangsu, People’s Republic of China

2 School of Civil and Environmental Engineering, Nanyang Technological University, Singapore 639798, Singapore


L. Niu et al.

1 Introduction

With the rapid increase in software complexity, software defect prediction (SDP) has attracted widespread interest in the field of software engineering [1–3]. Many studies have indicated that the cost of correcting bugs in the delivery phase is 100 times higher than in the earlier phases of the software life cycle [4,5]. To improve the efficiency of software development, it is therefore necessary to perform SDP to identify defective modules in the early phases of the software life cycle, so that testing effort can be focused on them. In the early stage of SDP research, researchers usually focused on learning robust software features from software projects. Many useful metrics, such as lines of code (LOC), McCabe [6], Hal