On the assessment of software defect prediction models via ROC curves


Sandro Morasca · Luigi Lavazza

Published online: 19 August 2020 © The Author(s) 2020

Abstract

Software defect prediction models are classifiers often built by setting a threshold t on a defect proneness model, i.e., a scoring function. For instance, they classify a software module as non-faulty if its defect proneness is below t and as faulty otherwise. Different values of t may lead to different defect prediction models, possibly with very different performance levels. Receiver Operating Characteristic (ROC) curves provide an overall assessment of a defect proneness model by taking into account all possible values of t and thus all defect prediction models that can be built based on it. However, using a defect proneness model with a value of t is sensible only if the resulting defect prediction model performs at least as well as some minimal performance level that depends on practitioners’ and researchers’ goals and needs. We introduce a new approach and a new performance metric (the Ratio of Relevant Areas) for assessing a defect proneness model by taking into account only the parts of a ROC curve corresponding to values of t for which the resulting defect prediction models have higher performance than some reference value. We provide the practical motivations and theoretical underpinnings for our approach by: 1) showing how it addresses the shortcomings of existing performance metrics like the Area Under the Curve and Gini’s coefficient; 2) deriving reference values based on random defect prediction policies, in addition to deterministic ones; 3) showing how the approach works with several performance metrics (e.g., Precision and Recall) and their combinations; 4) studying misclassification costs and providing a general upper bound for the cost related to the use of any defect proneness model; 5) showing the relationships between misclassification costs and performance metrics. We also carried out a comprehensive empirical study on real-life data from the SEACRAFT repository to show the differences between our metric and the existing ones, and how much more reliable and less misleading our metric can be.

Keywords Software defect prediction model · Software defect proneness · ROC · Thresholds · AUC · Gini
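The relationship between a threshold t, the resulting defect prediction models, and the ROC curve can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy scores, and the toy labels below are all hypothetical, the convention that a score equal to t counts as faulty follows the wording of the abstract, and the Ratio of Relevant Areas is deliberately not computed because its definition is not given in this excerpt. The sketch only covers the two existing metrics named above, using the standard identity Gini = 2 · AUC − 1.

```python
from typing import List, Tuple


def classify(scores: List[float], t: float) -> List[bool]:
    """Flag a module as faulty when its defect proneness is at least t."""
    return [s >= t for s in scores]


def roc_points(scores: List[float], labels: List[bool]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    Each distinct score is used as a threshold t, so every point corresponds
    to one defect prediction model built on the same scoring function.
    Assumes at least one faulty and one non-faulty module.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # t above every score: no module is flagged
    for t in sorted(set(scores), reverse=True):
        predicted = classify(scores, t)
        tp = sum(1 for p, y in zip(predicted, labels) if p and y)
        fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the piecewise-linear ROC curve, by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def gini(points: List[Tuple[float, float]]) -> float:
    """Gini coefficient derived from the AUC."""
    return 2 * auc(points) - 1


# Hypothetical defect proneness scores and actual faultiness of six modules.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [True, True, False, True, False, False]

points = roc_points(scores, labels)
print("ROC points:", points)
print("AUC:", auc(points))
print("Gini:", gini(points))
```

On this toy data the curve passes through six thresholds and yields an AUC of 8/9. Both AUC and Gini summarize the whole curve, including thresholds that nobody would choose in practice, which is the shortcoming the abstract attributes to them.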

Communicated by: Martin Shepperd


Empirical Software Engineering (2020) 25:3977–4019

1 Introduction

Accurate estimation of which modules are faulty in a software system can be very useful to software practitioners and researchers. Practitioners can efficiently allocate scarce resources if they can predict which modules may need to undergo more extensive Verification and Validation than others. Researchers need to use quantitative, accurate module defect prediction techniques so they can assess and subsequently improve software development methods. In this paper, by the term “module,” we denote any piece of software (e.g
