Do code review measures explain the incidence of post-release defects? Case study replications and Bayesian networks

Andrey Krutauz¹ · Tapajit Dey² · Peter C. Rigby¹ · Audris Mockus²

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Aim In contrast to studies of defects found during code review, we aim to clarify whether code review measures can explain the prevalence of post-release defects.

Method We replicate McIntosh et al.'s (Empirical Softw. Engg. 21(5): 2146–2189, 2016) study, which uses additive regression to model the relationship between defects and code reviews. To increase external validity, we apply the same methodology to a new software project. We discuss our findings with the first author of the original study, McIntosh. We then investigate how to reduce the impact of correlated predictors in the variable selection process and how to increase understanding of the inter-relationships among the predictors by employing Bayesian Network (BN) models.

Context As in the original study, we use the measures that its authors obtained for the Qt project. We mine data from the version control system and issue tracker of Google Chrome and operationalize measures that are close analogs to the large collection of code, process, and code review measures used in the replicated study.

Results Both the data from the original study and the Chrome data showed high instability in the influence of code review measures on defects, with the results being highly sensitive to the variable selection procedure. Models without code review predictors had as good or better fit than those with review predictors. The replication, however, agrees with the bulk of prior work showing that prior defects, module size, and authorship have the strongest relationship to post-release defects. The application of BN models helped explain the observed instability by demonstrating that the review-related predictors do not affect post-release defects directly but rather through indirect effects. For example, changes that have no review discussion tend to be associated with files that have had many prior defects, which in turn increase the number of post-release defects. We hope that similar analyses of other software engineering techniques may also yield a more nuanced view of their impact. Our replication package, including our data and scripts, is publicly available (Krutauz et al. 2020).

Communicated by: Tim Menzies

Peter C. Rigby

[email protected]

Extended author information available on the last page of the article.


Keywords Code review measures · Statistical models · Bayesian networks
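To make the abstract's model-comparison finding concrete, the following is a minimal sketch, not the paper's actual scripts: the input file and column names (size, prior_defects, n_authors, review_coverage, review_discussion, post_release_defects) are hypothetical stand-ins for the paper's measures, and ordinary least squares stands in for the original study's additive regression.

```python
# A minimal sketch: compare the fit of nested defect models with and
# without code review predictors. All names below are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("chrome_file_measures.csv")  # hypothetical input file

# Baseline: code and process measures only.
base = smf.ols(
    "post_release_defects ~ size + prior_defects + n_authors",
    data=df,
).fit()

# Same model plus code review measures.
with_review = smf.ols(
    "post_release_defects ~ size + prior_defects + n_authors"
    " + review_coverage + review_discussion",
    data=df,
).fit()

# The replication finds the review measures add little: the two fits
# should be close on both criteria.
print(f"AIC           base={base.aic:.1f}  with_review={with_review.aic:.1f}")
print(f"adjusted R^2  base={base.rsquared_adj:.3f}  with_review={with_review.rsquared_adj:.3f}")
```

If the review measures carry little additional signal, AIC and adjusted R² barely change between the two fits, which is the pattern the replication reports.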
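Similarly, the indirect effect described in the results (no review discussion, to prior defects, to post-release defects) can be sketched with score-based Bayesian network structure learning. This toy illustration assumes the pgmpy library and synthetic, discretized counts; the variables and data are invented, not the paper's operationalizations.

```python
# A toy sketch of BN structure learning, assuming the pgmpy library.
# The synthetic generating process plants the indirect path
# no_discussion -> prior_defects -> post_defects.
import numpy as np
import pandas as pd
from pgmpy.estimators import HillClimbSearch, BicScore

rng = np.random.default_rng(0)
n = 2000

no_discussion = rng.binomial(1, 0.3, n)                          # change lacked review discussion
prior_defects = rng.binomial(3, 0.2 + 0.3 * no_discussion)       # small defect counts
post_defects = rng.binomial(3, 0.1 + 0.2 * (prior_defects > 1))  # driven by prior defects

data = pd.DataFrame({
    "no_discussion": no_discussion,
    "prior_defects": prior_defects,
    "post_defects": post_defects,
})

# Score-based search; BIC favours the sparser chain, so no direct
# no_discussion -> post_defects edge should appear (edge directions are
# only identified up to Markov equivalence).
dag = HillClimbSearch(data).estimate(scoring_method=BicScore(data))
print(sorted(dag.edges()))
```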

1 Introduction

For decades, code review has been seen as a cornerstone of quality assurance for software projects. The process has evolved from a formal procedure with checklists and face-to-face meetings (Fagan 2002) to a lightweight, semi-formal review conducted via e-mail or specially designed collaboration tools (Rigby and Storey 2011). The lightweight code review approach was originally used in open