Performance of recurrent event models on defect proneness data



M. K. Lintu¹ · Asha Kamath¹

Accepted: 19 November 2020 © The Author(s) 2020

Abstract The repeated occurrence of the same event in a process is commonly observed in many domains; such events are referred to as recurrent events. The time to occurrence of these repeated events varies from unit to unit, and the event may not occur at all in some units. Such data are invariably analysed with survival analysis techniques known as recurrent event models, which are common in epidemiological studies and clinical trials but also apply to other domains in science and technology. We illustrate the usefulness of recurrent event models for defect proneness analysis in software quality assessment. Several models in current practice are applied to data collected to study the impact of module size on defect proneness in the Mozilla product. Module size plays a significant role in defect proneness, and each defect fix makes the class more susceptible to further defects. The risk estimates obtained from the different models vary owing to differences in the properties of the models and in their underlying assumptions.

Keywords Recurrent events · Survival model · Extended Cox models · Defect proneness

Mathematics Subject Classification 62N01 · 62N05

1 Introduction

Survival analysis relates the time to occurrence of an event to the covariates that influence it, and it has many applications in medicine, economics, finance and engineering. Regardless of the field of application, however, it is very common to observe more than one event per subject during the study period; such events are called recurrent or multiple events. The challenge in this setting is that the probability of one event is likely to be influenced by earlier events, even if those events are of a different type.
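A common way to prepare recurrent event data for Cox-type models is the counting-process layout, in which each subject's follow-up is split into at-risk intervals (start, stop], each ending in an event or in censoring. The sketch below is illustrative only, not the authors' procedure. The function `to_counting_process`, the `Interval` fields and the toy times are hypothetical and do not come from the Mozilla data. It also records how many events the subject had before each interval. In a hazard of the form h(t | x) = h0(t) exp(βᵀx), including that count as a covariate is one simple way to let earlier events influence the risk of later ones. Tied event times are rejected rather than handled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Interval:
    """One at-risk interval (start, stop] in counting-process format (hypothetical layout)."""
    subject: str
    start: float
    stop: float
    event: int          # 1 if the interval ends in an event, 0 if it ends censored
    prior_events: int   # number of events the subject had before `start`


def to_counting_process(subject: str, event_times: List[float],
                        follow_up_end: float) -> List[Interval]:
    """Split one subject's follow-up into consecutive (start, stop] intervals.

    Each recurrence closes an interval with event=1; any remaining time up to
    `follow_up_end` forms a final censored interval with event=0.
    """
    times = sorted(event_times)
    if any(t <= 0 or t > follow_up_end for t in times):
        raise ValueError("event times must lie in (0, follow_up_end]")
    if len(set(times)) != len(times):
        raise ValueError("tied event times are not supported in this sketch")

    rows: List[Interval] = []
    start = 0.0
    for k, t in enumerate(times):
        # k events occurred before this interval began
        rows.append(Interval(subject, start, t, 1, k))
        start = t
    if start < follow_up_end:
        # censored tail after the last observed event
        rows.append(Interval(subject, start, follow_up_end, 0, len(times)))
    return rows


# Usage with made-up times: two events, then censoring at t = 12
for r in to_counting_process("moduleA", [3.0, 7.5], follow_up_end=12.0):
    print(r.start, r.stop, r.event, r.prior_events)
# 0.0 3.0 1 0
# 3.0 7.5 1 1
# 7.5 12.0 0 2
```

Units with no events still contribute a single censored interval, which matches the abstract's point that some units may never experience the event.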

Asha Kamath · M. K. Lintu

¹ Department of Data Science, Manipal Academy of Higher Education, Manipal, Karnataka 576104, India


Annals of Operations Research

The Cox proportional hazards model (Cox 1972), a well-known approach for the analysis of survival data, has been extended to multiple events as well as to events with non-proportional hazards. The main problem in extending the proportional hazards model to multiple events is the intra-subject correlation (Therneau and Grambsch 2013; Hougaard 2012). To address this correlation, several extensions of the Cox proportional hazards model belonging to the class of intensity processes have been proposed, such as the Andersen and Gill (AG) model (Andersen and Gill 1982), the Prentice, Williams and Peterson (PWP) models (Prentice et al. 1981) and the Wei, Lin and Weissfeld (WLW) model (Wei et al. 1989). Different types of frailty models (Vaupel et al. 1979), as well as multi-state models (MSM) (Hougaard 1999), are also widely used for multiple-event problems. The choice among the different