Learning actionable analytics from multiple software projects



Rahul Krishna1 · Tim Menzies2

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

The current generation of software analytics tools are mostly prediction algorithms (e.g. support vector machines, naive Bayes, logistic regression). While prediction is useful, after prediction comes planning: deciding what actions to take in order to improve quality. This research seeks methods that generate demonstrably useful guidance on “what to do” within the context of a specific software project. Specifically, we propose XTREE (for within-project planning) and BELLTREE (for cross-project planning) to generate plans that can improve software quality. Each such plan has the property that, if followed, it reduces the expected number of future defect reports. To find this expected number, planning was first applied to data from release x. Next, we looked for changes in release x + 1 that conformed to our plans. This procedure was applied using a range of planners from the literature, as well as XTREE. In 10 open-source JAVA systems, several hundred defects were reduced in sections of the code that conformed to XTREE’s plans. Further, when compared to other planners, XTREE’s plans were found to be easier to implement (since they were shorter) and more effective at reducing the expected number of defects.

Keywords Data mining · Actionable analytics · Planning · Bellwethers · Defect prediction
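The cross-release evaluation described in the abstract can be sketched as follows. This is a minimal illustration of the idea as we read it, not the authors’ code: all metric names, plan targets, and data values below are hypothetical. A file “conforms” to a plan if, between release x and release x + 1, each metric the plan mentions moved toward the plan’s target; the expected benefit is then estimated by comparing defect counts in conforming versus non-conforming files.

```python
# Hedged sketch of the release-x / release-x+1 conformance check.
# Metrics ("loc", "cc"), plan targets, and defect counts are illustrative only.

def conforms(old, new, plan):
    """A file conforms if every metric the plan touches moved toward its target."""
    return all(abs(new[m] - t) <= abs(old[m] - t) for m, t in plan.items())

# Toy data: per-file metrics in release x and x+1, plus defects found in x+1.
release_x  = {"A.java": {"loc": 300, "cc": 15}, "B.java": {"loc": 300, "cc": 15}}
release_x1 = {"A.java": {"loc": 180, "cc": 9},  "B.java": {"loc": 340, "cc": 18}}
defects_x1 = {"A.java": 1, "B.java": 5}
plan = {"loc": 150, "cc": 8}   # hypothetical plan: shrink size and complexity

groups = {"conformed": [], "ignored": []}
for f in release_x:
    key = "conformed" if conforms(release_x[f], release_x1[f], plan) else "ignored"
    groups[key].append(defects_x1[f])

# Fewer defects in the conforming group is evidence the plan was useful.
print({k: sum(v) for k, v in groups.items()})  # {'conformed': 1, 'ignored': 5}
```

In this toy example A.java followed the plan and accrued one defect, while B.java ignored it and accrued five; the paper applies this style of comparison across 10 open-source JAVA systems.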

Communicated by: Sarah Nadi

Rahul Krishna
[email protected]

Tim Menzies
[email protected]

1 Computer Science, Columbia University, New York, NY, USA

2 Computer Science, NC State University, Raleigh, NC, USA

Empirical Software Engineering

1 Introduction

Data mining tools have been successfully applied to many applications in software engineering; e.g. Czerwonka et al. (2011), Ostrand et al. (2004), Menzies et al. (2007a), Turhan et al. (2011), Kocaguneli et al. (2012), Begel and Zimmermann (2014), Theisen et al. (2015). Despite these successes, current software analytics tools have certain drawbacks. At a workshop on “Actionable Analytics” at the 2015 IEEE conference on Automated Software Engineering, business users were vocal in their complaints about analytics (Hihn and Menzies 2015). “Those tools tell us what is,” said one business user, “but they don’t tell us what to do”. Hence we seek new tools that offer guidance on “what to do” within a specific project.

We seek such new tools since current analytics tools are mostly prediction algorithms such as support vector machines (Cortes and Vapnik 1995), naive Bayes classifiers (Lessmann et al. 2008), and logistic regression (Lessmann et al. 2008). For example, defect prediction tools report what combinations of software project features predict for some dependent variable (such as the number of defects). Note that this is a different task to planning, which answers the question: what to change in order to improve quality. More specifically, we seek plans that propose least changes while most improvin