Mining Bug Data

Mining Bug Data: A Practitioner’s Guide

Kim Herzig and Andreas Zeller

Abstract Although software systems control many aspects of our daily lives, no system is perfect. Many of our day-to-day experiences with computer programs are related to software bugs. Although software bugs are very unpopular, empirical software engineers and software repository analysts rely on bugs, or at least on those bugs that get reported to issue management systems. So what makes software repository analysts appreciate bug reports? Bug reports are development artifacts that relate to code quality and thus allow us to reason about it, and quality is key to reliability, end users, success, and finally profit. This chapter serves as a hands-on tutorial on how to mine bug reports, relate them to source code, and use the knowledge of bug fix locations to model, estimate, or even predict source code quality. It also discusses risks that should be addressed before one can achieve reliable recommendation systems.

6.1 Introduction

A central human quality is that we can learn from our mistakes: while we may not be able to avoid new errors, we can at least learn from the past to make sure the same mistakes are not made again. This makes software bugs and their corresponding bug reports an important and frequently mined source for recommendation systems that make suggestions on how to improve the quality and reliability of a software project or process. To predict, rate, or classify the quality of code artifacts (e.g., source files or binaries) or code changes, it is necessary to learn which factors influence code quality. Bug databases (repositories filled with issue reports filed by end users and developers) are one of the most important sources for this data. These reports of open and fixed code quality issues are rare and valuable assets.

K. Herzig • A. Zeller
Saarland University, Saarbrücken, Germany

M.P. Robillard et al. (eds.), Recommendation Systems in Software Engineering, DOI 10.1007/978-3-642-45135-5_6, © Springer-Verlag Berlin Heidelberg 2014

In this chapter, we discuss the techniques, chances, and perils of mining bug reports to build a recommender system that predicts the quality of code elements. Such predictions may help to prioritize resources such as testing and code reviews. To build such a recommendation system, one first has to understand the available content of issue repositories (Sect. 6.2) and its correctness (Sect. 6.3). The next important step is to link bug reports to changes in order to get a quality indicator, for example, a count of bugs per code artifact. Many factors can lead to incorrect counts, such as bias, noise, and errors in the data (Sect. 6.4). Once the data has been collected, a prediction model can be built using code metrics (Sect. 6.5). The chapter closes with a hands-on tutorial on how to mine bug da