A Public Bug Database of GitHub Projects and Its Application in Bug Prediction
Abstract. Detecting defects in software systems is an evergreen topic, since there is no real-world software without bugs. Many different bug-locating algorithms have been presented recently that can help detect hidden and newly occurring bugs in software. Papers that try to predict the faulty source code elements or code segments of a system always rely on past experience. In most cases these studies construct a database for their own purposes and do not make the gathered data publicly available. Public datasets are rare; however, a well-constructed dataset could serve as a benchmark test input. Furthermore, open-source software development is growing rapidly, which also provides an opportunity to work with public data. In this study we selected 15 Java projects from GitHub and constructed a public bug database from them. We matched already known and fixed bugs with the corresponding source code elements (classes and files) and calculated a wide set of product metrics on these elements. After creating the bug database, we investigated whether it is usable for bug prediction. We used 13 machine learning algorithms to address this research question and achieved F-measure values between 0.7 and 0.8. Besides the F-measure values, we calculated the bug coverage ratio on every project for every machine learning algorithm, and obtained very high and promising bug coverage values (up to 100 %).

Keywords: Bug prediction · Bug database
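The abstract evaluates the database with two measures: the F-measure of the prediction models and a bug coverage ratio. The following is a minimal sketch of both, under stated assumptions. The F-measure is computed as the plain binary F1 score for the "buggy" class; the exact variant the authors used (for example, a per-class or weighted average) is not given in this excerpt. The bug coverage definition is an assumption: a bug counts as covered when at least one of the source code elements it was matched to is predicted buggy. The functions `f_measure` and `bug_coverage` and the example class names are hypothetical.

```python
from typing import Mapping, Sequence, Set


def f_measure(actual: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Binary F1 for the 'buggy' class: harmonic mean of precision and recall."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = fn = 0
    for is_buggy, predicted_buggy in zip(actual, predicted):
        if predicted_buggy and is_buggy:
            tp += 1  # correctly flagged buggy element
        elif predicted_buggy and not is_buggy:
            fp += 1  # clean element flagged as buggy
        elif is_buggy and not predicted_buggy:
            fn += 1  # buggy element the model missed
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def bug_coverage(bug_to_elements: Mapping[str, Set[str]], predicted_buggy: Set[str]) -> float:
    """ASSUMED definition: share of bugs with at least one matched element predicted buggy."""
    if not bug_to_elements:
        return 0.0
    covered = sum(1 for elements in bug_to_elements.values() if elements & predicted_buggy)
    return covered / len(bug_to_elements)


if __name__ == "__main__":
    # Hypothetical per-class labels: True = buggy.
    actual = [True, True, False, False, True]
    predicted = [True, False, False, True, True]
    print(round(f_measure(actual, predicted), 3))  # 0.667

    # Hypothetical mapping of fixed bugs to the classes they touched.
    bugs = {
        "BUG-1": {"Parser.java", "Lexer.java"},
        "BUG-2": {"Cache.java"},
        "BUG-3": {"Lexer.java"},
        "BUG-4": {"Printer.java"},
    }
    print(bug_coverage(bugs, {"Lexer.java"}))  # 0.5
```

Note that the two measures answer different questions: the F-measure scores each source code element independently, while the bug coverage ratio, as sketched here, asks whether every bug would have been noticed through at least one of its elements. This is why coverage can reach 100 % even when the F-measure is well below 1.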
1 Introduction
Software systems are likely to fail occasionally, which is obviously undesirable for both end users and software developers. Keeping software quality high is more important than ever, since customers determine the reputation of the system they use. Open-source software development has gained ground and has become a cornerstone for evaluating research ideas and techniques in computer science [19]. These publicly available systems accumulate a huge amount of historical data, stored for example in version control systems or bug tracking systems. Researchers have long used these public information sources to demonstrate the power of their approaches [1,14,24]. In spite of this, only a few publicly available bug databases have been presented that could serve as a basis for further investigations. Many authors do not make the corpus used in their studies public, so their experiments are not repeatable [12]. Our study aims to encourage the use of public databases for addressing different research questions, such as those related to bug prediction, by showing the power of our automatically generated bug database in the bug prediction domain. We have developed a toolchain that automatically gathers different information about publicly available projects to build a bug database. We selected 15 Java projects from di

© Springer International Publishing Switzerland 2016
O. Gervasi et al. (Eds.): ICCSA 2016, Part IV, LNCS 9789, pp. 625–638, 2016.
DOI: 10.1007/978-3-319-42089-9_44

Z. Tóth et al.