A Public Bug Database of GitHub Projects and Its Application in Bug Prediction


Abstract. Detecting defects in software systems is an evergreen topic, since there is no real-world software without bugs. Many different bug locating algorithms have been presented recently that can help to detect hidden and newly occurring bugs in software. Papers trying to predict the faulty source code elements or code segments in a system always use experience from the past. In most cases these studies construct a database for their own purposes and do not make the gathered data publicly available. Public datasets are rare; however, a well-constructed dataset could serve as a benchmark test input. Furthermore, open-source software development is growing rapidly, which also gives an opportunity to work with public data. In this study we selected 15 Java projects from GitHub from which to construct a public bug database. We matched the already known and fixed bugs with the corresponding source code elements (classes and files) and calculated a wide set of product metrics on these elements. After creating the desired bug database, we investigated whether the built database is usable for bug prediction. We used 13 machine learning algorithms to address this research question and achieved F-measure values between 0.7 and 0.8. Besides the F-measure values, we calculated the bug coverage ratio on every project for every machine learning algorithm. We obtained very high and promising bug coverage values (up to 100%).

Keywords: Bug prediction · Bug database

1 Introduction

Software systems are likely to fail occasionally, which is obviously unwanted both for end users and for software developers. Keeping software quality at a high level is more important than ever, since customers define the reputation of the system they use. Open-source software development has established itself and has become a cornerstone for evaluating research ideas and techniques in computer science [19]. These publicly available systems gather a huge amount of historical data, stored, for example, in version control systems or bug tracking systems. Researchers have long used these public information sets to demonstrate the power of their approaches [1,14,24]. In spite of this fact, only a few publicly available bug databases have been presented to serve as a basis for further investigations. Many authors do not make the corpus used in their studies public, thus the experiments are not repeatable [12]. Our study tries to endorse the use of public databases for addressing different research questions, such as those related to bug prediction, by showing the power of our automatically generated bug database in the bug prediction domain. We have developed a toolchain that automatically gathers different information about publicly available projects to build a bug database. We selected 15 Java projects from di

Z. Tóth et al. © Springer International Publishing Switzerland 2016. O. Gervasi et al. (Eds.): ICCSA 2016, Part IV, LNCS 9789, pp. 625–638, 2016. DOI: 10.1007/978-3-319-42089-9_44
