On the relationship between bug reports and queries for text retrieval-based bug localization


Chris Mills¹ · Esteban Parra¹ · Jevgenija Pantiuchina² · Gabriele Bavota² · Sonia Haiduc¹

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Empirical Software Engineering

Communicated by: David Lo and Foutse Khomh

This article belongs to the Topical Collection: Software Maintenance and Evolution (ICSME)

¹ Florida State University, 600 W College Ave, Tallahassee, FL 32306, USA

² Università della Svizzera italiana, Via Giuseppe Buffi 13, 6900 Lugano, Switzerland

Abstract As societal dependence on software continues to grow, bugs are becoming increasingly costly in terms of financial resources as well as human safety. Bug localization is the process by which a developer identifies buggy code that needs to be fixed to make a system safer and more reliable. Unfortunately, manually attempting to locate bugs solely from the information in a bug report requires advanced knowledge of how a system is constructed and the way its constituent pieces interact. Therefore, previous work has investigated numerous techniques for reducing the human effort spent in bug localization. One of the most common approaches is Text Retrieval (TR) in which a system’s source code is indexed into a search space that is then queried for code relevant to a given bug report. In the last decade, dozens of papers have proposed improvements to bug localization using TR with largely positive results. However, several other studies have called the technique into question. According to these studies, evaluations of TR-based approaches often lack sufficient controls on biases that artificially inflate the results, namely: misclassified bugs, tangled commits, and localization hints. Here we argue that contemporary evaluations of TR approaches also include a negative bias that outweighs the previously identified positive biases: while TR approaches expect a natural language query, most evaluations simply formulate this query as the full text of a bug report. In this study we show that highly performing queries can be extracted from the bug report text, in order to make TR effective even without the aforementioned positive biases. Further, we analyze the provenance of terms in these highly performing queries to drive future work in automatic query extraction from bug reports.

Keywords Bug localization · Query formulation · Text retrieval
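
To make the TR setting concrete, the following minimal sketch indexes a toy corpus of source files and ranks them against two queries: the full text of a bug report and a shorter query built from a subset of its terms. Everything in it is an illustrative assumption rather than the setup evaluated in this article: the TF-IDF weighting with cosine similarity, the three file names and their contents, the bug report text, and the helper names `tokenize`, `build_index`, `cosine`, and `rank` are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers, lowercase, and keep alphabetic tokens only."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t for t in re.split(r"[^a-z]+", spaced.lower()) if t]


def build_index(documents):
    """Build TF-IDF vectors for a dict mapping file name -> source text."""
    term_counts = {name: Counter(tokenize(text)) for name, text in documents.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    n_docs = len(documents)
    # Smoothed IDF (an assumption): terms found in every file keep weight 1.
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}
    vectors = {
        name: {term: tf * idf[term] for term, tf in counts.items()}
        for name, counts in term_counts.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, vectors, idf):
    """Return file names ordered by similarity to the query, best first."""
    counts = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {t: tf * idf[t] for t, tf in counts.items()}
    scores = {name: cosine(q_vec, vec) for name, vec in vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical corpus: the bug is assumed to live in ImageCache.java.
corpus = {
    "LoginController.java": "class LoginController { void handleLogin(User user) { session.start(user); } }",
    "ImageCache.java": "class ImageCache { Bitmap loadThumbnail(String path) { return cache.get(path); } }",
    "ReportExporter.java": "class ReportExporter { void exportPdf(Report report) { writer.write(report); } }",
}
vectors, idf = build_index(corpus)

# Query 1: the whole (hypothetical) bug report, incidental words included.
full_report = (
    "After login the thumbnail is blank. "
    "Login works, then I export a report and login again."
)
# Query 2: a shorter query, hand-picked here purely for illustration.
short_query = "thumbnail cache"

full_ranking = rank(full_report, vectors, idf)
short_ranking = rank(short_query, vectors, idf)
print("full report :", full_ranking)
print("short query :", short_ranking)
```

On this contrived corpus, the full report mentions "login", "export", and "report" in passing, so files that merely share those words score above the assumed buggy file, whereas the short query places `ImageCache.java` first. The example only illustrates how query formulation can change a ranking. It is not evidence for the article's findings, and the short query here was chosen by hand, not extracted automatically.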

1 Introduction

Bug localization techniques are used to identify source code components that are likely to be responsible for a given bug. These techniques represent an important aid to reduce the time and effort spent on bug fixing activities. For this reason, many researchers defined various bug localization approaches, with Text Retrieval (TR) techniques playing a major role in this context (Dit et al. 2013). Th