CROKAGE: effective solution recommendation for programming tasks by leveraging crowd knowledge


Rodrigo Fernandes Gomes da Silva1 · Chanchal K. Roy2 · Mohammad Masudur Rahman2 · Kevin A. Schneider2 · Klérisson Paixão1 · Carlos Eduardo de Carvalho Dantas1 · Marcelo de Almeida Maia1

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
Developers often search for relevant code examples on the web for their programming tasks. Unfortunately, they face three major problems. First, they frequently need to read and analyse multiple results from the search engines to obtain a satisfactory solution. Second, the search is impaired due to a lexical gap between the query (task description) and the information associated with the solution (e.g., code example). Third, the retrieved solution may not be comprehensible, i.e., the code segment might lack a succinct explanation. To address these three problems, we propose CROKAGE (CrowdKnowledge Answer Generator), a tool that takes the description of a programming task (the query) as input and delivers a comprehensible solution for the task. Our solutions contain not only relevant code examples but also their succinct explanations written by human developers. The search for code examples is modeled as an Information Retrieval (IR) problem. We first leverage the crowd knowledge stored in Stack Overflow to retrieve the candidate answers for a programming task. For this, we use a fine-tuned IR technique, chosen after comparing 11 IR techniques in terms of performance. Then we use a multi-factor relevance mechanism to mitigate the lexical gap problem and select the top-quality answers related to the task. Finally, we perform natural language processing on the top-quality answers and, unlike earlier studies, deliver comprehensible solutions containing both code examples and code explanations. We evaluate and compare our approach against ten baselines, including the state-of-the-art. We show that CROKAGE outperforms the ten baselines in suggesting relevant solutions for 902 programming tasks (i.e., queries) of three popular programming languages: Java, Python and PHP. Furthermore, we use 24 programming tasks (queries) to evaluate our solutions with 29 developers and confirm that CROKAGE outperforms the state-of-the-art tool in terms of relevance of the suggested code examples, benefit of the code explanations and the overall solution quality (code + explanation).
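
The two-stage idea described in the abstract (retrieve candidate answers with an IR technique, then re-rank them with several relevance factors) can be illustrated with a minimal sketch. It is not the CROKAGE implementation. The abstract does not name the IR technique or the relevance factors, so this sketch assumes plain TF-IDF cosine similarity for retrieval and a hypothetical vote-based quality factor for re-ranking. The toy corpus, the function names and the weight of 0.8 are likewise assumptions made for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for Stack Overflow answers.
ANSWERS = [
    {"id": 1, "votes": 12,
     "text": "Use BufferedReader with FileReader to read a text file line by line"},
    {"id": 2, "votes": 30,
     "text": "Parse a JSON string into an object with a JSON library"},
    {"id": 3, "votes": 5,
     "text": "Read all lines of a file at once with Files readAllLines"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per tokenized document, plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(query, answers, top_k=2, lexical_weight=0.8):
    """Stage 1: keep answers sharing terms with the query (TF-IDF cosine > 0).
    Stage 2: re-rank by a weighted sum of lexical similarity and a
    hypothetical quality factor (normalized votes)."""
    docs = [tokenize(a["text"]) for a in answers]
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the corpus get weight 0.
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    max_votes = max(a["votes"] for a in answers) or 1
    scored = []
    for answer, vec in zip(answers, vectors):
        lexical = cosine(q_vec, vec)
        if lexical == 0.0:
            continue  # not a candidate: no lexical overlap with the query
        quality = answer["votes"] / max_votes
        score = lexical_weight * lexical + (1 - lexical_weight) * quality
        scored.append((score, answer["id"]))
    scored.sort(reverse=True)
    return [answer_id for _, answer_id in scored[:top_k]]


print(recommend("read file line by line", ANSWERS))  # [1, 3]
```

Note that this sketch matches terms purely lexically, so it is still exposed to the lexical gap the abstract describes: answer 2 is dropped only because it shares no term with the query, and a relevant answer phrased with different vocabulary would be dropped just the same.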

Communicated by: Tim Menzies

Corresponding author: Marcelo de Almeida Maia

Extended author information available on the last page of the article.

Empirical Software Engineering

Keywords Mining crowd knowledge · Stack Overflow · Word embedding · Code search

1 Introduction

Software developers often search for relevant code examples on the web to implement their programming tasks. Although there exist several Internet-scale code search engines (e.g., Koders, Krugle, GitHub), finding code examples on the web is still a major challenge (Rahman and Roy 2018). Developers often choose an ad hoc query to