Learning Human-Written Commit Messages to Document Code Changes



Yuan Huang1, Nan Jia2, Hao-Jie Zhou1, Xiang-Ping Chen3,∗, Member, IEEE, Zi-Bin Zheng1, Senior Member, IEEE, and Ming-Dong Tang4,5, Member, ACM, IEEE

1 National Engineering Research Center of Digital Life, School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China

2 School of Information Engineering, Hebei GEO University, Shijiazhuang 050031, China

3 Guangdong Key Laboratory for Big Data Analysis and Simulation of Public Opinion, School of Communication and Design, Sun Yat-sen University, Guangzhou 510006, China

4 School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou 510006, China

5 Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou 510006, China

E-mail: [email protected]; jianan [email protected]; [email protected]
E-mail: {chenxp8, zhzibin}@mail.sysu.edu.cn; [email protected]

Received April 5, 2020; revised October 15, 2020.

Abstract

Commit messages are important complementary information for understanding code changes. To address the scarcity of messages, several approaches have been proposed for generating commit messages automatically. However, most of these approaches focus on summarizing the changed software entities at a superficial level, without considering the intent behind the code changes (e.g., the existing approaches cannot generate a message such as “fixing null pointer exception”). Considering that developers often describe the intent behind a code change when writing its message, we propose ChangeDoc, an approach that reuses existing messages in version control systems for automatic commit message generation. Our approach combines syntax, semantic, pre-syntax, and pre-semantic similarities. For a given commit without a message, it discovers the most similar past commit in a large commit repository and recommends that commit's message as the message of the given commit. Our repository contains half a million commits collected from SourceForge. We evaluate our approach on the commits from 10 projects. The results show that 21.5% of the messages recommended by ChangeDoc can be used directly without modification, and 62.8% require minor modifications. To evaluate the quality of the commit messages recommended by ChangeDoc, we performed two empirical studies involving a total of 40 participants (10 professional developers and 30 students). The results indicate that the recommended messages are very good approximations of the ones written by developers and often include important intent information that is not included in the messages generated by other tools.

Keywords

commit message recommendation, code syntax similarity, code semantic similarity, code change comprehension
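
The retrieval idea summarized in the abstract (find the most similar past commit and reuse its message) can be illustrated with a minimal sketch. The sketch makes several assumptions that do not come from the paper: the names `Commit`, `tokens`, `jaccard`, and `recommend_message` are hypothetical, and a simple Jaccard overlap of identifier tokens stands in for ChangeDoc's syntax, semantic, pre-syntax, and pre-semantic similarities, which are not defined in this part of the text.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Commit:
    """A past commit: its code change and its human-written message."""
    diff: str
    message: str


def tokens(diff: str) -> Set[str]:
    """Extract identifier-like tokens from a diff (a stand-in feature,
    not the similarity features used by ChangeDoc)."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", diff))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two token sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend_message(new_diff: str, repository: List[Commit]) -> Optional[str]:
    """Return the message of the most similar past commit, or None
    if the repository is empty."""
    if not repository:
        return None
    query = tokens(new_diff)
    best = max(repository, key=lambda c: jaccard(query, tokens(c.diff)))
    return best.message


# Illustrative data only; these commits are invented for the example.
repo = [
    Commit("if (user != null) { user.getName(); }", "fixing null pointer exception"),
    Commit("int timeout = 30;", "increase default timeout"),
]
suggestion = recommend_message("if (order != null) { order.getName(); }", repo)
```

In this toy repository the new change shares the null-check pattern with the first commit, so that commit's intent-bearing message is the one recommended. A realistic system would replace the token overlap with richer similarity measures and an index suited to a repository of half a million commits.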

1 Introduction

In software maintenance, understanding code changes costs developers most of their time [1, 2]. These code changes are often organized and saved as commits in version control systems (e.g., Git). Normally, a message in natural language is wri