Predicting Code Smells and Analysis of Predictions: Using Machine Learning Techniques and Software Metrics


Mohammad Y. Mhawish and Manjari Gupta

Computer Science, Centre for Interdisciplinary Mathematical Sciences, Institute of Science, Banaras Hindu University, Varanasi 221005, India

E-mail: [email protected]; [email protected]

Received January 24, 2020; revised September 29, 2020.

Abstract Code smell detection is essential for improving software quality, enhancing software maintainability, and decreasing the risk of faults and failures in a software system. In this paper, we propose a code smell prediction approach based on machine learning techniques and software metrics. The local interpretable model-agnostic explanations (LIME) algorithm is further used to explain the machine learning model’s predictions and improve interpretability. The datasets obtained from Fontana et al. were reformed and used to build binary-label and multi-label datasets. The results of 10-fold cross-validation show that tree-based algorithms (mainly Random Forest) perform better than kernel-based and network-based algorithms. Feature selection based on a genetic algorithm enhances the accuracy of these machine learning algorithms by selecting the most relevant features in each dataset. Moreover, parameter optimization based on the grid search algorithm significantly enhances the accuracy of all these algorithms. Finally, machine learning techniques have high potential for predicting code smells, which contributes to detecting these smells and enhancing software quality.

Keywords code smell, code smell detection, feature selection, prediction explanation, parameter optimization

1 Introduction

In software development, there are functional and non-functional quality requirements that developers have to follow to ensure software quality [1]. Developers often focus on purely functional requirements and neglect non-functional requirements such as maintainability, evolution, testability, understandability, and reusability [2]. Neglecting non-functional quality requirements results in poor software quality, which leads to increased complexity and greater effort for maintenance and evolution because of weaknesses in the software design. The term code smell, introduced by Fowler et al. [3], describes poor implementation structures in software. They presented informal definitions of 22 code smells. Several studies examined the impact of code smells on software [4–8] and showed their adverse effects on software quality. These studies also analyzed how code smells increase the risk of faults and failures in a software system. They found that code smells adversely affect the software evolution process and recommended refactoring the software to remove them. Olbrich et al. [9, 10], Khomh et al. [11], and Deligiannis et al. [12] studied the impact of code smells on software evolution by analyzing the frequency and size of change