Code smell detection using multi-label classification approach



Thirupathi Guggulothu¹ · Salman Abdul Moiz¹

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Code smells are characteristics of software that indicate a code or design problem and can make the software hard to understand, evolve, and maintain. Several code smell detection tools have been proposed in the literature, but they produce different results because smells are informally defined and subjective in nature. Machine learning techniques help address this subjectivity by learning to distinguish the characteristics of smelly and non-smelly source code elements (classes or methods). However, existing machine learning techniques detect only a single type of smell in a code element, which does not correspond to a real-world scenario, since a single element can have multiple design problems (smells). Further, the mechanisms proposed in the literature do not consider the correlation (co-occurrence) among smells when detecting them. To address these shortcomings, we propose and investigate the use of multi-label classification (MLC) methods to detect whether a given code element is affected by multiple smells. In this proposal, two code smell datasets available in the literature are converted into a multi-label dataset (MLD). In the MLD, we found a positive correlation between the two smells (long method and feature envy). In the classification phase, the two MLC methods considered the correlation among the smells and improved performance (on average more than 95% accuracy) under 10-fold cross-validation repeated for ten iterations. The reported findings help researchers and developers prioritize critical code elements for refactoring based on the number of code smells detected.

Keywords Code smells · Software quality · Code smell correlation · Multi-label classification · Code smells detection · Machine learning techniques · Refactoring
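As a rough illustration of the dataset conversion and correlation check summarized in the abstract, the following minimal Python sketch merges two single-label datasets into a multi-label one and measures how strongly the two labels co-occur. The instance identifiers, the label values, the choice to keep only instances present in both datasets, and the use of the phi coefficient as the correlation measure are all assumptions made for this example; they are not taken from the datasets or procedure used in the paper.

```python
from math import sqrt

# Hypothetical single-label datasets: method id -> 1 (smelly) or 0 (not smelly).
long_method = {"m1": 1, "m2": 1, "m3": 0, "m4": 0, "m5": 1}
feature_envy = {"m1": 1, "m2": 0, "m3": 0, "m4": 0, "m5": 1, "m6": 1}


def merge_to_multilabel(*datasets):
    """Keep instances present in every dataset; one label per dataset.

    Merging on shared instances is an assumption of this sketch.
    """
    common = set(datasets[0]).intersection(*datasets[1:])
    return {key: tuple(d[key] for d in datasets) for key in sorted(common)}


def phi_coefficient(pairs):
    """Phi (Pearson) correlation between two binary labels."""
    n11 = sum(1 for a, b in pairs if a and b)
    n10 = sum(1 for a, b in pairs if a and not b)
    n01 = sum(1 for a, b in pairs if not a and b)
    n00 = sum(1 for a, b in pairs if not a and not b)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # A constant label makes the correlation undefined; report 0.0 in that case.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


# Multi-label dataset: method id -> (long_method, feature_envy).
mld = merge_to_multilabel(long_method, feature_envy)
phi = phi_coefficient(list(mld.values()))
print(mld)
print(f"phi = {phi:.3f}")
```

A positive phi value means the two smells tend to appear in the same methods, which is the kind of label dependence a multi-label classifier can exploit and a set of independent single-label classifiers cannot.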

This article belongs to the Topical Collection on Quality Management for Information Systems. Guest Editors: Mario Piattini, Ignacio García Rodríguez de Guzmán, Ricardo Pérez del Castillo

Thirupathi Guggulothu [email protected]
Salman Abdul Moiz [email protected]

¹ School of Computer and Information Sciences, University of Hyderabad, Hyderabad, Telangana, India

Software Quality Journal

1 Introduction

A code smell is an anomaly in the source code that signals a violation of basic design principles such as abstraction, hierarchy, encapsulation, modularity, and modifiability (Booch 1980). Even when developers know these design principles, they often violate them because of inexperience, deadline pressure, and heavy market competition. Fowler et al. (1999) defined 22 informal code smells. These smells have different granularities depending on the affected element, such as class-level (God class, data class, etc.) and method-level (long method, feature envy, etc.) code