Code smell detection using multi-label classification approach


Thirupathi Guggulothu¹ · Salman Abdul Moiz¹

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Code smells are characteristics of software that indicate a code or design problem and can make the software hard to understand, evolve, and maintain. Several code smell detection tools have been proposed in the literature, but they produce different results because smells are informally defined and subjective in nature. Machine learning techniques help address this subjectivity, since they can learn to distinguish the characteristics of smelly and non-smelly source code elements (classes or methods). However, existing machine learning techniques detect only a single type of smell per code element, which does not correspond to real-world scenarios, where a single element can have multiple design problems (smells). Further, the mechanisms proposed in the literature do not consider the correlation (co-occurrence) among smells when detecting them. To address these shortcomings, we propose and investigate the use of multi-label classification (MLC) methods to detect whether a given code element is affected by multiple smells. In this proposal, two code smell datasets available in the literature are converted into a multi-label dataset (MLD). In the MLD, we found a positive correlation between the two smells (long method and feature envy). In the classification phase, the two MLC methods considered the correlation among the smells and improved performance (on average, more than 95% accuracy) under 10-fold cross-validation with ten iterations. The reported findings help researchers and developers prioritize critical code elements for refactoring based on the number of code smells detected.

Keywords Code smells · Software quality · Code smell correlation · Multi-label classification · Code smells detection · Machine learning techniques · Refactoring
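The conversion and correlation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: this excerpt does not say how the two datasets were merged or which correlation measure was used, so the sketch assumes the merge keeps only methods present in both datasets and uses the phi coefficient for the two binary labels. The method ids, the toy label values, and the names `merge_to_multilabel` and `phi_coefficient` are hypothetical.

```python
from math import sqrt


def merge_to_multilabel(long_method, feature_envy):
    """Join two single-label datasets on method id.

    Each argument maps a method id to True (smelly) or False (not smelly).
    Assumption: only ids present in both datasets are kept.
    Returns {method_id: (long_method_label, feature_envy_label)}.
    """
    common = sorted(set(long_method) & set(feature_envy))
    return {m: (int(long_method[m]), int(feature_envy[m])) for m in common}


def phi_coefficient(label_pairs):
    """Phi coefficient of two binary labels, from a 2x2 contingency table."""
    pairs = list(label_pairs)
    n11 = sum(1 for a, b in pairs if a == 1 and b == 1)
    n10 = sum(1 for a, b in pairs if a == 1 and b == 0)
    n01 = sum(1 for a, b in pairs if a == 0 and b == 1)
    n00 = sum(1 for a, b in pairs if a == 0 and b == 0)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # If a label never varies, the correlation is undefined; report 0.0.
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


# Hypothetical toy data, not taken from the paper's datasets.
long_method = {"m1": True, "m2": True, "m3": False, "m4": False, "m5": True}
feature_envy = {"m1": True, "m2": False, "m3": False, "m4": False, "m6": True}

mld = merge_to_multilabel(long_method, feature_envy)
phi = phi_coefficient(mld.values())
print(mld)
print(round(phi, 3))
```

A positive value of `phi` means the two smells tend to occur together in the merged data, which is the kind of co-occurrence a multi-label classifier can exploit and a set of independent single-label classifiers cannot.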

This article belongs to the Topical Collection on Quality Management for Information Systems. Guest Editors: Mario Piattini, Ignacio García Rodríguez de Guzmán, Ricardo Pérez del Castillo

Thirupathi Guggulothu: [email protected]
Salman Abdul Moiz: [email protected]

¹ School of Computer and Information Sciences, University of Hyderabad, Hyderabad, Telangana, India

Software Quality Journal

1 Introduction

A code smell is an anomaly in the source code that signals a violation of basic design principles such as abstraction, hierarchy, encapsulation, modularity, and modifiability (Booch 1980). Even when developers know these design principles, they often violate them because of inexperience, deadline pressure, and heavy competition in the market. Fowler et al. (1999) defined 22 informal code smells. These smells have different granularities depending on the affected element, such as class-level (God class, data class, etc.) and method-level (long method, feature envy, etc.) code
