Comparison of Image-Based and Text-Based Source Code Classification Using Deep Learning


ORIGINAL RESEARCH

Elife Ozturk Kiyak¹ · Ayse Betul Cengiz¹ · Kokten Ulas Birant² · Derya Birant²

Received: 31 March 2020 / Accepted: 30 July 2020
© Springer Nature Singapore Pte Ltd 2020

Abstract

Source code classification (SCC) is the task of assigning code to categories according to a criterion such as functionality, programming language, or vulnerability. Many source code archives are organized by programming language, so that desired code fragments can be found easily by searching within the archive. However, manual organization of source code archives by field experts is labor intensive and impractical because of the fast-growing volume of available source code. Therefore, this study proposes new convolutional neural network (CNN) architectures to build source code classifiers that automatically identify programming languages from source code. This is the first study in which the performance of deep learning algorithms on programming language identification is compared on both image and text files. The experiments are performed on three source code datasets to identify eight programming languages: C, C++, C#, Go, Python, Ruby, Rust, and Java. The comparative results indicate that although the text-based and image-based SCC approaches achieve very high (> 93.5%) and similar accuracies, text-based classification performs significantly better in terms of execution time.

Keywords Source code classification · Software engineering · Programming languages · Deep learning · Image classification · Text mining

This article is part of the topical collection “Deep learning approaches for data analysis: A practical perspective” guest edited by D. Jude Hemanth, Lipo Wang and Anastasia Angelopoulou.

* Correspondence: Derya Birant

¹ The Graduate School of Natural and Applied Sciences, Dokuz Eylul University, 35390 Izmir, Turkey
² Department of Computer Engineering, Dokuz Eylul University, 35390 Izmir, Turkey

Introduction

Many programming languages, such as C, C++, C#, Java, and Python, have been developed and used in software engineering projects. Source code written in these languages is continuously pushed to active repositories such as GitHub, SourceForge, and Bitbucket. With the increase of open-source programming environments in recent years, the number of users who benefit from these environments is also growing. They can add their own code written in different programming languages, or easily access existing code and modify it. The use of online coding platforms such as CodeForces and Google Colab has also increased significantly. Thus, a substantial volume of source code has become accessible on many online platforms. In addition to th
