Ensemble Malware Classification Using Neural Networks


Abstract. This work presents an experimental study of malware classification using the Microsoft Malware Classification Challenge 2015 dataset. We combine the approach of the winning solution to the Microsoft Malware Classification Challenge with the neural network approach. Using a combination of n-gram features for both assembly (asm) and byte code enables us to significantly improve the result. By mixing multiple approaches, we obtain our best log-loss so far, 0.0025. This comes mostly from the classical XGBoost method with n-gram contributions from the binary and assembly code. However, our understanding of this result is still incomplete. The standard neural network approaches (even with LSTM) alone give poorer results than XGBoost, which relies mostly on n-grams. It is not clear why adding 6-grams to the binary code analysis does not improve results. Many more options remain to be tested in the future, in particular neural network architectures.

Keywords: Malware detection · Microsoft Malware Classification Challenge · Malware neural networks
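As an illustration of two notions used in the abstract, the following minimal Python sketch counts byte n-grams and computes a multi-class log-loss. It is not the feature pipeline used in this work. The function names `byte_ngrams` and `multiclass_log_loss`, the assumed hex-dump layout (an address field followed by two-character byte tokens on each line), and the clipping constant `eps` are assumptions made for illustration only.

```python
import math
from collections import Counter


def byte_ngrams(hex_dump: str, n: int = 4) -> Counter:
    """Count n-grams of byte tokens in a textual hex dump.

    Assumption: each line starts with an address field, followed by
    whitespace-separated two-character byte tokens.
    """
    tokens = []
    for line in hex_dump.splitlines():
        parts = line.split()
        tokens.extend(parts[1:])  # drop the assumed address field
    # Slide a window of length n over the token stream.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: list of integer class indices.
    probs:  list of per-class probability lists, one per sample.
    eps:    hypothetical clipping constant that keeps log() finite.
    """
    total = 0.0
    for label, row in zip(y_true, probs):
        p = min(max(row[label], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(y_true)


if __name__ == "__main__":
    dump = "00401000 56 8D 44 24\n00401004 56 8D 44 24"
    print(byte_ngrams(dump, n=2).most_common(3))
    print(multiclass_log_loss([0, 1], [[0.9, 0.1], [0.2, 0.8]]))
```

Under this definition a lower value is better, and a model that assigns probability close to 1 to the correct family for every sample approaches a log-loss of 0.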

1 Introduction

Machine learning has a clear advantage over signature methods still used in malware detection. Constantly changing malware signatures and the use of obfuscation methods require effective and fast detection and classification methods.

1.1 Machine Learning-Based Malware Detection

Different studies have demonstrated the proficiency of machine learning for the detection and classification of malware files. Further, the accuracy of these machine learning models can be improved by using feature selection algorithms to select the most essential features and by reducing the size of the dataset, which leads to decreased computational overhead. In general, there are two major approaches to malware classification. The first is the classical method based on hand-crafted feature selection. The other is a neural network approach. The customary thinking is that the neural approach, where progress in recent years has been tremendous, gives better results for very large systems independent of the domain. For example, for Question Answering on SQuAD2.0, the F-measure increased from 70.3% in 2017 to 93.011% in 2020. One would expect that attention neural networks [16], BERT [5], or CNN+LSTM-based networks would give better results. The objective of this work is to test many neural network approaches and the use of an ensemble method to verify whether richer neural architectures lead to improvement. We would also like to establish the relative importance of binary vs. assembly language (asm) data. Initially, our work followed the convolutional neural network (CNN) approach to bytecode, originating in Gilbert's thesis [6], and the black-box approach of [11]. We make comparisons to

Supported by PUT statutory funds. One of the authors (CJ) acknowledges the NVIDIA GPU Grant of Quadro P6000 card.
© Springer Nature Switzerland AG 2020. A. Dziech et al. (Eds.): MCSS 2020, CCIS 1284, pp. 125–138, 2020. https://doi.org/10.1007/978-3-030-59000-0_10
P. Wyrwinski et al.
