Data Type Classification: Hierarchical Class-to-Type Modeling



Data and file type classification research conducted over the past ten to fifteen years has been dominated by competing experiments that vary only the number of classes, the types of classes, the machine learning technique and the input vector. There has been surprisingly little innovation in fundamental approaches to data and file type classification. This chapter focuses on the empirical testing of a hypothesized two-level hierarchical classification model and on the empirical derivation and testing of several alternative classification models. Comparative evaluations are conducted on ten classification models to identify a final, winning two-level classification model consisting of five classes and 52 lower-level data and file types. Experimental results demonstrate that the approach yields very good class-level classification performance, improved classification performance for data and file types that do not have high entropy, and reasonably equivalent classification performance for high-entropy data and file types (e.g., compressed and encrypted data).

Keywords: Statistical classification, data types, file types, hierarchical model
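The two-level idea in the abstract (assign a fragment to a class first, then to a type within that class) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not the chapter's method: the features (a 256-bin byte-frequency histogram plus Shannon entropy scaled to [0, 1]), the nearest-centroid learner, and the hypothetical taxonomy in `TYPE_TO_CLASS` stand in for the chapter's five classes, 52 types and learners.

```python
import math
import random
from collections import Counter

# Hypothetical type-to-class taxonomy for illustration only; the chapter's
# five classes and 52 types are not reproduced here.
TYPE_TO_CLASS = {"txt": "text", "csv": "text", "random": "high_entropy"}


def shannon_entropy(fragment: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    n = len(fragment)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(fragment).values())


def features(fragment: bytes) -> list:
    """Assumed feature vector: 256-bin byte histogram plus entropy scaled to [0, 1]."""
    counts = Counter(fragment)
    n = len(fragment) or 1
    hist = [counts.get(b, 0) / n for b in range(256)]
    return hist + [shannon_entropy(fragment) / 8.0]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Assigns a vector to the label whose mean training vector is closest."""

    def fit(self, X, y):
        groups = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(x)
        # Column-wise mean of each label's feature vectors.
        self.centroids = {
            label: [sum(col) / len(vs) for col in zip(*vs)]
            for label, vs in groups.items()
        }
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda lbl: sq_dist(x, self.centroids[lbl]))


class HierarchicalClassifier:
    """Level 1 picks a class; level 2 picks a type among that class's types only."""

    def __init__(self, type_to_class):
        self.type_to_class = type_to_class

    def fit(self, fragments, types):
        X = [features(f) for f in fragments]
        classes = [self.type_to_class[t] for t in types]
        self.class_model = NearestCentroid().fit(X, classes)
        # One type-level model per class, trained only on that class's samples.
        self.type_models = {}
        for cls in set(classes):
            idx = [i for i, c in enumerate(classes) if c == cls]
            self.type_models[cls] = NearestCentroid().fit(
                [X[i] for i in idx], [types[i] for i in idx]
            )
        return self

    def predict(self, fragment):
        x = features(fragment)
        cls = self.class_model.predict(x)
        return cls, self.type_models[cls].predict(x)


# Toy training fragments (synthetic, not the chapter's corpus).
rng = random.Random(0)
train = [
    (b"the quick brown fox jumps over the lazy dog " * 12, "txt"),
    (b"12,34,56,78,90\n" * 30, "csv"),
    (bytes(rng.randrange(256) for _ in range(512)), "random"),
    (bytes(rng.randrange(256) for _ in range(512)), "random"),
]
clf = HierarchicalClassifier(TYPE_TO_CLASS).fit(
    [f for f, _ in train], [t for _, t in train]
)
print(clf.predict(b"lorem ipsum dolor sit amet consectetur " * 10))  # ('text', 'txt')
```

The design choice the sketch highlights is that each type-level model only has to separate the types inside one class, so a class-level mistake cannot be undone at the second level, while a correct class decision narrows the type-level problem.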

© IFIP International Federation for Information Processing 2016. Published by Springer International Publishing AG 2016. All Rights Reserved. G. Peterson and S. Shenoi (Eds.): Advances in Digital Forensics XII, IFIP AICT 484, pp. 325–343, 2016. DOI: 10.1007/978-3-319-46279-0_17

1. Introduction

Statistical data type classification has many important applications in cyber security and digital forensics. Cyber security applications include intrusion detection, content-based firewall blocking, malware detection and analysis, and steganalysis. Data type classification can defend against many common signature obfuscation techniques and enhance the detection and blocking of undesired network traffic. It can also help map binary objects [12], which is useful in malware analysis and, possibly, steganalysis. In digital forensics, data type classification aids fragment identification, isolation, recovery and file reassembly. Commercial and open-source tools such as file and TrID rely on file signatures and other magic numbers, which renders them ineffective when file headers and/or other blocks containing key magic numbers are missing or corrupted, or when their locations in the files are unknown [16]. Data type classification can also aid forensic triage and improve investigative efficiency by targeting or prioritizing investigative efforts and search results [8]. This research focuses on data type classification in the absence of reliable file signatures, filename extensions and other filesystem data that may identify the data type. Here, the data type is either the file type of which the data fragment was once a part (in the case of files and composite objects) or the data type or primitive data type as defined by Erbacher and Mulholland [13]. It is important to note that, when reliable file signatures, filename extensions or filesystem data pertaining to a data fragment exist, traditional file signature based method
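
Why signature-based identification breaks down on fragments can be shown with a minimal sketch. The table is a small illustrative subset of well-known leading magic numbers, and `identify_by_signature` is a hypothetical helper; it is not how file or TrID are actually implemented, since both consult far larger signature databases.

```python
from typing import Optional

# A tiny illustrative subset of well-known leading magic numbers.
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
]


def identify_by_signature(data: bytes) -> Optional[str]:
    """Return a type label if data begins with a known magic number, else None."""
    for magic, label in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return label
    return None


# Synthetic PDF-like byte stream (not a valid PDF, just a header plus filler).
doc = b"%PDF-1.4\n" + b"1 0 obj\n<< /Type /Catalog >>\nendobj\n" * 200
print(identify_by_signature(doc))             # pdf
print(identify_by_signature(doc[512:1024]))   # None: mid-file fragment has no header
print(identify_by_signature(b"\x00" + doc[1:]))  # None: corrupted header
```

Once the header is absent or damaged, the lookup has nothing to match, which is the gap that statistical classification of fragment content is meant to fill.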
