Data Type Classification: Hierarchical Class-to-Type Modeling
Data and file type classification research conducted over the past ten to fifteen years has been dominated by competing experiments that vary only the number of classes, the types of classes, the machine learning technique and the input vector. There has been surprisingly little innovation in fundamental approaches to data and file type classification. This chapter focuses on the empirical testing of a hypothesized two-level hierarchical classification model and on the empirical derivation and testing of several alternative classification models. Comparative evaluations of ten classification models are conducted to identify a final, best-performing two-level classification model consisting of five classes and 52 lower-level data and file types. Experimental results demonstrate that the approach leads to very good class-level classification performance, improved classification performance for data and file types that do not have high entropy, and reasonably equivalent classification performance for high-entropy data and file types (e.g., compressed and encrypted data).
Keywords: Statistical classification, data types, file types, hierarchical model
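The two-level idea summarized in the abstract can be sketched as follows: a first classifier assigns a fragment to a class, and a second classifier chosen for that class assigns the lower-level type. This is only an illustrative sketch under stated assumptions, not the chapter's model. The taxonomy, the class and type names, the toy rules and the 7.0 bits-per-byte threshold are all hypothetical placeholders, and byte entropy is used here merely as a convenient toy feature.

```python
import math
from collections import Counter
from typing import Callable, Dict, Tuple

# Hypothetical taxonomy: class -> member types. Placeholder names only;
# these are NOT the five classes and 52 types evaluated in the chapter.
TAXONOMY: Dict[str, Tuple[str, ...]] = {
    "text": ("plain-text", "csv"),
    "binary": ("low-entropy-binary", "high-entropy-binary"),
}

Classifier = Callable[[bytes], str]


def byte_entropy(fragment: bytes) -> float:
    """Shannon entropy of the byte values, in bits per byte (0.0 to 8.0)."""
    n = len(fragment)
    if n == 0:
        return 0.0
    return sum((c / n) * math.log2(n / c) for c in Counter(fragment).values())


def classify_two_level(
    fragment: bytes,
    class_clf: Classifier,
    type_clfs: Dict[str, Classifier],
) -> Tuple[str, str]:
    """Level 1 picks the class; level 2 picks a type within that class."""
    cls = class_clf(fragment)
    data_type = type_clfs[cls](fragment)
    if data_type not in TAXONOMY[cls]:
        raise ValueError(f"type {data_type!r} is not a member of class {cls!r}")
    return cls, data_type


# Toy stand-ins for trained classifiers (assumptions, for illustration only).
def toy_class_clf(fragment: bytes) -> str:
    # "text" if every byte is printable ASCII, tab, LF or CR.
    is_text = all(b in (9, 10, 13) or 32 <= b < 127 for b in fragment)
    return "text" if is_text else "binary"


TOY_TYPE_CLFS: Dict[str, Classifier] = {
    "text": lambda f: "csv" if b"," in f else "plain-text",
    # The 7.0 bits-per-byte threshold is an arbitrary illustrative value.
    "binary": lambda f: (
        "high-entropy-binary" if byte_entropy(f) > 7.0 else "low-entropy-binary"
    ),
}

if __name__ == "__main__":
    for sample in (b"id,name\n1,alice\n", b"\x00" * 512, bytes(range(256)) * 4):
        print(classify_two_level(sample, toy_class_clf, TOY_TYPE_CLFS))
```

In a real system the two levels would be trained statistical classifiers. The point of the structure is that each second-level classifier only has to separate the types inside one class.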
1. Introduction
Statistical data type classification has many important applications in cyber security and digital forensics. Cyber security applications include intrusion detection, content-based firewall blocking, malware detection and analysis, and steganalysis. Data type classification can defend against many common signature obfuscation techniques and enhance the detection and blocking of undesired network traffic. It can also help map binary objects [12], which is useful in malware analysis and, possibly, steganalysis. In digital forensics, data type classification aids fragment identification, isolation, recovery and file reassembly. Commercial and open-source tools such as file and TrID rely on file signatures and other magic numbers, which renders them ineffective when file headers or other blocks containing key magic numbers are missing or corrupted, or when the locations of those blocks in the files are unknown [16]. Data type classification can also aid forensic triage and improve investigative efficiency by targeting or prioritizing investigative efforts and search results [8].
This research focuses on data type classification in the absence of reliable file signatures, filename extensions and other filesystem data that might identify the data type. The data type is determined either from the file type that the data fragment originally belonged to (in the case of files and composite objects) or from the data type or primitive data type as defined by Erbacher and Mulholland [13]. It is important to note that, when reliable file signatures, filename extensions or filesystem data exist pertaining to a data fragment, traditional file signature based method
© IFIP International Federation for Information Processing 2016. Published by Springer International Publishing AG 2016. All Rights Reserved. G. Peterson and S. Shenoi (Eds.): Advances in Digital Forensics XII, IFIP AICT 484, pp. 325–343, 2016. DOI: 10.1007/978-3-319-46279-0_17
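The limitation of signature-based identification noted in the introduction can be illustrated with a minimal sketch. It assumes a tiny lookup table of three well-known leading signatures and a hypothetical helper named identify_by_signature. It is a simplified stand-in for the general technique and does not reflect how file or TrID are actually implemented.

```python
from typing import Optional

# A few well-known leading signatures (magic numbers).
SIGNATURES = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"PK\x03\x04": "zip",
}


def identify_by_signature(data: bytes) -> Optional[str]:
    """Hypothetical helper: return a type name if data starts with a known signature."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


# Synthetic stand-in for a file: a real PDF signature followed by filler bytes.
whole_file = b"%PDF-1.4\n" + b"x" * 4096
# A fragment carved from the middle of the file no longer contains the header.
fragment = whole_file[1024:2048]

if __name__ == "__main__":
    print(identify_by_signature(whole_file))  # pdf
    print(identify_by_signature(fragment))    # None
```

Once the header block is missing, the lookup has nothing to match against. This is the situation in which statistical classification of the fragment content becomes the remaining option.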