Automatic extraction of named entities of cyber threats using a deep Bi-LSTM-CRF network


ORIGINAL ARTICLE

Gyeongmin Kim¹ · Chanhee Lee¹ · Jaechoon Jo² · Heuiseok Lim¹

Received: 15 June 2019 / Accepted: 6 April 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract

Companies around the world rely on countless cyber threat intelligence (CTI) reports every day for security purposes. To secure critical cybersecurity information, analysts and individuals must analyze the information these reports contain on threats and vulnerabilities. However, analyzing such overwhelming volumes of reports requires considerable time and effort. In this study, we propose a novel approach that automatically extracts core information from CTI reports using a named entity recognition (NER) system. While constructing the proposed NER system, we defined meaningful keywords in the security domain as entities, including malware, domain/URL, IP address, hash, and Common Vulnerabilities and Exposures (CVE). Furthermore, we linked these keywords with the words extracted from the text of the reports. To achieve higher performance, we used character-level feature vectors as input to a bidirectional long short-term memory network with a conditional random field layer (Bi-LSTM-CRF). The system achieved an average F1-score of 75.05%. We release the 498,000-tag dataset created during our research.

Keywords Cybersecurity · Vulnerability · Cyber threat intelligence · Named entity recognition · Bidirectional long short-term memory conditional random field
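To make the reported metric concrete, the following is a minimal sketch of how an entity-level F1-score can be computed for an NER tagger. It rests on assumptions that the text above does not confirm: a BIO tagging scheme, exact-match scoring of entity spans, and the illustrative label names `Malware`, `IP`, and `CVE`. The function names `bio_to_spans` and `entity_f1` are hypothetical, and this is not the authors' evaluation code.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type); hypothetical representation
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence (assumed scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes an entity that ends the sentence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if continues:
            continue
        if etype is not None:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        # An "I-" tag with no matching open entity is ignored (strict reading).
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Exact-match, micro-averaged F1 over all sentences."""
    gold_spans, pred_spans = set(), set()
    for idx, (g, p) in enumerate(zip(gold, pred)):
        gold_spans |= {(idx, *s) for s in bio_to_spans(g)}
        pred_spans |= {(idx, *s) for s in bio_to_spans(p)}
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented example sentence: "Emotet contacted 10.0.0.5 exploiting CVE-2017-0144"
gold = [["B-Malware", "O", "B-IP", "O", "B-CVE"]]
pred = [["B-Malware", "O", "B-IP", "O", "O"]]  # the tagger misses the CVE
print(f"{entity_f1(gold, pred):.2f}")  # 0.80
```

In this invented example the tagger finds two of three gold entities with no false positives, so precision is 1.0, recall is 2/3, and F1 is 0.8. Whether the reported 75.05% is a micro- or macro-average over entity types is not stated in the abstract.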

1 Introduction

Cyber properties such as IP addresses, URLs, and private data are continuously under threat from malware, viruses, and malicious actors. The use of unsecured data or websites makes users vulnerable to hackers. Users are rarely capable of detecting such attacks and lack information regarding attack patterns and methods. Recent cyber threats are aimed not only at individual users but also at businesses regardless of their scale [13]. For this reason, people should always be aware of cyber threats and vulnerabilities. Cyber threat intelligence (CTI) reports provide useful data, information, and insight into cybersecurity, including important keywords such as malware names, attack schemes, and the IP addresses of attackers and victims. Extracting such significant entities from CTI reports through a structured methodology is valuable for professional practitioners and a necessary step in cybersecurity research. In studies applying various text mining and machine learning methods, researchers have recently attempted to extract key entities optimized for the cybersecurity domain [30, 34]. Traditional statistical-based extraction methods that rely on feature engineeri

* Corresponding author: Heuiseok Lim

1 Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, Republic of Korea

2 Hanshin University, 137, Hanshindae-gil, Osan-si 18101, Republic of Korea
