Applicability of machine learning in spam and phishing email filtering: review and approaches

Tushaar Gangavarapu¹,² · C. D. Jaidhar¹ · Bhabesh Chanduka¹

© Springer Nature B.V. 2020

Abstract

With the influx of technological advancements and the increased simplicity of communication, especially through email, the upsurge in the volume of unsolicited bulk emails (UBEs) has become a severe threat to global security and the economy. Spam emails not only waste users’ time but also consume considerable network bandwidth, and may carry malware as executable files. Phishing emails, by contrast, fraudulently solicit users’ personal information to facilitate identity theft and are comparatively more dangerous. Thus, there is an intrinsic need for more robust and dependable UBE filters that detect such emails automatically. There are several countermeasures to spam and phishing, including blacklisting and content-based filtering. However, in addition to content-based features, behavior-based features are well suited to the detection of UBEs. Machine learning models are used extensively by leading internet service providers like Yahoo, Gmail, and Outlook to filter and classify UBEs successfully. Given the need for UBE detection and the recent advances in this domain, there are far too many options to consider. In this paper, we aim to elucidate how to extract email content and behavior-based features, which features are appropriate for the detection of UBEs, and how to select the most discriminating feature set. Furthermore, to accurately handle the menace of UBEs, we present an exhaustive comparative study using several state-of-the-art machine learning algorithms. Our proposed models resulted in an overall accuracy of 99% in the classification of UBEs. The text is accompanied by snippets of Python code to enable the reader to implement the approaches elucidated in this paper.

Keywords Feature engineering · Machine learning · Phishing · Python · Spam

* Tushaar Gangavarapu [email protected]

1 Department of Information Technology, National Institute of Technology Karnataka, Surathkal, Mangaluru 575025, India

2 Automated Quality Assistance (AQuA) Machine Learning Research, Content Experience and Quality Algorithms, Amazon.com, Inc., Chennai, India

1 Introduction

Digital products and services increasingly mediate human activities. With the advent of email communication, unsolicited emails have, in recent years, become a serious threat to global security and the economy (Bergholz et al. 2010). As a result of the ease of communication via email, a vast number of issues involving the exploitation of technology to elicit personal and sensitive information have emerged. Identity theft, being one of the most profitable crimes, is often employed by felons to lure unsuspecting online users into revealing confidential information such as social security numbers, account numbers, and passwords. Unsolicited emails disg
