E-Mail Spam Filtering: A Review of Techniques and Trends
Abstract
We present an inclusive review of recent and successful content-based e-mail spam filtering techniques. Our focus is mainly on machine learning-based spam filters and variants inspired by them. We report on relevant ideas, techniques, taxonomy, major efforts, and the state of the art in the field. We begin by examining prior work on the basics of e-mail spam filtering and feature engineering. We conclude by studying techniques and evaluation benchmarks, exploring promising offshoots of the latest developments, and suggesting lines of future investigation.

Keywords: Spam, Machine learning, Spam filtering, Techniques, False positive
A. Bhowmick (corresponding author), School of Technology, Assam Don Bosco University, Guwahati 781017, Assam, India, e-mail: [email protected]
S.M. Hazarika, Department of Computer Science and Engineering, Tezpur University, Tezpur 784028, Assam, India, e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2018. A. Kalam et al. (eds.), Advances in Electronics, Communication and Computing, Lecture Notes in Electrical Engineering 443, https://doi.org/10.1007/978-981-10-4765-7_61

1 Introduction

E-mail, or electronic mail, is a fast, effective, and inexpensive method of exchanging messages over the Internet. Whether it is a personal message from a family member, a company-wide message from the boss, researchers across continents sharing recent findings, or astronauts staying in touch with their families (via e-mail uplinks or IP phones), e-mail is a preferred means of communication. Used worldwide by 2.3 billion users at the time of writing, e-mail usage is projected to increase up to 4.3 billion accounts by 2016 [1]. But the increasing dependence on e-mail has induced the emergence of many problems caused by ‘illegitimate’ e-mails, i.e., spam. According to the Text Retrieval Conference (TREC), the term ‘spam’ refers to any unsolicited e-mail that is sent indiscriminately [2]. Spam e-mails are unsolicited, un-ratified, and usually mass mailed. Spam, being a carrier of malware, causes the proliferation of unsolicited advertisements, fraud schemes, phishing messages, explicit content, promotions of causes, etc. On an organizational front, the effects of spam include: (i) annoyance to individual users, (ii) less reliable e-mails, (iii) loss of work productivity, (iv) misuse of network bandwidth, (v) wastage of file server storage space and computational power, (vi) spread of viruses, worms, and Trojan horses, and (vii) financial losses through phishing, denial of service (DoS), directory harvesting attacks, etc. Figure 1 depicts the e-mail architecture and how e-mail works.

Spam is a broad concept that is still not completely understood. In general, spam has many forms: chat rooms are subject to chat spam, blogs are subject to blog spam (splogs), search engines are often misled by web spam (search engine spamming or spamdexing), while social systems are plagued by social spam. This paper focuses on ‘e-mail spam’ and its variants, and not ‘s