A Comparative Analysis of Various Spam Classifications



Abstract Bandwidth, time, and storage space are the three major assets in the computational world. Spam emails affect all three and thus degrade the overall efficiency of the system. Spammers keep using new tricks and traps to land these frivolous mails in our inboxes. To make mailboxes more intelligent, we aim to devise a new algorithm that classifies emails in a smarter and more efficient way. This paper analyzes various spam classification techniques and puts forward a new way of classifying spam emails. It thoroughly compares the results that various authors obtained while simulating their architectures. Our classification approach works efficiently and more accurately on datasets of varied length and type during the training and testing phases. We tried to minimize the error ratio and increase classifier efficiency by applying the genetic algorithm concept.





Keywords Spam classification · Spam email · Unsolicited · Logistic regression · Genetic algorithm · Machine learning · Feature set



1 Introduction Unsolicited bulk emails, or junk emails, are frivolous mails sent in bulk to advertise [1], proliferate viruses, hack mailboxes [1], cheat somebody, or play a prank. Because emails can be sent to millions of recipients at no cost, the spam traffic between MTAs delays the delivery of legitimate mail [2]. Spam occupies about two-thirds of our mailboxes [1], thereby causing inefficient utilization of storage space, bandwidth, and time [1].

N.F. Shah (✉) ⋅ P. Kumar
Department of Computer Science & Engineering, Birla Institute of Technology, Mesra, Ranchi 835215, India
e-mail: [email protected]
P. Kumar
e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2018
P.K. Sa et al. (eds.), Progress in Intelligent Computing Techniques: Theory, Practice, and Applications, Advances in Intelligent Systems and Computing 519, DOI 10.1007/978-981-10-3376-6_29


In order to keep spammers at bay, many spam filtering techniques exist that are robust enough to detect spam mail. Some of them use a knowledge engineering (KE) based approach, while the majority follow the machine learning (ML) approach [3]. The latter is a more robust and intelligent way of classifying emails. The former uses stored procedures or rules to classify emails. It may have a stored dictionary of words like BUY, SPAM, Lottery, Offer, Prize, Reward, etc., and it periodically updates this dictionary to adapt to new trends [4]. But this practice is not very efficient because, once the dictionary or repository of words is set, it is impossible to constantly update it at different end-user sites. In comparison to KE, the ML approach is an intelligent way of filtering spam. ML does not have predefined rules or procedures. It can mutate itself to adapt to user needs, so ML is based on user adaptability. Our research will be based on the analytical approaches put forth by various researchers. We will thoroughly analyze their approaches and results, and thereby devise