Context-dependent model for spam detection on social networks


Razan Ghanem · Hasan Erbay
Received: 2 November 2019 / Accepted: 19 August 2020
© Springer Nature Switzerland AG 2020

Abstract
Social media platforms are becoming an important communication medium in daily life, and their growing popularity makes them an ideal channel for spammers to spread spam messages, a phenomenon known as the spam problem. Moreover, messages on social media are vague and messy, so a good representation of the text may be the first step toward addressing the spam problem. Traditional weighting methods suffer from high dimensionality and high sparsity, while traditional word embedding methods suffer from context independence and out-of-vocabulary problems. To overcome these problems, we propose a novel architecture based on a context-dependent representation of text using the BERT model. The model was tested using the Twitter dataset, and experimental results show that the proposed method outperforms traditional weighting methods, traditional word embedding based methods, and the existing state-of-the-art methods used to detect spam on the Twitter platform.

Keywords Spam detection · Word embedding · Bidirectional encoder representations from transformers

1 Introduction
Social media are interactive computer-mediated technologies that facilitate the creation or sharing of information, ideas, career interests, and other forms of expression via virtual communities and networks. Twitter is one of the most popular social media platforms today. Twitter reported that its worldwide monetizable daily active users (mDAUs) grew by 24% to 166 million in Q1 2020. Each Twitter user has, on average, 208 followers, and users post 140 million tweets daily. This popularity has made Twitter a suitable environment for spreading spam messages, which have become a challenging problem because short text messages on social media are messy and ambiguous. Social spam messages may be defined as irrelevant or unsolicited messages sent over social media, such as malicious links, advertisements, or any low-quality content. Unlike long messages such as e-mails, social spam messages are sparser and more ambiguous, which makes spam classification in social networks more challenging.

One of the important tasks in handling short text on social media is word representation. Traditional word representation methods are based on the Bag of Words (BoW) model, in which each word or n-gram is linked to a vector index and marked as 0 or 1 depending on whether it occurs in a given document. Although BoW produces acceptable results, it suffers from high dimensionality and high sparsity. Word embedding methods address these problems by representing words as dense vectors, where each vector is the projection of a word into a continuous vector space. Word2vec is the first word embedding model, introduced by Tomas Mikolov at Google in 2013. There are two main training algorithms f
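The binary Bag of Words representation described above can be illustrated with a minimal sketch. It assumes whitespace tokenization and lowercasing, which the text does not specify, and the function names `build_vocabulary` and `binary_bow` as well as the toy messages are hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def build_vocabulary(documents: List[str]) -> Dict[str, int]:
    """Map each distinct lowercase token to a vector index, in order of first appearance."""
    vocabulary: Dict[str, int] = {}
    for document in documents:
        for token in document.lower().split():
            if token not in vocabulary:
                vocabulary[token] = len(vocabulary)
    return vocabulary


def binary_bow(document: str, vocabulary: Dict[str, int]) -> List[int]:
    """Return a 0/1 vector marking which vocabulary words occur in the document."""
    vector = [0] * len(vocabulary)
    for token in document.lower().split():
        index = vocabulary.get(token)
        if index is not None:  # words outside the vocabulary are dropped
            vector[index] = 1
    return vector


# Hypothetical toy messages, not taken from the paper's dataset.
messages = ["win a free prize now", "see you at lunch", "free prize inside"]
vocab = build_vocabulary(messages)
vectors = [binary_bow(message, vocab) for message in messages]

for message, vector in zip(messages, vectors):
    print(message, "->", vector)
```

Even in this small case, each vector is as long as the whole vocabulary and most of its entries are 0, which is the high dimensionality and high sparsity that the text attributes to BoW. A message made up entirely of unseen words maps to an all-zero vector.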
