A Social Spam Detection Framework via Semi-supervised Learning

Abstract. With the increasing popularity of social networking websites such as Twitter, Facebook, Sina Weibo and MySpace, spammers on these platforms are becoming more and more rampant. Social spammers use large numbers of compromised or fake accounts to deceive users and lead them to malicious websites containing illegal, pornographic or dangerous content. Most studies on social spam detection are based on supervised machine learning, which requires large annotated datasets. Unfortunately, manually labeling large amounts of data is a complex, error-prone and tedious task that costs considerable human effort and time. In this paper, we propose a novel semi-supervised classification framework for social spam detection that combines co-training with k-medoids. First, we use the k-medoids clustering algorithm to select informative and representative samples for labeling as our initial seed set. Then we exploit the content features and behavior features of users in our co-training classification framework. To illustrate the effectiveness of k-medoids, we compare its performance with that of a random selection strategy. Finally, we evaluate the proposed detection framework against several classical supervised algorithms.
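The two stages outlined above (seed selection by k-medoids, then co-training over two feature views) can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this paper: the function names `k_medoids`, `centroid_classifier` and `co_train`, the farthest-first initialization, the nearest-centroid base classifier and the margin-based confidence score are all assumptions chosen to keep the example short and dependency-free.

```python
import math

def k_medoids(points, k, iters=20):
    """Return sorted indices of k medoids (alternating assign/update variant).
    Assumes k does not exceed the number of distinct points."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # Assumed deterministic start: the overall 1-medoid, then farthest-first picks.
    medoids = [min(range(n), key=lambda i: sum(d[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(d[i][m] for m in medoids)))
    for _ in range(iters):
        # Each medoid owns itself; every other point joins its nearest medoid.
        clusters = {m: [m] for m in medoids}
        for i in range(n):
            if i not in clusters:
                clusters[min(medoids, key=lambda m: d[i][m])].append(i)
        # New medoid of a cluster: the member with the smallest total distance.
        updated = [min(c, key=lambda i: sum(d[i][j] for j in c))
                   for c in clusters.values()]
        if set(updated) == set(medoids):
            break
        medoids = updated
    return sorted(medoids)

def centroid_classifier(rows, labels):
    """Stand-in base learner: nearest centroid. predict(row) -> (label, margin)."""
    cents = {}
    for y in set(labels):
        members = [r for r, l in zip(rows, labels) if l == y]
        cents[y] = [sum(col) / len(members) for col in zip(*members)]
    def predict(row):
        dists = sorted((math.dist(row, c), y) for y, c in cents.items())
        # Margin between the two closest centroids serves as a confidence score.
        margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
        return dists[0][1], margin
    return predict

def co_train(view_a, view_b, seeds, rounds=5, per_round=1):
    """seeds: dict {sample index: label}. Returns the enlarged labeled dict."""
    labeled = dict(seeds)
    for _ in range(rounds):
        for view in (view_a, view_b):
            unlabeled = [i for i in range(len(view)) if i not in labeled]
            if not unlabeled:
                return labeled
            idx = list(labeled)
            predict = centroid_classifier([view[i] for i in idx],
                                          [labeled[i] for i in idx])
            # (index, label, margin) for every unlabeled sample in this view.
            preds = [(i, *predict(view[i])) for i in unlabeled]
            preds.sort(key=lambda t: t[2], reverse=True)
            # The most confident predictions join the shared labeled pool,
            # so each view's classifier teaches the other.
            for i, label, _ in preds[:per_round]:
                labeled[i] = label
    return labeled
```

In this sketch, `view_a` and `view_b` would hold the content and behavior feature vectors of the same users, and the indices returned by `k_medoids` are the samples handed to a human annotator before `co_train` is called.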

Keywords: Semi-supervised learning · Social spam · Co-training · k-medoids

1 Introduction

Social networking websites such as Twitter, Facebook, Sina Weibo and MySpace have gained increasing popularity and attention around the world in recent years. Twitter, a microblogging site, is growing faster than any other social networking site and allows users to post their latest updates and share messages of no more than 140 characters, known as tweets. Users can communicate and stay in touch with their friends by exchanging tweets. Along with the popularity of social networks, spam has become a persistent byproduct that threatens users and social networking websites in different forms and under different definitions.

This work was supported by the National Science Foundation of China (No. 61272374, 61300190) and 863 Project (No. 2015AA015463).
© Springer International Publishing Switzerland 2016. H. Cao et al. (Eds.): PAKDD 2016 Workshops, LNAI 9794, pp. 214–226, 2016. DOI: 10.1007/978-3-319-42996-0_18

For example, spammers often use Twitter to post malicious links, send unsolicited messages to legitimate users, and hijack trending topics. Social spammers use large numbers of compromised or fake accounts to deceive users and lead them to malicious websites containing illegal, pornographic or dangerous content. Sometimes spammers disguise themselves as normal users and imitate the behavior of legitimate users. Spammers of this kind are hard to detect and very harmful to legitimate users and social networks. As a countermeasure, Twitter has its own detection methods and rules against spam and abuse. Users who