Discovering Opinion Spammer Groups by Network Footprints
Abstract. Online reviews are an important source for consumers to evaluate products and services on the Internet (e.g., Amazon, Yelp). However, more and more fraudulent reviewers write fake reviews to mislead users. To maximize their impact and share effort, many spam attacks are organized as campaigns by a group of spammers. In this paper, we propose a new two-step method to discover spammer groups and their targeted products. First, we introduce NFS (Network Footprint Score), a new measure that quantifies the likelihood of products being spam campaign targets. Second, we carefully devise GroupStrainer to cluster spammers on a 2-hop subgraph induced by top-ranking products. We demonstrate the efficiency and effectiveness of our approach on both synthetic and real-world datasets from two different domains with millions of products and reviewers. Moreover, we discover interesting strategies that spammers employ through case studies of our detected groups.
Keywords: Opinion spam · Spammer groups · Spam detection · Graph anomaly detection · Efficient hierarchical clustering · Network footprints
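As an illustration of the second step's input, the following is a minimal sketch of inducing a 2-hop subgraph from top-ranking products in a reviewer-product bipartite graph. It assumes one plausible reading of "2-hop": the top-k products, their reviewers, and every product those reviewers have reviewed. The function name `two_hop_subgraph`, the `product_scores` dictionary, and the cutoff `k` are hypothetical; the scores stand in for NFS values, which are not computed here, and the clustering performed by GroupStrainer is not shown.

```python
from collections import defaultdict


def two_hop_subgraph(reviews, product_scores, k):
    """Induce a 2-hop subgraph around the k highest-scoring products.

    reviews: iterable of (reviewer, product) pairs (bipartite edges).
    product_scores: dict mapping product -> suspiciousness score
        (a hypothetical stand-in for NFS; higher means more suspicious).
    Returns (top_products, reviewers, products, edges).
    """
    reviewers_of = defaultdict(set)  # product -> reviewers who reviewed it
    products_of = defaultdict(set)   # reviewer -> products they reviewed
    for reviewer, product in reviews:
        reviewers_of[product].add(reviewer)
        products_of[reviewer].add(product)

    # Top-k products by score; ties are broken by name for determinism.
    top_products = sorted(
        product_scores, key=lambda p: (-product_scores[p], p)
    )[:k]

    # Hop 1: every reviewer of a top-ranking product.
    reviewers = set()
    for p in top_products:
        reviewers |= reviewers_of.get(p, set())

    # Hop 2: every product reviewed by those reviewers.
    products = set()
    for r in reviewers:
        products |= products_of[r]

    # Edges of the induced subgraph (all reviews by the selected reviewers).
    edges = {(r, p) for r in reviewers for p in products_of[r]}
    return top_products, reviewers, products, edges


if __name__ == "__main__":
    toy_reviews = [("u1", "A"), ("u2", "A"), ("u1", "B"), ("u3", "C"), ("u2", "D")]
    toy_scores = {"A": 0.9, "B": 0.2, "C": 0.1, "D": 0.3}
    print(two_hop_subgraph(toy_reviews, toy_scores, k=1))
```

Restricting attention to this subgraph keeps the clustering step small relative to the full graph, since only reviewers connected to highly ranked products are considered.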
1 Introduction
Online reviews of products and services are an increasingly important source of information for consumers. They are valuable since, unlike advertisements, they reflect the testimonials of other, "real" consumers. While many positive reviews can increase the revenue of a business, negative reviews can cause substantial loss. As a result of such financial incentives, opinion spam has become a critical issue [17], where fraudulent reviewers fabricate spam reviews to unjustly promote or demote (e.g., under competition) certain products and businesses. Opinion spam is surprisingly prevalent: one-third of consumer reviews on the Internet¹ and more than 20% of reviews on Yelp² are estimated to be fake. Despite being widespread, opinion spam remains a mostly open and challenging problem for at least two main reasons: (1) humans are incapable of distinguishing fake reviews based on text [25], which renders manual labeling extremely difficult
¹ http://www.nytimes.com/2012/08/26/business/book-reviewers-for-hire-meet-a-demand-for-online-raves.html
² http://www.businessinsider.com/20-percent-of-yelp-reviews-fake-2013-9
© Springer International Publishing Switzerland 2015. A. Appice et al. (Eds.): ECML PKDD 2015, Part I, LNAI 9284, pp. 267–282, 2015. DOI: 10.1007/978-3-319-23528-8_17
J. Ye and L. Akoglu
and hence supervised methods inapplicable, and (2) fraudulent reviewers are often professionals, paid by businesses to write detailed and genuine-looking reviews. Since the seminal work by Jindal and Liu [17], opinion spam has been the focus of research for the last 7-8 years (Section 5). Most existing work aims to detect individual spam reviews [12,17,20,21,25,26] or spammers [1,11,13,18,22,26]. However, fraud/spam is often a collective act, where the involved individuals cooperate in groups to execute spam campaigns. This way, they can increase total impact (i.e., dominate the sentiments towards target products via floodi