Improving graph-based label propagation algorithm with group partition for fraud detection

Jiahui Wang 1 & Yi Guo 1,2,3 & Xinxiu Wen 1 & Zhihong Wang 1 & Zhen Li 4 & Minwei Tang 4

Received: 13 August 2019 / Revised: 26 February 2020 / Accepted: 16 April 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Fraudulent user detection is a crucial issue in financial risk management. Because labeled data are scarce and labels are not always reliable, label propagation algorithms (LPA) are effective solutions in this scenario. Most existing models propagate risk probabilities only for individual users at the feature level, ignoring the real-world graph structure and the characteristics of gang crime. This paper improves graph-based LPA through group partition, which can be implemented directly on the graph at hand while fully accounting for group information. Extensive experimental results demonstrate the advantage of our proposed model, KGLPA, over other off-the-shelf models, and show that it remedies the shortcomings of feature-based LPA with higher reliability and stability, improving the detection of fraudulent users and securing marketing budgets.

Keywords Label propagation · Group partition · Semi-supervised · Knowledge graph · Fraud detection · Risk management
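To make the notion of group partition concrete, the following is a minimal sketch that splits a user graph into groups and gives every member a group-level risk score. It is an illustrative assumption, not KGLPA: connected components (found with a union-find structure) stand in for whatever partition method the model actually uses, and the helper names `partition_groups` and `group_risk` are hypothetical.

```python
from collections import defaultdict


def find(parent, x):
    # Union-find lookup with path halving: returns the root of x's group.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def partition_groups(n, edges):
    """Hypothetical helper: split users 0..n-1 into connected components."""
    parent = list(range(n))
    for u, v in edges:
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            parent[ru] = rv  # merge the two groups
    groups = defaultdict(list)
    for x in range(n):
        groups[find(parent, x)].append(x)
    return list(groups.values())


def group_risk(groups, known_fraud):
    """Hypothetical helper: give each member its group's share of known fraudsters."""
    risk = {}
    for members in groups:
        share = sum(1 for m in members if m in known_fraud) / len(members)
        for m in members:
            risk[m] = share
    return risk


# Toy graph: users 0-1-2 are linked, 3-4 are linked, 5 is isolated.
groups = partition_groups(6, [(0, 1), (1, 2), (3, 4)])
risk = group_risk(groups, known_fraud={0})
```

Here users 0, 1 and 2 form one group and each receives a risk of 1/3 because user 0 is a known fraudster, while users 3, 4 and the isolated user 5 receive 0. Scoring whole groups rather than isolated individuals is one simple way for seemingly regular users who are tied to known fraudsters to be flagged together.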

1 Introduction

Online fraud has increased dramatically since the advent of the e-commerce era. A large number of fraudulent users defraud e-commerce platforms of their marketing funds through various illegal methods. As reported, the Internet Crime Complaint Center (IC3) received 467,361 complaints in 2019, reporting more than $5.5 billion in direct economic losses [1]. It is therefore urgent to detect fraudulent and high-risk users among the large number of regular users in a timely manner [2, 3]. Several challenges remain: (a) the lack of sufficient labeled data with high reliability; (b) detecting fraudulent groups that consist of many seemingly regular users; and (c) ensuring the sustainability of the model.

Unlike credit card fraud detection, fraudulent user detection does not have enough reliable labeled data, because fraudsters do not disclose how they defrauded the e-commerce platform, especially at the initial stage. If we treat users who are not in the blacklist database as normal users, the reliability of the positive samples may be very low. The lack of sufficient labeled data is a significant and unavoidable challenge in the anti-fraud field. Moreover, labeling often requires expensive human labor, and its reliability depends entirely on the experts' experience. Supervised learning trusts labels completely, whereas unsupervised learning discards them entirely. The label propagation algorithm (LPA), a semi-supervised method, takes a middle ground: it uses the available partial labels to predict the rest [4, 5]. Fraudulent user detection can therefore be framed as a label propagation problem that aims to detect more fraudulent users from the known ones.
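The semi-supervised idea can be illustrated with the classic iterative label propagation scheme of Zhu and Ghahramani: each unlabeled user repeatedly takes the weighted average score of its neighbors, and users with known labels are clamped back to those labels after every step. This is a minimal sketch of that generic graph-based baseline, not the paper's KGLPA. It assumes a symmetric, non-negative weighted adjacency matrix `W`, a score of 1 for known fraudsters and 0 for known normal users, and a neutral starting score of 0.5 for unlabeled users; the function name and its parameters are hypothetical.

```python
import numpy as np


def label_propagation(W, y, labeled_mask, max_iter=1000, tol=1e-6):
    """Hypothetical sketch of iterative label propagation with clamping.

    W: (n, n) symmetric non-negative adjacency matrix.
    y: (n,) known risk scores; entries outside labeled_mask are ignored.
    labeled_mask: (n,) booleans marking users whose labels are trusted.
    """
    W = np.asarray(W, dtype=float)
    y = np.asarray(y, dtype=float)
    labeled_mask = np.asarray(labeled_mask, dtype=bool)

    # Row-normalize W into a transition matrix P.
    deg = W.sum(axis=1)
    P = np.zeros_like(W)
    has_edges = deg > 0
    P[has_edges] = W[has_edges] / deg[has_edges][:, None]
    # Isolated users get a self-loop so they keep their current score.
    isolated = np.flatnonzero(~has_edges)
    P[isolated, isolated] = 1.0

    f = np.where(labeled_mask, y, 0.5)  # unlabeled users start neutral
    for _ in range(max_iter):
        f_new = P @ f                          # average neighbors' scores
        f_new[labeled_mask] = y[labeled_mask]  # clamp the trusted labels
        if np.max(np.abs(f_new - f)) < tol:
            return f_new
        f = f_new
    return f


# Toy chain of four users: 0 - 1 - 2 - 3.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.array([1.0, 0.0, 0.0, 0.0])         # user 0 fraudulent, user 3 normal
mask = np.array([True, False, False, True])
scores = label_propagation(W, y, mask)
```

On this chain, with user 0 labeled fraudulent and user 3 labeled normal, the scores of users 1 and 2 converge to about 2/3 and 1/3. Clamping is the design choice that keeps the trusted labels fixed at every step, so the propagated scores stay anchored to the known users instead of drifting toward a uniform average.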

* Yi Guo

Jiahui Wang

Minwei Tang tang