CrawlSN: community-aware data acquisition with maximum willingness in online social networks

  • PDF / 2,093,725 Bytes
  • 32 Pages / 439.37 x 666.142 pts Page_size
  • 11 Downloads / 166 Views

DOWNLOAD

REPORT


CrawlSN: community-aware data acquisition with maximum willingness in online social networks Bay-Yuan Hsu1 · Chia-Lin Tu2 · Ming-Yi Chang2 · Chih-Ya Shen2 Received: 10 January 2020 / Accepted: 25 July 2020 / Published online: 8 September 2020 © The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2020

Abstract Real social network datasets with community structures are critical for evaluating various algorithms in Online Social Networks (OSNs). However, obtaining such community data from OSNs has recently become increasingly challenging due to privacy issues and government regulations. In this paper, we thus make our first attempt to address two important factors, i.e., user willingness and existence of community structure, to obtain more complete OSN data. We formulate a new research problem, namely Community-aware Data Acquisition with Maximum Willingness in Online Social Networks (CrawlSN), to identify a group of users from an OSN, such that the group is a socially tight community and the users’ willingness to contribute data is maximized. We prove that CrawlSN is NP-hard and inapproximable within any factor unless, and propose an effective algorithm, named Community-aware Group Identification with Maximum Willingness (CIW ) with various processing strategies. We conduct an evaluation study with 1093 volunteers to validate our problem formulation and demonstrate that CrawlSN outperforms the other alternatives. We also perform extensive experiments on 7 real datasets and show that the proposed CIW outperforms the other baselines in both solution quality and efficiency. Keywords Social networks · Graph algorithm · Data acquisition

Responsible editor: Ira Assent, Carlotta Domeniconi, Aristides Gionis, Eyke Hüllermeier. Electronic supplementary material The online version of this article (https://doi.org/10.1007/s10618020-00709-5) contains supplementary material, which is available to authorized users.

B

Chih-Ya Shen [email protected]

Extended author information available on the last page of the article

123

1590

B.-Y. Hsu et al.

1 Introduction Real social network datasets with community structures that include complete node and edge information are essential for evaluating various algorithms in Online Social Networks (OSNs) for different applications. For example, such real social network datasets can be used to evaluate the algorithms for link prediction, node classification, community detection, dense subgraph extraction, graph convolution networks, and graph embedding. However, obtaining such real datasets with community structures from OSNs (represented as a graph) is not a simple task for two reasons: the node and edge information of the community should be completely acquired, and the community structure should be ensured at the same time. The main challenge of such a task is that in order to obtain detailed node (user) and edge (user–user interaction) information, such as posts, likes, check-ins, interactions/relations with other users in OSNs, one