Towards finding the best-fit distribution for OSN data

Subhayan Bhattacharya¹ · Sankhamita Sinha² · Sarbani Roy³ · Amarnath Gupta⁴

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Online social networks (OSNs) are currently assumed to follow a power-law degree distribution. In this paper, the degree distributions of multiple OSNs are studied. The degree distributions of OSNs are seen to differ moderately from a power law. Lognormal distributions are an alternative to power-law distributions and have been used as the best fit for many complex networks. The degree distributions of OSNs are seen to differ massively from a lognormal distribution. Thus, for a better fit, a composite distribution combining power-law and lognormal distributions is suggested. This paper proposes an approach to find the most suitable distribution for a given degree distribution out of six possible combinations of power law and lognormal, namely power law, lognormal, power law–lognormal, lognormal–power law, double power law, and double power law–lognormal. The errors between the fitted composite distribution and the original degree distribution of the OSNs are observed. A composite distribution fitted using the approach described in this paper is seen to be always a better fit than both the power-law and the lognormal distributions.

Keywords Composite distribution · Online social networks · Power-law distribution · Lognormal distribution
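To make the comparison between the two base candidates concrete, the following is a minimal sketch and not the approach proposed in this paper. It assumes continuous approximations of both distributions, a hypothetical lower cutoff `xmin`, and model selection by the Akaike information criterion; the function names are illustrative. The lognormal is not renormalized above `xmin`, which is a simplification.

```python
import math

def fit_power_law(xs, xmin=1.0):
    """Continuous power-law MLE on x >= xmin; returns (alpha, log_likelihood)."""
    logs = [math.log(x / xmin) for x in xs if x >= xmin]
    n, s = len(logs), sum(logs)
    if n == 0 or s == 0.0:
        raise ValueError("need at least one observation strictly above xmin")
    alpha = 1.0 + n / s  # closed-form maximum-likelihood exponent
    # log of p(x) = ((alpha - 1) / xmin) * (x / xmin) ** (-alpha), summed over x
    loglik = n * math.log((alpha - 1.0) / xmin) - alpha * s
    return alpha, loglik

def fit_lognormal(xs, xmin=1.0):
    """Lognormal MLE on the same observations; returns (mu, sigma, log_likelihood).

    Assumption: the density is not truncated at xmin (a simplification).
    """
    logs = [math.log(x) for x in xs if x >= xmin]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    # At the MLE the quadratic term of the log-likelihood sums to n / 2.
    loglik = -sum(logs) - n * math.log(sigma * math.sqrt(2.0 * math.pi)) - n / 2.0
    return mu, sigma, loglik

def better_fit(xs, xmin=1.0):
    """Return the candidate with the lower AIC (AIC = 2k - 2 ln L)."""
    _, ll_pl = fit_power_law(xs, xmin)
    _, _, ll_ln = fit_lognormal(xs, xmin)
    aic_pl = 2 * 1 - 2 * ll_pl  # one free parameter: alpha
    aic_ln = 2 * 2 - 2 * ll_ln  # two free parameters: mu, sigma
    return "power law" if aic_pl < aic_ln else "lognormal"
```

Calling `better_fit(degrees)` on a list of positive node degrees returns the name of the preferred candidate. Node degrees are integers, so a discrete treatment and a data-driven choice of `xmin` would be needed for a careful analysis.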

* Sarbani Roy [email protected]
Subhayan Bhattacharya [email protected]
Sankhamita Sinha [email protected]
Amarnath Gupta [email protected]

1 School of Mobile Computing and Communication, Jadavpur University, Kolkata, India
2 Department of Computer Application, Meghnad Saha Institute of Technology, Kolkata, India
3 Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
4 San Diego Supercomputer Center, University of California, San Diego, USA

1 Introduction

We continue to observe tremendous growth in the volume of data and inter-user communication in online social networks (OSNs) such as Facebook, Twitter, YouTube, Instagram, WhatsApp, Snapchat, Google+, Quora, and LiveJournal. There were reportedly 4.39 billion Internet users in January 2019, an increase of 366 million (9%) over January 2018 [1]. The average Facebook user in 2019 had about 338 friends, while the median number of friends was 200 [2]. When we model OSNs as graphs, the nodes of the graph are entities including users, posts, and topic symbols such as hashtags, and the edges represent binary relationships such as likes, replies to, and mentions that hold within and across entity types; the whole network is a directed, node-labelled, and edge-labelled graph. Several important classes of social network analysis rely on features (computed properties) that describe the structure of a network. For example, studies in network evolution [3–5] compare network-level features (size, average density,
