Defining and measuring probabilistic ego networks


ORIGINAL ARTICLE

Amin Kaveh1 · Matteo Magnani1 · Christian Rohner1

Received: 31 March 2020 / Revised: 27 October 2020 / Accepted: 6 November 2020
© The Author(s) 2020

Abstract
Analyzing ego networks to investigate local properties and behaviors of individuals is a fundamental task in social network research. In this paper we show that there is no unique way to define ego networks when the existence of edges is uncertain, since there are two different ways of defining the neighborhood of a node in such network models. Therefore, we introduce two definitions of probabilistic ego networks, called V-Alters-Ego and F-Alters-Ego, both rooted in the literature. We then investigate three fundamental measures (degree, betweenness and closeness) under each definition. We also propose a method to approximate the betweenness of an ego node among those neighbors that are connected via shortest paths of length 2. We show that this approximation is faster to compute and correlates highly with ego betweenness under the V-Alters-Ego definition in many datasets. Therefore, it can be a reasonable alternative for representing the extent to which a node acts as an intermediary among its neighbors.

Keywords: Probabilistic networks · Ego networks · Local properties · Betweenness · Closeness

1 InfoLab, Department of Information Technology, Uppsala University, 75105 Uppsala, Sweden
* Amin Kaveh [email protected]

1 Introduction
Empirical social network data collection is often an imperfect process affected by some degree of uncertainty. Uncertainty can come from different sources, for example from missing information and indirect measurements, as when we infer social ties or influence relationships between individuals based on their interactions (Aggarwal and Wang 2010; Bernard et al. 1982). Uncertainty can be present even when we ask about the immediate connections of an individual in a social network, for example due to the forgetfulness of informants (Bernard et al. 1979; Killworth and Bernard 1979). To model uncertain information in networks, probabilistic models in which each edge is associated with an independent probability are the typical choice in the literature (Asthana et al. 2004; Poisot et al. 2016; Rhodes et al. 2005).

Despite the fact that uncertainty affects several types of data collection processes, the majority of works on social networks ignore it. More precisely, in data collection a thresholding approach is typically used: if the degree of confidence about the existence of an edge is higher than a specific value, then we draw an edge between the corresponding pair of nodes. However, the selection of a threshold value is a subjective task. As an example, De Choudhury et al. (2010) have studied two email exchange datasets (a university email dataset and the Enron email dataset) to infer unobserved social ties using the number of emails exchanged between pairs of individuals. They have inferred the existence of a so