Bayesian Testing for Exogenous Partition Structures in Stochastic Block Models

Sirio Legramanti, Bocconi University, Milano, Italy

Tommaso Rigon, Duke University, Durham, USA

Daniele Durante, Bocconi University, Milano, Italy

Abstract. Network data often exhibit block structures characterized by clusters of nodes with similar patterns of edge formation. When such relational data are complemented by additional information on exogenous node partitions, these sources of knowledge are typically included in the model to supervise the cluster assignment mechanism or to improve inference on edge probabilities. Although these solutions are routinely implemented, there is a lack of formal approaches to test whether a given external node partition is in line with the endogenous clustering structure encoding stochastic equivalence patterns among the nodes in the network. To fill this gap, we develop a formal Bayesian testing procedure that relies on the calculation of the Bayes factor between a stochastic block model with known grouping structure defined by the exogenous node partition and an infinite relational model that allows the endogenous clustering configurations to be unknown, random and fully revealed by the block-connectivity patterns in the network. A simple Markov chain Monte Carlo method for computing the Bayes factor and quantifying uncertainty in the endogenous groups is proposed. This strategy is evaluated in simulations and in applications studying brain networks of Alzheimer’s patients.

AMS (2000) subject classification. Primary 62-XX; Secondary 62F15.

Keywords and phrases. Bayes factor, Brain network, Chinese restaurant process, Infinite relational model, Stochastic equivalence
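The quantity being tested can be made concrete on a toy network. The Python sketch below is not the authors' MCMC procedure: it computes the Bayes factor exactly by enumerating every partition of the nodes, which is only feasible for a handful of nodes because the number of partitions (the Bell number) grows super-exponentially. It assumes an undirected binary network without self-loops, independent Beta(a, b) priors on the block probabilities (integrated out analytically), and a Chinese restaurant process prior with concentration alpha on the endogenous partition, as in the standard infinite relational model formulation. The function names, hyperparameter defaults and the toy adjacency matrix are illustrative choices, not taken from the paper.

```python
import math
from itertools import combinations


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal_sbm(Y, z, a=1.0, b=1.0):
    """log p(Y | z): undirected Bernoulli SBM with Beta(a, b) block
    probabilities integrated out (assumed conjugate priors)."""
    edges, pairs = {}, {}
    for v, u in combinations(range(len(Y)), 2):
        block = (min(z[v], z[u]), max(z[v], z[u]))
        pairs[block] = pairs.get(block, 0) + 1
        edges[block] = edges.get(block, 0) + Y[v][u]
    # Blocks with no node pairs contribute a factor of 1 and are skipped.
    return sum(log_beta(a + edges[k], b + n - edges[k]) - log_beta(a, b)
               for k, n in pairs.items())


def log_crp(z, alpha=1.0):
    """log prior probability of the partition z under an assumed CRP(alpha)."""
    sizes = {}
    for label in z:
        sizes[label] = sizes.get(label, 0) + 1
    return (len(sizes) * math.log(alpha) + math.lgamma(alpha)
            - math.lgamma(alpha + len(z))
            + sum(math.lgamma(n) for n in sizes.values()))


def set_partitions(V):
    """Yield every partition of {0, ..., V-1} once, as a restricted growth string."""
    def grow(prefix, n_groups):
        if len(prefix) == V:
            yield list(prefix)
            return
        for label in range(n_groups + 1):
            prefix.append(label)
            yield from grow(prefix, max(n_groups, label + 1))
            prefix.pop()
    yield from grow([], 0)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(x - m) for x in values))


def log_bayes_factor(Y, z_exo, a=1.0, b=1.0, alpha=1.0):
    """log BF of the SBM with fixed partition z_exo against the IRM, whose
    marginal likelihood sums p(Y | z) p(z) over all partitions (brute force)."""
    log_irm = log_sum_exp([log_marginal_sbm(Y, z, a, b) + log_crp(z, alpha)
                           for z in set_partitions(len(Y))])
    return log_marginal_sbm(Y, z_exo, a, b) - log_irm


# Hypothetical toy network: two triangles {0, 1, 2} and {3, 4, 5}, no edges between.
Y = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
z_good = [0, 0, 0, 1, 1, 1]   # exogenous partition aligned with the blocks
z_bad = [0, 1, 0, 1, 0, 1]    # exogenous partition cutting across the blocks

print(log_bayes_factor(Y, z_good))  # positive: evidence for z_good
print(log_bayes_factor(Y, z_bad))   # negative: evidence against z_bad
```

A positive log Bayes factor favours the SBM with the exogenous partition, a negative one favours the IRM. In this toy example the block-aligned partition z_good receives positive evidence and the misaligned z_bad negative evidence; for realistic networks the sum over partitions must be approximated, which is where a Monte Carlo scheme becomes necessary.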

1 Introduction

There is extensive interest in learning grouping structures among the nodes in a network (see, e.g. Fortunato and Hric, 2016). Classical solutions to this problem focus on detecting community patterns via algorithmic approaches that cluster the nodes into groups characterized by a high number of edges within each community and comparatively few edges between nodes in different communities (Newman and Girvan, 2004; Blondel et al., 2008; Fortunato, 2010). Despite being routinely implemented, these procedures do not rely on generative probabilistic models and, therefore, face difficulties when the focus is not just on point estimation but also on hypothesis testing and uncertainty quantification. This issue has motivated several efforts towards developing model-based representations for inference on grouping structures, with the stochastic block model (SBM) (Holland et al., 1983; Nowicki and Snijders, 2001) providing the most notable contribution within this class. Such a statistical model expresses the edge probabilities as a function of the node assignments to groups and of the block probabilities between such groups, thus allowing inference on more general block-connectivity patterns beyond classical community structures. The success of SBMs in different fields has motivated various extensions (e.g. Kemp e