Stochastic Block Models are a Discrete Surface Tension

  • PDF / 1,530,041 Bytes
  • 34 Pages / 439.37 x 666.142 pts Page_size
  • 74 Downloads / 210 Views

DOWNLOAD

REPORT


Stochastic Block Models are a Discrete Surface Tension Zachary M. Boyd1,2 · Mason A. Porter1 · Andrea L. Bertozzi1 Received: 6 June 2018 / Accepted: 19 March 2019 © Springer Science+Business Media, LLC, part of Springer Nature 2019

Abstract Networks, which represent agents and interactions between them, arise in myriad applications throughout the sciences, engineering, and even the humanities. To understand large-scale structure in a network, a common task is to cluster a network’s nodes into sets called “communities,” such that there are dense connections within communities but sparse connections between them. A popular and statistically principled method to perform such clustering is to use a family of generative models known as stochastic block models (SBMs). In this paper, we show that maximum-likelihood estimation in an SBM is a network analog of a well-known continuum surface-tension problem that arises from an application in metallurgy. To illustrate the utility of this relationship, we implement network analogs of three surface-tension algorithms, with which we successfully recover planted community structure in synthetic networks and which yield fascinating insights on empirical networks that we construct from hyperspectral videos. Keywords Networks · Community structure · Data clustering · Stochastic block models (SBMs) · Merriman–Bence–Osher (MBO) scheme · Geometric partial differential equations Mathematics Subject Classification 65K10 · 49M20 · 35Q56 · 62H30 · 91C20 · 91D30 · 94C15

Communicated by Paul Newton.

B

Mason A. Porter [email protected] Zachary M. Boyd [email protected] Andrea L. Bertozzi [email protected]

1

Department of Mathematics, UCLA, Los Angeles, CA, USA

2

Department of Mathematics, University of North Carolina, Chapel Hill, USA

123

Journal of Nonlinear Science

1 Introduction The study of networks, in which nodes represent entities and edges encode interactions between entities (Newman 2018), can provide useful insights into a wide variety of complex systems in myriad fields, such as granular materials (Papadopoulos et al. 2018), disease spreading (Pastor-Satorras et al. 2015), criminology (Hegemann et al. 2011), and more. In the study of such applications, the analysis of large data sets—from diverse sources and applications—continues to grow ever more important. The simplest type of network is a graph, and empirical networks often appear to exhibit a complicated mixture of regular and seemingly random features (Newman 2018). Additionally, it is increasingly important to study networks with more complicated features, such as time dependence (Holme 2015), multiplexity (Kivelä et al. 2014), annotations (Newman and Clauset 2016), and connections that go beyond a pairwise paradigm (Otter et al. 2017). One also has to worry about “features” such as missing information and false positives (Kim and Leskovec 2011). Nevertheless, it is convenient in the present paper to restrict our attention to undirected, unweighted graphs for simplicity. To try to understand the large