Large-scale network motif analysis using compression
- PDF / 923,128 Bytes
- 33 Pages / 439.37 x 666.142 pts Page_size
- 18 Downloads / 260 Views
Large-scale network motif analysis using compression Peter Bloem1
· Steven de Rooij2
Received: 13 September 2019 / Accepted: 14 May 2020 © The Author(s) 2020
Abstract We introduce a new method for finding network motifs. Subgraphs are motifs when their frequency in the data is high compared to the expected frequency under a null model. To compute this expectation, a full or approximate count of the occurrences of a motif is normally repeated on as many as 1000 random graphs sampled from the null model; a prohibitively expensive step. We use ideas from the minimum description length literature to define a new measure of motif relevance. With our method, samples from the null model are not required. Instead we compute the probability of the data under the null model and compare this to the probability under a specially designed alternative model. With this new relevance test, we can search for motifs by random sampling, rather than requiring an accurate count of all instances of a motif. This allows motif analysis to scale to networks with billions of links. Keywords Motifs · Network analysis · Minimum description length
1 Introduction Graphlets are small, induced subgraphs in a large network. Network motifs (Milo et al. 2002) are those graphlets that occur more frequently in the data than expected. To be able to conclude that such frequent subgraphs really represent meaningful aspects of the data, we must first show that they are not simply a product of chance. That is, a subgraph may simply be a frequent subgraph in any random graph: a subgraph is only a motif if its frequency is higher than expected.
Responsible editor: Ira Assent, Carlotta Domeniconi, Aristides Gionis, Eyke Hüllermeier.
B
Peter Bloem [email protected] Steven de Rooij [email protected]
1
Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
2
Mathematisch Instituut, Leiden University, Leiden, The Netherlands
123
P. Bloem, S. de Rooij
This expectation is defined in reference to a null model: a probability distribution over graphs. We determine what the expected frequency of the subgraph is under the null model, and if the observed frequency is substantially higher than this expectation, the subgraph is a motif. Unfortunately, there is usually no efficient way to compute the expected frequency of a subgraph under the null model. The most common approach generates a large number of random graphs from the null model and compares the frequencies of the subgraph in this sample to its frequency in the data (Milo et al. 2002). This means that any resources invested in extracting the motifs from the data must be invested again for every graph in the sample to find out which subgraphs are motifs. We introduce an alternative method that does not require such sampling from the null model. Instead, we use two probability distributions on graphs: the null model p null , and a distribution p motif under which graphs with one or more frequent subgraphs have high probability. If a subgraph M of a given g
Data Loading...