Compression of Dynamic Graphs Generated by a Duplication Model

  • PDF / 2,198,646 Bytes
  • 21 Pages / 439.37 x 666.142 pts Page_size
  • 56 Downloads / 224 Views

DOWNLOAD

REPORT


Compression of Dynamic Graphs Generated by a Duplication Model Krzysztof Turowski1,3 · Abram Magner2 · Wojciech Szpankowski1 Received: 9 October 2018 / Accepted: 12 March 2020 © The Author(s) 2020

Abstract We continue building up the information theory of non-sequential data structures such as trees, sets, and graphs. In this paper, we consider dynamic graphs generated by a full duplication model in which a new vertex selects an existing vertex and copies all of its neighbors. We ask how many bits are needed to describe the labeled and unlabeled versions of such graphs. We first estimate entropies of both versions and then present asymptotically optimal compression algorithms up to two bits. Interestingly, for the full duplication model the labeled version needs Θ(n) bits while its unlabeled version (structure) can be described by Θ(log n) bits due to significant amount of symmetry (i.e. large average size of the automorphism group of sample graphs). Keywords  Random graphs · Structural entropy · Graph compression · Duplication model

1 Introduction Complex systems can often be modeled as dynamic graphs. In these systems, patterns of interactions evolve in time, determining emergent properties, associated function, robustness, and security of the system. There are several broad questions whose answers shed light on the evolution of such dynamic networks: (i) how many bits are required to best describe such a network and its structure (i.e., unlabeled This work was supported by NSF Center for Science of Information (CSoI) Grant CCF-0939370, and in addition by NSF Grant CCF-1524312, and National Science Center, Poland, under Grant UMO-2016/21/B/ST6/03146. * Krzysztof Turowski [email protected] 1

Center for Science of Information, Purdue University, West Lafayette, IN, USA

2

Department of Computer Science, University at Albany, SUNY, Albany, NY, USA

3

Theoretical Computer Science Department, Jagiellonian University, Krakow, Poland



13

Vol.:(0123456789)

Algorithmica

underlying graph); (ii) how to infer underlying dynamic processes governing network evolution; (iii) how to infer information about previous states of the network; and (iv) how to predict the forward evolution of the network state. In this paper we deal with the first question (i.e., labeled and unlabeled graph compression). To better understand the evolution of network structural properties, several probabilistic models have been proposed, including, e.g., the preferential attachment, duplication-divergence, Cooper-Frieze, and fit-get richer models [2, 6, 10, 24]. Clearly, some models are more suitable to certain types of data than others. For example, it has been claimed that the preferential attachment mechanism [2] plays a strong role in the formation of citation networks [23]. However, due to the high power law exponent of their degree sequence (greater than 2) and lack of community structure [6], preferential attachment graphs are not likely to describe well biological networks such as protein interaction networks or gen