An effective graph summarization and compression technique for a large-scaled graph

  • PDF / 2,246,387 Bytes
  • 15 Pages / 439.37 x 666.142 pts Page_size
  • 27 Downloads / 201 Views

DOWNLOAD

REPORT


An effective graph summarization and compression technique for a large-scaled graph Hojin Seo1 · Kisung Park1 · Yongkoo Han1 · Hyunwook Kim1 · Muhammad Umair 1 · Kifayat Ullah Khan1 · Young-Koo Lee1

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Abstract Graphs are widely used in various applications, and their size is becoming larger over the passage of time. It is necessary to reduce their size to minimize main memory needs and to save the storage space on disk. For these purposes, graph summarization and compression approaches have been studied in various existing studies to reduce the size of a large graph. Graph summarization aggregates nodes having similar structural properties to represent a graph with reduced main memory requirements. Whereas graph compression applies various encoding techniques so that the resultant graph needs lesser storage space on disk. Considering usefulness of both the paradigms, we propose to obtain best of the both worlds by combining summarization and compression approaches. Hence, we present a greedy-based algorithm that greatly reduces the size of a large graph by applying both the compression and

B

Young-Koo Lee [email protected] Hojin Seo [email protected] Kisung Park [email protected] Yongkoo Han [email protected] Hyunwook Kim [email protected] Muhammad Umair [email protected] Kifayat Ullah Khan [email protected]

1

Department of Computer Science and Engineering, Kyung Hee University, Yongin, South Korea

123

H. Seo et al.

summarization. We also propose a novel cost model for calculating the compression ratio considering both the compression and summarization strategies. The algorithm uses the proposed cost model to determine whether to perform one or both of them in every iteration. Through comprehensive experiments on real-world datasets, we show that our proposed algorithm achieves a better compression ratio than only applying summarization approaches by up to 16%. Keywords Graph · Summarization · Compression · Minimum Description Length

1 Introduction Graphs have been widely used to represent relationships between entities in various applications such as social networks, semantic webs, and XML. In almost every application domain, the use of graph-based data modeling is increasing rapidly and the size of resultant graphs is becoming larger too, where some of them having nodes reaching to billions. For example, the number of users of Facebook has now reached to over 2 billion in 2017.1 The need to store and analyze large graphs arises for various types of applications. For example, numerous analysis techniques have been proposed for social networks like link prediction, community discovery, graph structure learning, visualization, and so on. However, these processes suffer from scale of the underlying data as either the given graphs cannot reside in main memory or require massive I/O operations on disk. Recently, graph summarization and compression techniques have been studied to reduce the size of large graphs independently. The