Network representation learning: a systematic literature review


REVIEW

Bentian Li¹ · Dechang Pi¹,²

Received: 4 September 2019 / Accepted: 6 April 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract
Ubiquitous network/graph data are generally nonlinear, sparse, dynamic and heterogeneous, which poses numerous challenges for network analysis. Recently, driven by the strong ability of deep learning to learn representations from data, representation learning for network data has become a research hotspot. Network representation learning aims to learn a projection from the given network data in the original topological space to a low-dimensional vector space while encoding a variety of structural and semantic information. The resulting vector representations can effectively support a wide range of tasks such as node classification, node clustering, link prediction and graph classification. In this survey, we present a comprehensive overview of a large number of network representation learning algorithms from two perspectives: homogeneous networks and heterogeneous networks. The corresponding algorithms are analyzed in depth. Extensive applications are introduced, and experiments are conducted to validate typical algorithms. Finally, we point out five promising directions for future research in terms of theory and application.

Keywords Representation learning · Network embedding · Network data mining · Deep neural network

Dechang Pi
[email protected]

Bentian Li
[email protected]

¹ College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
² Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 211106, China

1 Introduction

Big data have attracted extensive attention from industry and academia [1–3]. It is worth noting that most current studies assume that data points are independent of one another, so mining large-scale data under this assumption alone is far from sufficient. In fact, such data are generally interrelated. For example, large-scale image and text data can be organized as a network to realize multi-information fusion [4]. Polypharmacy side effects arising from drug–drug interactions may be affected by protein–protein interactions and drug–protein target interactions [5]. That is to say, besides big data of a single type, such as image, text, speech and video, there also exist ubiquitous network data, such as the Google knowledge graph, protein–protein interaction networks, gene networks, brain networks, the Internet, social networks, multimedia networks, molecular compound networks and traffic networks. In particular, online social networks such as Twitter, WeChat and Facebook have recently reached the scale of billions of nodes, making it more urgent for researchers to study large-scale network data. However, diffe