Graph communities in Neo4j
- PDF / 2,272,448 Bytes
- 11 Pages / 595.276 x 790.866 pts Page_size
- 39 Downloads / 219 Views
ORIGINAL PAPER
Graph communities in Neo4j Four algorithms at work Georgios Drakopoulos1 · Panagiotis Gourgaris2 · Andreas Kanavos3 Received: 8 January 2018 / Accepted: 13 May 2018 © Springer-Verlag GmbH Germany, part of Springer Nature 2018
Abstract Community discovery is an essential topic in social network analysis since it provides a way for recursively decomposing a large social graph to easily interpretable subgraphs. The implementation of four major community discovery algorithms, namely the Newman–Girvan or Edge Betweeness, the Walktrap, the Louvain, and the CNM as Java analytics over Neo4j is described. Their correctness was evaluated functionally in two real Twitter graphs with vastly different characteristics. This was done on the grounds that a successful structural graph partitioning should eventually be reflected in the network functionality domain. Additionally, most real world graphs lack a list of ground truth communities, rendering a structural verification difficult, while functionality can be easily observed in most cases. Naturally, this renders the evaluation network-specific, as different social networks have different operational characteristics. The primary algorithmic finding was that the Louvain algorithm yields Twitter communities whose distribution size matches closer, in terms of the Kullback–Leibler divergence, the tweet and retweet distributions, with Newman–Girvan, Walktrap, and CNM following in that order. Keywords Community discovery · CNM · Louvain · Edge betweeness · Walktrap · Graph databases · Higher order data · Neo4j · Twitter4j
1 Introduction Decomposing a social graph into communities is by no means a trivial task. Besides the requirement that at least a considerable segment of the social network, if not all, need to be examined, the question of what constitutes a community, although posed in easily intuitively understood terms, has yet to be given a single definite formal answer. That * Georgios Drakopoulos [email protected] Panagiotis Gourgaris [email protected] Andreas Kanavos [email protected] 1
Cloudminers Inc. and Department of Informatics, Ionian University, Corfu, Greece
2
Mechanical and Aerospace Department, University of Patras, Patras, Greece
3
Computer Engineering and Informatics Department, University of Patras, Patras, Greece
is not to say that no formal community definition exists. Quite on the contrary, a plethora of such definitions has been proposed (Fortunato 2010), successfully formalizing human social organization. However, they differ in key aspects and, therefore, they lead to different algorithms. Graph databases such as Neo4j, GraphDB, and BrightstarDB during the current decade have progressively established themselves as reliable and scalable production grade tools, either as standalone servers or in conjunction with row stores, as the latter are optimized for different query types. In addition to front-end analytics such as link prediction, minimum spanning trees, DFS, and BFS, they offer back-end storage for mass
Data Loading...