Identifying and exploiting homogeneous communities in labeled networks

  • PDF / 1,478,290 Bytes
  • 20 Pages / 595 x 794 pts Page_size
  • 105 Downloads / 184 Views

DOWNLOAD

REPORT


(2020) 5:55

Applied Network Science

RESEARCH

Open Access

Identifying and exploiting homogeneous communities in labeled networks Salvatore Citraro1,2*

and Giulio Rossetti2

*Correspondence: [email protected] 1 Department of Computer Science, University of Pisa, Pisa, Italy 2 KDD-Lab, ISTI-CNR, Pisa, Italy

Abstract Attribute-aware community discovery aims to find well-connected communities that are also homogeneous w.r.t. the labels carried by the nodes. In this work, we address such a challenging task presenting EVA, an algorithmic approach designed to maximize a quality function tailoring both structural and homophilic clustering criteria. We evaluate EVA on several real-world labeled networks carrying both nominal and ordinal information, and we compare our approach to other classic and attribute-aware algorithms. Our results suggest that EVA is the only method, among the compared ones, able to discover homogeneous clusters without considerably degrading partition modularity. We also investigate two well-defined applicative scenarios to characterize better EVA: i) the clustering of a mental lexicon, i.e., a linguistic network modeling human semantic memory, and (ii) the node label prediction task, namely the problem of inferring the missing label of a node. Keywords: Labeled community discovery, Network homophily, Node label prediction Introduction Node clustering, also known as community discovery, is one of the most important and productive tasks in complex network analysis. It is considered one of the most challenging subjects of the field, due to its intrinsic ill-posedness: the absence of a single, universal definition of community leads to a heterogeneous landscape of alternative approaches, each one redefining the task w.r.t. different topological criteria (density, bridge detection, feature propagation. . . ). In such a complex scenario, a meta-definition of community discovery helps us to overcome the wide range of possible definitions: Definition 1 (Community discovery). Given a network G, a community c = {v1 , v2 , . . . , vn } is a set of distinct nodes of G. The community discovery problem aims to identify the set C of all the communities in G. Community discovery algorithms aim to find well-connected, possibly well-separated, clusters of nodes. While classic approaches are driven only by the graph structure, nowadays, a large number of models includes extra features to complement the expressive © The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article