Characterizing the hypergraph-of-entity and the structural impact of its extensions

Applied Network Science

Open Access

RESEARCH

José Devezas* and Sérgio Nunes

*Correspondence: [email protected]
INESC TEC and Faculty of Engineering, University of Porto, Rua Dr. Roberto Frias, s/n, 4200-465 Porto, PT, Portugal

Abstract

The hypergraph-of-entity is a joint representation model for terms, entities and their relations, used as an indexing approach in entity-oriented search. In this work, we characterize the structure of the hypergraph at both the microscopic and the macroscopic scale, as well as over time with an increasing number of documents. We use a random-walk-based approach to estimate shortest distances and node sampling to estimate clustering coefficients. We also propose the calculation of a general mixed hypergraph density measure based on the corresponding bipartite mixed graph. We analyze these statistics for the hypergraph-of-entity, finding that hyperedge-based node degrees are distributed as a power law, while node-based node degrees and hyperedge cardinalities are log-normally distributed. We also find that most statistics tend to converge after an initial period of accentuated growth in the number of documents. We then repeat the analysis over three extensions (materialized through synonym, context, and tf_bin hyperedges) in order to assess their structural impact on the hypergraph. Finally, we focus on the application-specific aspects of the hypergraph-of-entity, in the domain of information retrieval. We analyze the correlation between retrieval effectiveness and the structural features of the representation model, proposing ranking and anomaly indicators as useful guides for modifying or extending the hypergraph-of-entity.

Keywords: Hypergraph-of-entity, Hypergraph analysis, Information retrieval, Indexing, Combined data, Representation model, Characterization
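As an illustration of the three degree-related statistics named in the abstract, the following minimal sketch computes hyperedge cardinalities, hyperedge-based node degrees and node-based node degrees on a toy hypergraph. It rests on assumptions that are not taken from the article: hyperedges are simplified to undirected sets of nodes (the hypergraph-of-entity itself is a mixed hypergraph), the node and hyperedge names are hypothetical, and the definitions used are the common ones (number of incident hyperedges, and number of distinct neighbouring nodes, respectively).

```python
# Toy undirected hypergraph: hyperedge id -> set of member nodes.
# All names below are hypothetical and chosen only for illustration.
hyperedges = {
    "doc_1": {"term:search", "term:entity", "entity:Porto"},
    "doc_2": {"term:search", "entity:Porto", "entity:Portugal"},
    "related_1": {"entity:Porto", "entity:Portugal"},
}


def hyperedge_cardinalities(hyperedges):
    """Number of nodes contained in each hyperedge."""
    return {edge: len(nodes) for edge, nodes in hyperedges.items()}


def hyperedge_based_degrees(hyperedges):
    """Number of hyperedges incident to each node (assumed definition)."""
    degrees = {}
    for nodes in hyperedges.values():
        for node in nodes:
            degrees[node] = degrees.get(node, 0) + 1
    return degrees


def node_based_degrees(hyperedges):
    """Number of distinct nodes sharing a hyperedge with each node (assumed definition)."""
    neighbours = {}
    for nodes in hyperedges.values():
        for node in nodes:
            # Every other member of the same hyperedge is a neighbour.
            neighbours.setdefault(node, set()).update(nodes - {node})
    return {node: len(adjacent) for node, adjacent in neighbours.items()}


cardinalities = hyperedge_cardinalities(hyperedges)
edge_degrees = hyperedge_based_degrees(hyperedges)
node_degrees = node_based_degrees(hyperedges)
```

In this toy example, `entity:Porto` belongs to all three hyperedges and is adjacent to the three other nodes, so both of its degrees are 3, whereas `term:entity` appears in a single hyperedge but still has two neighbours. The two degree notions can therefore diverge, which is consistent with the abstract reporting different distributions for them.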

Introduction

Complex networks have frequently been studied as graphs, but only recently has attention been given to the study of complex networks as hypergraphs (Estrada and Rodriguez-Velazquez 2005). The hypergraph-of-entity (Devezas and Nunes 2019) is a hypergraph-based model used to represent combined data (Bast et al. 2016, §2.1.3). That is, it is a joint representation of corpora and knowledge bases, integrating terms, entities and their relations. It attempts to solve, by design, the issues of representing combined data through inverted indexes and quad indexes. The hypergraph-of-entity, together with its random walk score (Devezas and Nunes 2019, §4.2.2), is also an attempt to generalize several tasks of entity-oriented search. These include ad hoc document retrieval and ad hoc entity retrieval, as well as the recommendation-like tasks of related entity finding and entity list completion. However, there is a tradeoff. On one side, the random

© The Author(s) 2020. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any