In-memory parallelization of join queries over large ontological hierarchies


Dimitris Bilidas¹ · Manolis Koubarakis¹

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

The Resource Description Framework (RDF) data model enables the construction of knowledge graphs over various domains, using ontologies to encode information about the domain and simple statements in the form of subject-predicate-object triples to represent data, which facilitates the interlinking and exchange of Web data. However, this simplicity comes at the cost of executing a large number of joins to obtain the desired query results, and large ontological hierarchies complicate query answering even further for systems that provide complete answers with respect to such ontological axioms. In this work we present PARJ, an in-memory RDF store that takes ontological hierarchies into consideration during join processing with very low performance overhead, avoids expensive preprocessing and materialization of implications, and is amenable to straightforward parallelization. Specifically, we present a join implementation that can achieve any desired degree of parallelism on arbitrary join queries and RDF graphs stored in memory using compact vertical partitioning. We use an adaptive join processing approach that takes advantage of complete or even partial ordering of RDF data, which is stored compactly in order to increase spatial locality and keep memory consumption low, coupled with an ID-to-Position vector index that is used when the ordering does not allow efficient scanning of the input relation. Finally, we experimentally show the efficiency and scalability of our proposal.

Keywords RDF · SPARQL · OWL · Join processing
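To make the storage layout named in the abstract more concrete, the following minimal Python sketch illustrates vertical partitioning in its general, textbook form: one two-column (subject, object) table per predicate, kept sorted so that two tables can be joined on the subject with a single merge pass. It is an illustration under stated assumptions, not PARJ's implementation. The function names `vertical_partition` and `merge_join_on_subject`, the integer IDs, and the predicate names are all hypothetical, and the sketch covers neither the ID-to-Position index nor the parallel execution.

```python
from collections import defaultdict


def vertical_partition(triples):
    """Group (subject, predicate, object) triples into one table per predicate.

    Each table is a list of (subject, object) pairs sorted by subject, then object.
    Subjects and objects are assumed to be dictionary-encoded integer IDs.
    """
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return {p: sorted(rows) for p, rows in tables.items()}


def merge_join_on_subject(left, right):
    """Merge-join two subject-sorted tables on the subject column.

    Returns a list of (subject, left_object, right_object) tuples.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        ls, rs = left[i][0], right[j][0]
        if ls < rs:
            i += 1
        elif ls > rs:
            j += 1
        else:
            # Find the run of rows sharing this subject on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == ls:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == ls:
                j_end += 1
            # Emit the cross product of the two runs.
            for _, left_obj in left[i:i_end]:
                for _, right_obj in right[j:j_end]:
                    out.append((ls, left_obj, right_obj))
            i, j = i_end, j_end
    return out


# Hypothetical toy data: subjects 1..3, two predicates.
triples = [
    (1, "worksFor", 10),
    (2, "worksFor", 11),
    (1, "name", 100),
    (3, "name", 101),
    (2, "name", 102),
]
tables = vertical_partition(triples)
joined = merge_join_on_subject(tables["worksFor"], tables["name"])
```

Because each predicate table is sorted on the join column, the join reads both inputs sequentially, which is the kind of spatial locality the abstract refers to. When an input is not ordered on the join column, a merge pass is no longer possible and some form of lookup structure is needed instead.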

¹ National and Kapodistrian University of Athens, Athens, Greece


Distributed and Parallel Databases

1 Introduction

The Resource Description Framework (RDF) is a data model recommended by the W3C for semantic data integration, sharing and linking across different organizations and applications on the Web. RDF provides flexible modeling of data coming from heterogeneous domains in the form of triples forming subject-predicate-object statements, facilitating the construction of Knowledge Graphs. Every component of such a triple is either a resource uniquely identified by an IRI or a data value in the form of a literal. The latter can only be present in the object position. A set of such statements can be considered an RDF graph, where subjects and objects are nodes and, for each statement, an arc labeled with the property name connects the corresponding subject and object. Several organizations publish data in the RDF model, which allows information from different sources to be interlinked and processed automatically by software agents. As a result, as of 2019 the Linked Open Data (LOD) cloud [49] contains more than 1200 datasets a
