WODII: a solution to process SPARQL queries over distributed data sources

Ahmed Rabhi¹ · Rachida Fissoune¹

Received: 2 March 2019 / Revised: 10 July 2019 / Accepted: 19 October 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2019

Abstract The web of data can be seen as a distributed environment hosting structured, linked data based on Semantic Web standards. This is one of its most promising features for Semantic Web developers, who benefit from being able to remotely access the different RDF repositories available on the web, collect fragments of information from several sources, and combine the resulting parts into an integrated answer. In this paper, we propose an index-based solution, Web of Data Information Integrator (WoDII), to process SPARQL queries over independent data sources without prior knowledge of which sources contribute to the answer. By relying on an index, the system avoids non-relevant sources and maps each selected source to a cluster of sub-queries; as a result, network traffic decreases, making the process less dependent on the quality of the network connection.
Keywords SPARQL · Web of data · Aggregated search · Ontology-based data access
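The abstract's core idea (consult an index, skip sources that cannot contribute, send each remaining source one cluster of sub-queries, then join the partial answers) can be illustrated with a small Python sketch. It assumes a hypothetical predicate-level index that maps each predicate to the endpoints known to use it; WoDII's actual index structure, clustering strategy, and join operator are not described in this excerpt and may well differ. All names (`PREDICATE_INDEX`, `build_clusters`, `to_subquery`, `hash_join`) and the `example.org` endpoint URLs are illustrative placeholders, and the sketch performs no network I/O.

```python
from collections import defaultdict

# Hypothetical predicate-level index: predicate -> endpoints holding triples with it.
# The URLs are placeholders, not real SPARQL services.
PREDICATE_INDEX = {
    "foaf:name": {"http://a.example.org/sparql", "http://b.example.org/sparql"},
    "dbo:birthPlace": {"http://a.example.org/sparql"},
    "geo:lat": {"http://c.example.org/sparql"},
}


def is_var(term):
    """Simplified check: treat terms starting with '?' as SPARQL variables."""
    return term.startswith("?")


def build_clusters(patterns, index):
    """Map each relevant endpoint to the cluster of triple patterns it can answer.

    Endpoints that match no pattern never appear in the result, so they are
    never contacted.
    """
    clusters = defaultdict(list)
    for s, p, o in patterns:
        if is_var(p):
            # An unbound predicate gives the index nothing to filter on: ask every source.
            sources = set().union(*index.values())
        else:
            sources = index.get(p, set())
        for endpoint in sorted(sources):
            clusters[endpoint].append((s, p, o))
    return dict(clusters)


def to_subquery(cluster):
    """Render one cluster as a SELECT sub-query (PREFIX declarations omitted)."""
    variables = sorted({t for pattern in cluster for t in pattern if is_var(t)})
    body = " .\n  ".join(" ".join(pattern) for pattern in cluster)
    return f"SELECT {' '.join(variables)} WHERE {{\n  {body} .\n}}"


def hash_join(left, right):
    """Join two lists of solution mappings (dicts) on their shared variables.

    Assumes every mapping within a list binds the same variables (no OPTIONAL).
    """
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    # Build phase: index the right-hand mappings by their shared-variable values.
    table = defaultdict(list)
    for r in right:
        table[tuple(r[v] for v in shared)].append(r)
    # Probe phase: merge each left mapping with every compatible right mapping.
    joined = []
    for l in left:
        for r in table.get(tuple(l[v] for v in shared), []):
            joined.append({**l, **r})
    return joined


# Toy query: people, their birthplace, and the birthplace's latitude.
query_patterns = [
    ("?person", "dbo:birthPlace", "?city"),
    ("?city", "geo:lat", "?lat"),
]
clusters = build_clusters(query_patterns, PREDICATE_INDEX)
for endpoint, cluster in clusters.items():
    print(endpoint)
    print(to_subquery(cluster))

# Illustrative bindings standing in for what each endpoint might return.
from_a = [{"?person": "ex:p1", "?city": "ex:c1"},
          {"?person": "ex:p2", "?city": "ex:c2"}]
from_c = [{"?city": "ex:c1", "?lat": "1.5"}]
print(hash_join(from_a, from_c))
```

In this toy run, endpoint B holds no triples with either query predicate, so it never receives a sub-query; this is the kind of pruning the abstract associates with reduced network traffic. A hash join is used here only because it is a standard way to combine solution mappings on shared variables, not because the paper prescribes it.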

✉ Ahmed Rabhi
[email protected]

Rachida Fissoune
[email protected]

¹ ENSA of Tangier, Abdelmalek Essaadi University, Tangier, Morocco

1 Introduction
The Web is evolving from a “Web of linked documents” into a “Web of linked data”, providing better opportunities for sharing and searching information. The web of data can be seen as a giant collection of graphs containing structured data in machine-readable format, built on Semantic Web design principles and standards, and it thus gives Semantic Web developers an effective way to access data on the web remotely. The Linking Open Data (LOD) cloud forms a large graph consisting of billions of RDF triples distributed across various datasets available on the web. These datasets are accessed via SPARQL endpoints, which execute the SPARQL queries submitted to them. The information sought may not reside entirely in a single RDF repository, and answering a query may require retrieving its parts from several sources. Moreover, SPARQL endpoints are developed and managed independently and vary in performance (execution time and availability). Consequently, users have to execute multiple SPARQL queries over these endpoints to aggregate fragments of data instead of writing a single query that integrates all parts of the information, and this becomes harder as query complexity grows. It is therefore necessary to set up an aggregated search engine able to distribute the processing of SPARQL queries over several endpoints and to integrate the retrieved data fragments into a unified answer. Several studies have been carried out with the aim of executing SPARQL queries over distributed data sources in the Web of Data and joining the results into a single final answer. Considering the distribution of data and its sources’ indep