DataOps: Seamless End-to-End Anything-to-RDF Data Integration
Abstract. While individual components for semantic data integration are commonly available, end-to-end solutions are rare. We demonstrate DataOps, a seamless Anything-to-RDF semantic data integration toolkit. DataOps supports the integration of both semantic and non-semantic data from an extensible set of different formats. Setting up data sources end-to-end works in three steps: (1) accessing the data from arbitrary locations in different formats, (2) specifying mappings depending on the data format (e.g., R2RML for relational data), and (3) consolidating new data with existing data instances (e.g., by establishing owl:sameAs links). All steps are supported through a fully integrated Web interface with configuration forms and different mapping editors. Visitors to the demo will be able to perform all three steps of the integration process.
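To illustrate steps (2) and (3), the following is a minimal sketch of an R2RML mapping and a consolidation link; the table EMPLOYEE, its columns, and the example.com namespaces are hypothetical placeholders, not taken from the paper.

    @prefix rr:  <http://www.w3.org/ns/r2rml#> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix ex:  <http://example.com/ns#> .

    # Step (2): map rows of a relational table to RDF resources.
    <#EmployeeMapping>
        rr:logicalTable [ rr:tableName "EMPLOYEE" ] ;
        rr:subjectMap   [ rr:template "http://example.com/employee/{ID}" ;
                          rr:class ex:Employee ] ;
        rr:predicateObjectMap [ rr:predicate ex:name ;
                                rr:objectMap [ rr:column "NAME" ] ] .

    # Step (3): consolidate a newly mapped resource with an existing instance.
    <http://example.com/employee/42> owl:sameAs <http://other.example.org/staff/jdoe> .

In DataOps, such mappings and links are not written by hand in this form but configured through the Web interface's mapping editors and consolidation forms; the snippet only sketches the kind of output those steps produce.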
1 Introduction
In recent years, semantic data integration has evolved into an important application area in industry: software ecosystems in companies become ever more complex, produce large amounts of heterogeneous information, and make it harder and harder to obtain a holistic view of a company's knowledge. Traditionally, dedicated ETL-style systems are used for analysis in such situations. Their functionality is provided by large-scale data warehouse systems or, more recently, by big data frameworks such as Hadoop YARN [1] that act as a data operating system running a mix of data warehousing and other applications. These systems share a property that is important for enterprise data analysis: available as ready-made solutions, they include everything from assisted setup and a broad selection of access methods to graphical configuration interfaces and comprehensive documentation and support. However, they come with a downside: in classical data warehouses a dedicated, global warehousing schema must be designed, mappings must be constructed, and the resulting schema must be documented and communicated to users. For Hadoop-like systems this is not necessarily the case, as a mix of relational and non-relational workloads can be run by different applications in the system. This, however, comes at the price of either a very small set of supported
queries and little flexibility, or of even more initial effort for programming all the tasks and queries to be supported. Worse, with a number of data sources whose structure changes quickly, maintenance of the resulting schema, mappings, and queries can become a nightmare in either case. Often enough, the effort for setup and maintenance becomes unacceptable, especially if some data sources are complex in structure. This contributes to the current situation in which enterprises are assumed to analyze less than one sixth of their potentially relevant data.1 Semantic data integration, with its flexible graph model and vocabularies, is one possible and natural way to address these problems.