DataOps: Seamless End-to-End Anything-to-RDF Data Integration
Abstract. While individual components for semantic data integration are commonly available, end-to-end solutions are rare. We demonstrate DataOps, a seamless Anything-to-RDF semantic data integration toolkit. DataOps supports the integration of both semantic and non-semantic data from an extensible set of different formats. Setting up data sources end-to-end works in three steps: (1) accessing the data from arbitrary locations in different formats, (2) specifying mappings depending on the data format (e.g., R2RML for relational data), and (3) consolidating new data with existing data instances (e.g., by establishing owl:sameAs links). All steps are supported through a fully integrated Web interface with configuration forms and different mapping editors. Visitors to the demo will be able to perform all three steps of the integration process.
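To illustrate steps (2) and (3), the following is a minimal sketch of an R2RML mapping and a consolidation link; the table EMPLOYEE, its columns, and the example.com namespaces are hypothetical placeholders, not taken from the paper.

    @prefix rr:  <http://www.w3.org/ns/r2rml#> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix ex:  <http://example.com/ns#> .

    # Step (2): map rows of a relational table to RDF resources.
    <#EmployeeMapping>
        rr:logicalTable [ rr:tableName "EMPLOYEE" ] ;
        rr:subjectMap   [ rr:template "http://example.com/employee/{ID}" ;
                          rr:class ex:Employee ] ;
        rr:predicateObjectMap [ rr:predicate ex:name ;
                                rr:objectMap [ rr:column "NAME" ] ] .

    # Step (3): consolidate a newly mapped resource with an existing instance.
    <http://example.com/employee/42> owl:sameAs <http://other.example.org/staff/jdoe> .

In DataOps, such mappings and links are not written by hand in this form but configured through the Web interface's mapping editors and consolidation forms; the snippet only sketches the kind of output those steps produce.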
1 Introduction
In recent years, semantic data integration has evolved into an important application area in industry: software ecosystems in companies become ever more complex, produce large amounts of heterogeneous information, and make it harder and harder to obtain a holistic view of a company's knowledge. Traditionally, dedicated ETL-style systems are used for analysis in such situations. Their functionality is provided by large-scale data warehouse systems or, more recently, by big data frameworks such as Hadoop YARN [1] that act as a data operating system running a mix of data warehousing and other applications. These systems share a property that is important for enterprise data analysis: available as ready-made solutions, they include everything from assisted setup and a broad selection of access methods to graphical configuration interfaces and comprehensive documentation and support. However, they come with a downside: in classical data warehouses a dedicated, global warehousing schema must be designed, mappings must be constructed, and the resulting schema must be documented and communicated to users. For Hadoop-like systems this is not necessarily the case, as a mix of relational and non-relational workloads can be run by different applications in the system. This, however, comes at the price of either a very small set of supported
queries and little flexibility, or of even more initial effort for programming all the tasks and queries to be supported. Worse, with a number of data sources whose structure changes quickly, maintenance of the resulting schema, mappings, and queries can become a nightmare in either case. Often enough, the effort for setup and maintenance becomes unacceptable, especially if some data sources are complex in structure. This contributes to the current situation in which enterprises are assumed to analyze less than one sixth of their potentially relevant data.1 Semantic data integration, with its flexible graph model and vocabularies, is one possible and natural way to address these problems.