FEASIBLE: A Feature-Based SPARQL Benchmark Generation Framework

Universität Leipzig, IFI/AKSW, PO 100920, 04009 Leipzig, Germany
{saleem,ngongangomo}@informatik.uni-leipzig.de
Insight Center for Data Analytics, National University of Ireland, Galway, Ireland
[email protected]

Abstract. Benchmarking is indispensable when aiming to assess technologies with respect to their suitability for given tasks. While several benchmarks and benchmark generation frameworks have been developed to evaluate triple stores, they mostly provide a one-fits-all solution to the benchmarking problem. This approach to benchmarking is however unsuitable to evaluate the performance of a triple store for a given application with particular requirements. We address this drawback by presenting FEASIBLE, an automatic approach for the generation of benchmarks out of the query history of applications, i.e., query logs. The generation is achieved by selecting prototypical queries of a user-defined size from the input set of queries. We evaluate our approach on two query logs and show that the benchmarks it generates are accurate approximations of the input query logs. Moreover, we compare four different triple stores with benchmarks generated using our approach and show that they behave differently based on the data they contain and the types of queries posed. Our results suggest that FEASIBLE generates better sample queries than the state of the art. In addition, the better query selection and the larger set of query types used lead to triple store rankings which partly differ from the rankings generated by previous works.

1 Introduction

Triple stores are the data backbone of many Linked Data applications [9]. The performance of triple stores is hence of central importance for Linked-Data-based software ranging from real-time applications [8,13] to on-the-fly data integration frameworks [1,15,18]. Several benchmarks (e.g., [2,4,7,9,16,17]) for assessing the performance of triple stores have been proposed. However, many of them (e.g., [2,4,7,17]) rely on synthetic data or on synthetic queries. The main advantage of such synthetic benchmarks is that they commonly rely on data generators that can produce benchmarks of different data sizes and thus make it possible to test the scalability of triple stores. However, they often fail to reflect reality. In particular, previous works [5] point out that artificial benchmarks are typically highly structured while real Linked Data sources are most commonly weakly structured.

© Springer International Publishing Switzerland 2015. M. Arenas et al. (Eds.): ISWC 2015, Part I, LNCS 9366, pp. 52–69, 2015. DOI: 10.1007/978-3-319-25007-6_4

Moreover, synthetic queries most commonly fail to reflect the characteristics of the real queries sent to applications [3,11]. Thus, synthetic benchmark results are rarely sufficient to detect the most suitable triple store for a particular real application. The DBpedia SPARQL Benchmark (DBPSB) [9] addresses a portion of these drawbacks by evaluating the performance o
