FEASIBLE: A Feature-Based SPARQL Benchmark Generation Framework
Universität Leipzig, IFI/AKSW, PO 100920, 04009 Leipzig, Germany
{saleem,ngongangomo}@informatik.uni-leipzig.de
Insight Center for Data Analytics, National University of Ireland, Galway, Ireland
[email protected]
Abstract. Benchmarking is indispensable when aiming to assess technologies with respect to their suitability for given tasks. While several benchmarks and benchmark generation frameworks have been developed to evaluate triple stores, they mostly provide a one-fits-all solution to the benchmarking problem. This approach to benchmarking is, however, unsuitable for evaluating the performance of a triple store for a given application with particular requirements. We address this drawback by presenting FEASIBLE, an automatic approach for the generation of benchmarks out of the query history of applications, i.e., query logs. The generation is achieved by selecting prototypical queries of a user-defined size from the input set of queries. We evaluate our approach on two query logs and show that the benchmarks it generates are accurate approximations of the input query logs. Moreover, we compare four different triple stores with benchmarks generated using our approach and show that they behave differently based on the data they contain and the types of queries posed. Our results suggest that FEASIBLE generates better sample queries than the state of the art. In addition, the better query selection and the larger set of query types used lead to triple store rankings which partly differ from the rankings generated by previous works.
1 Introduction
Triple stores are the data backbone of many Linked Data applications [9]. The performance of triple stores is hence of central importance for Linked-Data-based software ranging from real-time applications [8,13] to on-the-fly data integration frameworks [1,15,18]. Several benchmarks (e.g., [2,4,7,9,16,17]) for assessing the performance of triple stores have been proposed. However, many of them (e.g., [2,4,7,17]) rely on synthetic data or on synthetic queries. The main advantage of such synthetic benchmarks is that they commonly rely on data generators that can produce benchmarks of different data sizes and thus allow testing the scalability of triple stores. However, they often fail to reflect reality. In particular, previous works [5] point out that artificial benchmarks are typically highly structured while real Linked Data sources are most commonly weakly structured.
© Springer International Publishing Switzerland 2015. M. Arenas et al. (Eds.): ISWC 2015, Part I, LNCS 9366, pp. 52–69, 2015. DOI: 10.1007/978-3-319-25007-6_4
Moreover, synthetic queries most commonly fail to reflect the characteristics of the real queries sent to applications [3,11]. Thus, synthetic benchmark results are rarely sufficient to detect the most suitable triple store for a particular real application. The DBpedia SPARQL Benchmark (DBPSB) [9] addresses a portion of these drawbacks by evaluating the performance o