An SQL-Based Query Language and Engine for Graph Pattern Matching

The interest for graph databases has increased in the recent years. Several variants of graph query languages exist – from low-level programming interfaces to high-level, declarative languages. In this paper, we describe a novel SQL-based language for mod

  • PDF / 1,093,990 Bytes
  • 17 Pages / 439.37 x 666.142 pts Page_size
  • 107 Downloads / 217 Views

DOWNLOAD

REPORT


SAP SE, Potsdam, Germany {christian.krause01,daniel.johannsen,radwan.deeb, david.knacker,anton.niadzelka}@sap.com 2 Technische Universit¨ at Ilmenau, Ilmenau, Germany [email protected]

Abstract. The interest for graph databases has increased in the recent years. Several variants of graph query languages exist – from lowlevel programming interfaces to high-level, declarative languages. In this paper, we describe a novel SQL-based language for modeling high-level graph queries. Our approach is based on graph pattern matching concepts, specifically nested graph conditions with distance constraints, as well as graph algorithms for calculating nested projections, shortest paths and connected components. Extending SQL with graph concepts enables the reuse of syntax elements for arithmetic expressions, aggregates, sorting and limits, and the combination of graph and relational queries. We evaluate the language concepts and our experimental SAP HANA Graph Scale-Out Extension (GSE) prototype (This paper is not official SAP communication material. It discusses a research-only prototype, not an existing or future SAP product. Any business decisions made concerning SAP products should be based on official SAP communication material.) using the LDBC Social Network Benchmark. In this work we consider only complex read-only queries, but the presented language paves the way for a SQL-based graph manipulation language formally based on graph transformations.

1

Introduction

In contrast to relational database management systems, graph databases employ dedicated data structures and algorithms tailored for analytical and transactional graph processing. Current applications in the domains of social network analysis (e.g., Facebook, LinkedIn), business network analysis (e.g., Ebay, SAP Ariba) and knowledge graphs (e.g., Google Search, Microsoft Office) show that there is a high demand for efficient reasoning on large-scale graph data. There exist a number of low-level graph programming models, the most prominent being Bulk Synchronous Parallel [16]. However, in the context of enterprise applications there is a need for high-level, declarative graph query languages that enable complex analysis scenarios. While SQL is the accepted standard query c Springer International Publishing Switzerland 2016  R. Echahed and M. Minas (Eds.): ICGT 2016, LNCS 9761, pp. 153–169, 2016. DOI: 10.1007/978-3-319-40530-8 10

154

C. Krause et al.

language in the world of relational databases, there currently is no consensus on a standard for a general-purpose graph query language. OpenCypher [10] is an initiative by the inventors of the Neo4j graph database to define a common graph query language based on their Cypher language. For historical reasons, many companies today use Cypher. However, the language in its current form is ad hoc and lacks tool support by other vendors. SPARQL [18] is the standard query language in the semantic web domain. While it supports graph query concepts, it is geared into the triple store (subject-predicate-object) concept of the Res