One size does not fit all: accelerating OLAP workloads with GPUs


Yansong Zhang, et al. [full author details at the end of the article]

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract GPUs have been considered one of the next-generation platforms for real-time query processing databases. In this paper we empirically demonstrate that representative GPU databases [e.g., OmniSci (Open Source Analytical Database & SQL Engine, https://www.omnisci.com/platform/omniscidb, 2019)] may be slower than representative in-memory databases [e.g., Hyper (Neumann and Leis, IEEE Data Eng Bull 37(1):3–11, 2014)] on typical OLAP workloads (the Star Schema Benchmark), even when the actual dataset of each query fits entirely in GPU memory. Therefore, we argue that GPU database designs should not be one-size-fits-all; a general-purpose GPU database engine may not be well suited for OLAP workloads without carefully designed GPU memory assignment and GPU computing locality. To achieve better performance for GPU OLAP, we need to re-organize the OLAP operators and re-optimize the OLAP model. In particular, we propose a 3-layer OLAP model that matches heterogeneous computing platforms. The core idea is to maximize data and computing locality for the specified hardware. We design a vector grouping algorithm for the data-intensive workload, which proves to be well suited to the CPU platform. We design a top-down query plan tree strategy that guarantees optimal operation in the final stage while pushing the respective optimizations down to lower layers for global optimization gains. With this strategy, we design a 3-stage processing model (the OLAP acceleration engine) for the hybrid CPU-GPU platform, in which the computing-intensive star-join stage is accelerated by the GPU and the data-intensive grouping & aggregation stage is accelerated by the CPU. This design maximizes the locality of the different workloads and simplifies the GPU acceleration implementation.
Our experimental results show that with vector grouping and the GPU-accelerated star-join implementation, the OLAP acceleration engine runs 1.9×, 3.05× and 3.92× faster than Hyper, OmniSci GPU and OmniSci CPU, respectively, in the SSB evaluation with a dataset of SF = 100.

Keywords GPU · OLAP · Layered OLAP · Vector grouping · 3-layer OLAP model
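The 3-stage split described above can be illustrated with a minimal, hypothetical sketch. In the paper's design the star-join stage runs as GPU kernels over hash tables of the (filtered) dimension tables; here a plain Python loop stands in for that stage so the data flow between the stages is visible. All table, column, and function names below are illustrative, not the authors' implementation.

```python
# Hypothetical, simplified sketch of the 3-stage processing model:
# stage 1 (star-join, GPU in the paper) probes one hash table per
# dimension; stages 2-3 (grouping & aggregation, CPU in the paper)
# consume the surviving fact rows.

def star_join(fact_rows, dim_tables):
    """Stage 1: probe each dimension hash table for every fact row.

    fact_rows:  list of dicts holding foreign keys and a 'measure'.
    dim_tables: {fk_name: {key: group_value_or_None}}; a None value
                means the dimension row fails the query's predicate.
    Returns (group_key, measure) pairs for surviving fact rows.
    """
    out = []
    for row in fact_rows:
        groups = []
        for fk, table in dim_tables.items():
            g = table.get(row[fk])
            if g is None:            # predicate miss: drop the fact row
                break
            groups.append(g)
        else:                        # all probes hit: row survives
            out.append((tuple(groups), row["measure"]))
    return out

def group_aggregate(joined):
    """Stages 2-3: group the join output and sum the measure."""
    agg = {}
    for key, measure in joined:
        agg[key] = agg.get(key, 0) + measure
    return agg

# Toy data: one fact table, two filtered dimensions (None = filtered out).
dims = {
    "d_date": {1: "1997", 2: None},
    "d_cust": {10: "ASIA", 11: "ASIA"},
}
facts = [
    {"d_date": 1, "d_cust": 10, "measure": 5},
    {"d_date": 1, "d_cust": 11, "measure": 7},
    {"d_date": 2, "d_cust": 10, "measure": 9},   # filtered by d_date
]
print(group_aggregate(star_join(facts, dims)))
# {('1997', 'ASIA'): 12}
```

The point of the split is that stage 1 is compute-intensive and embarrassingly parallel (one independent probe sequence per fact row, a natural fit for GPU threads), while stages 2-3 are data-intensive and benefit from the CPU's large memory and caches.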




Distributed and Parallel Databases

1 Introduction Nowadays in-memory databases are extensively adopted for high-performance query processing as RAM sizes grow and prices drop dramatically [1]. Due to the massively parallel computing power of GPUs, GPU databases [2] are considered next-generation high-performance query processing engines, with large numbers of CUDA cores, high-bandwidth device memory, and scalability; e.g., the HGX-2 [3] server can support 16 NVIDIA Tesla V100 GPUs with a total of 0.5 TB device memory and a 300 GB/s NVLink switch. The rapid development of GPUs drives the development of GPU databases. Considering their software architectures, some GPU databases are designed as GPU accelerated in-memory