Revisiting hash join on graphics processors: a decade later



Johns Paul¹ · Bingsheng He² · Shengliang Lu² · Chiew Tong Lau¹

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Over the last decade, significant research effort has been put into improving the performance of the hash join operation on GPUs. Over the same period, there have been significant changes to the GPU architecture. Hence, in this paper we first revisit the major GPU hash join implementations of the last decade and detail how they take advantage of different GPU architecture features. We then perform a comprehensive performance evaluation of these implementations using different generations of GPUs released over the last decade, which helps to shed light on the impact of different architecture features and to identify the factors guiding the choice of these features. We then study how data characteristics such as skew and match rate affect the performance of GPU hash join implementations and propose techniques to improve the performance of existing implementations under such conditions. Finally, we perform an in-depth comparison of the performance and cost-efficiency of GPU hash join implementations against state-of-the-art CPU implementations.

Keywords: GPU · Hash join · Partitioning

1 Introduction

The high level of parallelism and the high memory bandwidth provided by GPUs make them attractive hardware accelerators for Online Analytical Processing [1–5]. Over the last decade, a tremendous amount of research effort has been put into accelerating relational operators such as hash join using GPUs [6–15]. During this time, GPU hardware has improved significantly through the introduction of many new architecture features. Hence, we need to revisit GPU hash join implementations in conjunction with these architecture improvements, as shown in the timeline in Fig. 1. The timeline clearly shows the following trend: every 1–2 years, vendors like NVIDIA

* Corresponding author: Johns Paul ([email protected])

¹ Nanyang Technological University, Singapore, Singapore
² National University of Singapore, Singapore, Singapore



Distributed and Parallel Databases

Fig. 1  Timeline of hash join implementations and GPU architecture features

introduce new architecture features, and 2–3 years down the line, new GPU hash join implementations appear that outperform previous ones by taking advantage of these features. However, the existing literature lacks a systematic revisit of GPU hash join implementations from this perspective. Further, given the trend in Fig. 1, understanding the factors guiding the choice of different features is critical to future research and system designs. Hence, we revisit the hash join implementations of the last decade and study them on different generations of GPU hardware.

GPUs are designed to process large amounts of data in parallel with minimal inter-thread communication. Hence, GPUs are often inefficient when processing unbalanced or skewed input relations. Other GPU hardware limitat
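For reference, the classical two-phase (build and probe) hash join that the GPU implementations above parallelize can be sketched as follows. This is a minimal single-threaded illustration, not code from any of the surveyed implementations; the relation names `R` and `S` and the tuple layout are assumptions for the example. It also hints at why skew and match rate matter: both inflate the number of matches produced per probe key, which is the work that becomes unbalanced across GPU threads.

```python
# Minimal sketch of the canonical build/probe hash join
# (illustrative only; not from the paper's implementations).
from collections import defaultdict

def hash_join(build_rel, probe_rel):
    """Join two lists of (key, payload) tuples on key."""
    # Build phase: hash the (typically smaller) relation into buckets.
    table = defaultdict(list)
    for key, payload in build_rel:
        table[key].append(payload)
    # Probe phase: look up each probe tuple; every match emits a result.
    # Skewed keys or a high match rate concentrate work in this loop,
    # which is what unbalances per-thread workloads on a GPU.
    return [(key, b, p) for key, p in probe_rel for b in table[key]]

R = [(1, "a"), (2, "b")]
S = [(1, "x"), (1, "y"), (3, "z")]
print(hash_join(R, S))  # [(1, 'a', 'x'), (1, 'a', 'y')]
```

A GPU implementation typically assigns build and probe tuples to threads in parallel and replaces the Python dictionary with an array-based hash table built using atomic operations, which is where the architecture features discussed above come into play.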