On Disk Allocation of Intermediate Query Results in Parallel Database Systems

For complex queries in parallel database systems, substantial amounts of data must be redistributed between operators executed on different processing nodes. Frequently, such intermediate results cannot be held in main memory and must be stored on disk. T

  • PDF / 171,040 Bytes
  • 8 Pages / 435.239 x 665.997 pts Page_size
  • 79 Downloads / 198 Views

DOWNLOAD

REPORT


tract. For complex queries in parallel database systems, substantial amounts of data must be redistributed between operators executed on different processing nodes. Frequently, such intermediate results cannot be held in main memory and must be stored on disk. To limit the ensuing performance penalty, a data allocation must be found that supports parallel I/O to the greatest possible extent. In this paper, we propose declustering even self-contained units of temporary data processed in a single operation (such as individual buckets of parallel hash joins) across multiple disks. Using a suitable analytical model, we find that the improvement of parallel I/O outweighs the penalty of increased fragmentation.

1 Introduction In parallel database systems used for advanced applications like data warehousing, complex queries are performed on very large data sets, often in terabyte ranges. Parallel operators executed on different processing nodes exchange substantial amounts of intermediate results, and when the processors' memory capacity is exceeded, temporary data must be stored on disk. The response time problems caused by slow disk access are alleviated by parallel I/O, often using more disks than processors to avoid bottlenecks. In a shared-disk architecture, intermediate results can be written out by the sender nodes and read directly by the receivers. Such disk-based data transfer is convenient and reduces the overhead of communication between processors. But depending on the operators' access patterns, a smart disk allocation is required to limit disk contention. In most algorithms, data fragments are stored on many disks, but each fragment is kept on a single device. Thus, when a receiver processes its fragments sequentially, it can read from just one disk at a time and parallel I/O is not fully exploited. In this article, we propose declustering individual data fragments across multiple disks to increase the performance of parallel database systems for complex queries on large amounts of data. We develop an appropriate analytical model to show that the benefits of parallel I/O for the receiving operator usually outweigh the additional disk load due to increased fragmentation. Our approach works for several operators and most system architectures. Our paper is structured as follows: Sect. 2 describes the processing model of a parallel hash join in a shared-disk system, which serves as a case study throughout the text. Sect. 3 is devoted to finding the optimal degree of declustering and includes our analytical model. In Sect. 4, we outline possible extensions of our method to different operators and architectures. Related work is discussed in Sect. 5, and we conclude in Sect. 6. P. Amestoy et al. (Ed s.): Euro-Par ’99, LNCS 1685, pp. 469-476, 1999. © Springer-Verlag Berlin Heidelberg 1999

470

Holger Märtens

union of partial results

R

local join on pairs of buckets

R·1 S·1

scan fragments R11 R12 R13 scan (selection) fragmented base relations

merge node

S

R·2 S·2

R21 R22 R23

S11 S12 S13

R·3 S·3

S21 S22