A High Performance Hierarchical Cubing Algorithm and Efficient OLAP in High-Dimensional Data Warehouse


1 Department of Computer Science and Engineering, Yangzhou University, 225009 China
2 School of Economics & Management, Southeast University, 210096 China

Abstract. Data cubes have played an essential role in fast OLAP (online analytical processing) in many data warehouses. The pre-computation of data cubes is critical for improving OLAP response time in large high-dimensional data warehouses. However, as the sizes of data warehouses grow, the time it takes to perform this pre-computation becomes a significant performance bottleneck. In a high-dimensional data warehouse, it might not be practical to build all of the cuboids and their indices. In this paper, we propose a hierarchical cubing algorithm that partitions the high-dimensional data cube into low-dimensional cube segments. It significantly reduces CPU and I/O overhead for many queries by restricting the number of cube segments that must be processed, for both the fact table and the bitmap indices. Experimental results show that the proposed method is significantly more efficient than other existing cubing methods.

Keywords: Data cube, hierarchical cubing algorithm, high-dimensional data warehouse.
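The abstract describes partitioning a high-dimensional cube into low-dimensional cube segments so that a query only touches the segments covering its dimensions. The Python sketch below illustrates that general idea under simplifying assumptions made purely for illustration: dimensions are split into fixed-size consecutive groups, each group gets a fully materialized SUM cube, and a query whose dimensions span several segments falls back to scanning the fact table. The helper names (`partition_dimensions`, `full_cube`, `build_segments`, `query`), the toy fact table, and the segment size are hypothetical; the paper's actual hierarchical partitioning and its use of bitmap indices are not modeled here.

```python
from collections import defaultdict
from itertools import combinations


def partition_dimensions(dims, segment_size):
    """Hypothetical scheme: split dims into consecutive groups of at most segment_size."""
    return [dims[i:i + segment_size] for i in range(0, len(dims), segment_size)]


def full_cube(rows, dims, measure):
    """Materialize all 2^len(dims) cuboids over dims, aggregating measure with SUM."""
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            agg = defaultdict(int)
            for row in rows:
                agg[tuple(row[d] for d in group)] += row[measure]
            cube[group] = dict(agg)
    return cube


def build_segments(rows, dims, measure, segment_size):
    """Build one small full cube per low-dimensional segment (illustrative only)."""
    return {tuple(seg): full_cube(rows, seg, measure)
            for seg in partition_dimensions(dims, segment_size)}


def query(segments, rows, conditions, measure):
    """SUM(measure) under point conditions {dim: value}.

    If every queried dimension lies in a single segment, only that segment's
    cube is read. Otherwise this sketch simply scans the fact table.
    """
    for seg_dims, cube in segments.items():
        if set(conditions) <= set(seg_dims):
            group = tuple(d for d in seg_dims if d in conditions)
            return cube[group].get(tuple(conditions[d] for d in group), 0)
    return sum(r[measure] for r in rows
               if all(r[d] == v for d, v in conditions.items()))


# Toy fact table (hypothetical data): four dimensions A-D and one measure m.
rows = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1", "m": 5},
    {"A": "a1", "B": "b2", "C": "c1", "D": "d2", "m": 3},
    {"A": "a2", "B": "b1", "C": "c2", "D": "d1", "m": 4},
]
segments = build_segments(rows, ["A", "B", "C", "D"], "m", 2)
print(query(segments, rows, {"A": "a1"}, "m"))  # 8, answered from segment (A, B) only
```

With four dimensions and a segment size of 2, the two segments together hold 2 x 2^2 = 8 cuboids, compared with 2^4 = 16 cuboids for the full cube over all four dimensions.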

1 Introduction

Data warehouses integrate massive amounts of data from multiple sources and are primarily used for decision support. They must support complex online analytical processing (OLAP) queries. OLAP refers to the technologies that allow users to efficiently retrieve data from the data warehouse for decision support purposes [1]. Much research has been done to improve OLAP query performance and to provide fast response times for queries on large data warehouses. A key issue in speeding up OLAP query processing is the efficient indexing and materialization of data cubes [2,3,4]. Many efficient cube computation algorithms have been proposed recently, such as BUC [5], H-cubing [6], Quotient cubing [7], and Star-cubing [8]. However, in large data warehouse applications such as bioinformatics, the data usually has high dimensionality, with more than 100 dimensions.

∗ The research in the paper is supported by the National Natural Science Foundation of China under Grant No. 70472033 and 60673060; the National Facilities and Information Infrastructure for Science and Technology of China under Grant No. 2004DKA20310; the National Tenth-Five High Technology Key Project of China under Grant No. 2003BA614A; the Natural Science Foundation of Jiangsu Province under Grant No. BK2005047 and BK2005046; and the ‘Qing Lan’ Project Foundation of Jiangsu Province of China.

T. Washio et al. (Eds.): PAKDD 2007 Workshops, LNAI 4819, pp. 357–367, 2007.
© Springer-Verlag Berlin Heidelberg 2007
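To make the scale of a 100-dimensional cube concrete, the standard cuboid counts can be computed directly: a cube over n dimensions without concept hierarchies has 2^n cuboids, and if dimension i has L_i hierarchy levels the count becomes the product of (L_i + 1), where the extra 1 is the aggregated "all" level. The sketch below also counts the cuboids retained when the dimensions are split into fixed-size segments that are each fully materialized; that segment scheme, the helper names, and the segment size of 3 are illustrative assumptions, not the paper's parameters.

```python
from math import prod


def full_cube_cuboids(num_dims):
    """Cuboids in a full cube with no concept hierarchies: 2^n."""
    return 2 ** num_dims


def full_cube_cuboids_with_hierarchies(levels):
    """Cuboids when dimension i has levels[i] hierarchy levels: prod(L_i + 1)."""
    return prod(l + 1 for l in levels)


def segmented_cuboids(num_dims, segment_size):
    """Hypothetical: cuboids kept when dims are split into segments of at most
    segment_size dimensions and each segment is fully materialized."""
    full_segments, remainder = divmod(num_dims, segment_size)
    return full_segments * 2 ** segment_size + (2 ** remainder if remainder else 0)


print(f"{full_cube_cuboids(100):.3e}")  # 1.268e+30 cuboids for 100 dimensions
print(segmented_cuboids(100, 3))        # 266 cuboids across 34 segments
```

The gap between roughly 10^30 cuboids and a few hundred is what motivates materializing small low-dimensional pieces rather than the full cube; the price is that queries spanning several segments need extra work at query time.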

K. Hu et al.

Since the size of a data cube grows exponentially with the number of dimensions, it is generally too costly in both computation time and storage space to materialize a full high-dimensional data cube. For example, a data cube