Big high-dimension data cube designs for hybrid memory systems


Rodrigo Rocha Silva1 · Celso Massaki Hirata2 · Joubert de Castro Lima3

Received: 11 November 2019 / Revised: 4 August 2020 / Accepted: 9 August 2020 © Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract In Big Data cubes with hundreds of dimensions and billions of tuples, indexing and query operations are challenging because computing a full cube has exponential time and space complexity. Solutions based only on RAM may therefore be impractical, and solutions based on hybrid memory (RAM and disk) become viable alternatives. In this paper, we propose a hybrid approach, named bCubing, to index and query high-dimension data cubes with a high number of tuples on a single machine using both RAM and disk. We evaluated bCubing in terms of runtime and memory consumption, comparing it with the Frag-Cubing, HIC and H-Frag approaches. bCubing proved faster and used less RAM than all three. It indexed and supported queries over a data cube with 1.2 billion tuples and 60 dimensions while consuming only 84 GB of RAM, 35% less memory than HIC. The complex holistic measures mode and median were computed in multidimensional queries, and bCubing was, on average, 50% faster than HIC.

Keywords Multidimensional database · Multidimensional query · Big Data · Data cube · Holistic measure · High dimension

Rodrigo Rocha Silva [email protected]

1 Faculdade de Tecnologia de São Paulo, Universidade de Coimbra, Rua Carlos Barattino, 908 Vila Nova Mogilar, 08773-600 Mogi das Cruzes, SP, Brazil
2 Instituto Tecnológico de Aeronáutica, São José dos Campos, SP, Brazil
3 Universidade Federal de Ouro Preto, Ouro Prêto, SP, Brazil

1 Introduction

The term Big Data emerged to refer to massive data collections and the new challenge of data-intensive computing that they pose. Online Analytical Processing (OLAP) approaches that work with large volumes of data are normally relational and are no longer adequate for Big Data [8,10,40]. Recently, some case studies on application-independent semantic data cubes were presented [1,11,17,25,42]. The Australian experience [25] is an important and seminal initiative of public data cube services for spatial data, in which satellite images capture observations of diverse physical phenomena that can be used by different application domains. These applications can perform analyses over the entire globe and over long periods of time (decades). Data cubes and OLAP technology are being redesigned for Big Data needs, but several issues remain open in the OLAP and data cube literature [8,9]: the growth in the number of dimensions; the diversity of measure types, especially holistic ones; the diversity of hierarchy types, including spatial, stream and graph hierarchies; the update types (structural and content updates); and the data volume. In terms of sequential approaches, Frag-Cubing [27], a RAM-only solution, can be considered the seminal solution for high-dimension data