Techniques and guidelines for effective migration from RDBMS to NoSQL


Ho-Jun Kim1 · Eun-Jeong Ko1 · Young-Ho Jeon1 · Ki-Hoon Lee1

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Abstract Migration from RDBMS to NoSQL has become an important topic in the big data era. This paper provides comprehensive techniques and guidelines for effective migration from RDBMS to NoSQL. We discuss the challenges faced in translating SQL queries; the effects of denormalization, column families, secondary indexes, join algorithms, and column name length; and decision support for the migration. We focus on a column-oriented NoSQL database, HBase, because it is widely used by many Internet enterprises such as Facebook, Twitter, and LinkedIn. Because HBase does not support SQL, we use Apache Phoenix as an SQL layer on top of HBase. Experimental results using TPC-H show that column-level denormalization with atomicity and grouping columns into column families significantly improve query performance; that secondary indexes on foreign keys are not as effective as in RDBMSs; that the query optimizer of Phoenix is not very sophisticated; that shortened column names significantly reduce the database size and improve query performance; and that an SVM classifier can predict whether migration improves query performance. Important open problems in NoSQL research are supporting complex SQL queries, automatic index selection, and optimizing SQL queries for NoSQL.

Keywords Migration · RDBMS · NoSQL · Denormalization · Column family · Secondary index · Query optimization · Decision support

This paper is an extended version of [1] which was presented at EDB 2017.

Ki-Hoon Lee (corresponding author): [email protected]

1 School of Computer and Information Engineering, Kwangwoon University, 20 Kwangwoon-ro, Nowon-gu, Seoul 01897, Republic of Korea


1 Introduction

NoSQL databases have become a popular alternative to traditional relational databases because of their ability to handle big data, and the demand for migration from RDBMS to NoSQL is growing rapidly [1, 2]. Because NoSQL has a different data and query model from RDBMS, the migration is a challenging research problem. For example, NoSQL does not provide sufficient support for SQL queries, join operations, and ACID transactions.

In this paper, we provide comprehensive techniques and guidelines for effective migration from RDBMS to NoSQL. We make three main contributions. First, we investigate the challenges faced in translating SQL queries for NoSQL. Second, we evaluate the effects of denormalization, column families, secondary indexes, join algorithms, and column name length on NoSQL databases. Third, we propose a decision support system for the migration. We focus on HBase because it is widely used by many Internet enterprises such as Facebook, Twitter, and LinkedIn. Because HBase does not support SQL, we use Apache Phoenix as an SQL layer on top of HBase. Experimental results using TPC-H show that column-level denormalization with atomicity and grouping columns in