ART: sub-logarithmic decentralized range query processing with probabilistic guarantees
S. Sioutas · P. Triantafillou · G. Papaloukopoulos · E. Sakkopoulos · K. Tsichlas · Y. Manolopoulos
Published online: 3 October 2012 © Springer Science+Business Media New York 2012
Abstract We focus on range query processing on large-scale, typically distributed infrastructures, such as clouds of thousands of nodes in shared datacenters, p2p distributed overlays, etc. In such distributed environments, efficient range query processing is the key to managing the distributed data sets per se, and to monitoring
Communicated by: Beng Chin Ooi. A limited and preliminary version of this work has been presented as a brief announcement at the Twenty-Ninth Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, Zurich, Switzerland, July 25–28, 2010 [26]. This work was partially supported by the Thales Project entitled “Cloud9: A multidisciplinary, holistic approach to internet-scale cloud computing”. For more details see the following URL: https://sites.google.com/site/thaliscloud9/home.
S. Sioutas: Department of Informatics, Ionian University, Corfu, Greece
P. Triantafillou · G. Papaloukopoulos · E. Sakkopoulos: CTI and Dept. of Computer Engineering & Informatics, University of Patras, Patras, Greece
K. Tsichlas · Y. Manolopoulos: Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
Distrib Parallel Databases (2013) 31:71–109
the infrastructure’s resources. We wish to develop an architecture that can support range queries in such large-scale decentralized environments and can scale in terms of the number of nodes as well as in terms of the data items stored. In the last few years a number of solutions have been proposed (mostly by researchers in the p2p domain) for designing such large-scale systems. However, these are inadequate for our purposes, since at the envisaged scales the classic logarithmic complexity (for point queries) is still too expensive, while for range queries it is even more disappointing. In this paper we go one step further and achieve a sub-logarithmic complexity. We contribute the ART (Autonomous Range Tree) structure, which outperforms the most popular decentralized structures, including Chord (and some of its successors), BATON (and its successor) and Skip-Graphs. We contribute theoretical analysis, backed up by detailed experimental results, showing that the communication cost of query and update operations is $O(\log_b^2 \log N)$ hops, where the base $b$ is a double-exponential power of two and $N$ is the total number of nodes. Moreover, ART is a fully dynamic and fault-tolerant structure, which supports the join/leave node operations in an expected $O(\log \log N)$ number of hops w.h.p. Our experimental p
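To get a feel for the gap between the classic logarithmic hop count and the sub-logarithmic bound stated in the abstract, the following minimal sketch evaluates both expressions numerically. It is only an illustration under stated assumptions, not the authors' analysis: constant factors are ignored, the inner logarithm is taken base 2, the bound is read as (log_b log N)^2, and the bases b = 4 and b = 16 are hypothetical choices of the form 2^(2^c). The function names are invented for this example.

```python
import math


def classic_hops(n: int) -> float:
    """Classic O(log N) hop count, taken here as log2(N) with constant 1 (assumption)."""
    return math.log2(n)


def art_hops_bound(n: int, b: int) -> float:
    """Sub-logarithmic bound read as (log_b log2 N)^2, constants ignored (assumption)."""
    return math.log(math.log2(n), b) ** 2


if __name__ == "__main__":
    # Illustrative network sizes and bases; none of these values come from the paper.
    for n in (2**10, 2**20, 2**30):
        for b in (4, 16):
            print(
                f"N=2^{int(math.log2(n))}, b={b}: "
                f"log N = {classic_hops(n):.1f}, "
                f"log_b^2 log N = {art_hops_bound(n, b):.2f}"
            )
```

Under these assumptions the second expression grows far more slowly than the first as N increases, which is the qualitative point of the sub-logarithmic claim; the actual hop counts depend on constants and parameters that the abstract does not give.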