A Parallel Nearest Neighbor Algorithm for Skyline Computation Using Map Reduce

The Skyline is one of the most important operator in multi-criteria decision making and can be useful for many applications such as customer information services, decision support and decision making systems. It selects the best tuples from a multi-dimens

  • PDF / 1,635,983 Bytes
  • 11 Pages / 439.37 x 666.142 pts Page_size
  • 41 Downloads / 156 Views

DOWNLOAD

REPORT


2

Tic Lab, Universite Internationale de Rabat, Sala Al Jadida, Morocco [email protected], [email protected] Department of Computer Sciences, Ibn Tofail University, Kenitra, Morocco [email protected]

Abstract. The Skyline is one of the most important operator in multi-criteria decision making and can be useful for many applications such as customer information services, decision support and decision making systems. It selects the best tuples from a multi-dimensional database. The main problem of using Skyline is that the queries that compute it can be computationally expensive, so the best solution is to do it using a parallelized approach. In this paper we propose a parallel algorithm based on nearest neighbor search to compute the skyline using the Map Reduce framework. This method consist on computing in a considered region the nearest neighbor point to the origin and partitions the region where each new region is obtained by adding the constraint that the coordinate with respect to a dimension is upper bounded by that of the computed point in the same dimension. The algorithm applies the same method recursively through these regions of the partition computed at each step. We provide a parallel approach for this method based on the Map Reduce framework to come up with a solution to this problem by handling the independent regions in parallel using mappers and reducers. Keywords: Skyline

 Nearest neighbor  Map reduce  Hadoop

1 Introduction In a decision-making context, many queries of the users concern the looking for tuples for which the values of certain criteria are optimal. However the result of such queries can be empty. Indeed there may be some tuples that are useful to the user and yet eliminated from the result since they are optimal for one or some criterion but not for another. For example, considering a database of used cars (Table 1), to find the ideal car the search criteria can combine conditions on the lowest possible price, the highest possible power, the lowest possible fuel consumption and the lowest possible distance traveled. Certainly, this ideal car does not exist hence the result is null to this type of queries. Yet some cars could be relevant to the user because they are economical with a lower price but not powerful enough, they combine the criteria of minimum consumption and minimum price. In order to provide an adequate response to this type of queries, the SKYLINE operator has been introduced. It considers all the criteria for user’s preferences and © Springer Nature Switzerland AG 2019 Y. Farhaoui and L. Moussaid (Eds.): ICBDSDE 2018, SBD 53, pp. 204–214, 2019. https://doi.org/10.1007/978-3-030-12048-1_22

A Parallel Nearest Neighbor Algorithm

205

extracts the optimal tuples. Instead of looking for an ideal solution, it extracts the nearest candidates to the user’s choice. The skyline is based on the notion of dominance (A point dominates another point if it is good or better in all dimensions and better in at least one dimension). A famous method for computin