Using T-Drive and BerlinMod in Parallel SECONDO for Performance Evaluation of Geospatial Big Data Processing

With the growing volume of geographically referenced data, there is a need to develop new approaches for efficient storage and processing of massive spatial data. The increase in spatial data has been fueled by the growing availability of ubiquitous mobil

Introduction

With the growing availability of ubiquitous mobile computing devices such as GPS-equipped smartphones, the amount of mobility data is increasing. Mobility data typically contains both spatial and temporal components, which together form trajectories: time-stamped paths of objects through space. The movements of these objects contain hidden patterns that reflect their behavior, and spatio-temporal queries are commonly used to identify such patterns. While several DBMSs support spatial operators, only a few specialized ones support both spatial and temporal data processing. Secondo (Güting et al. 2005) and Hermes (Pelekis et al. 2006) are two examples. The nature

M. Ashfaq · A. Tahir (corresponding author)
Institute of Geographical Information Systems, National University of Sciences and Technology, Islamabad, Pakistan

F.M. Orakzai
Department of Computer Science, Aalborg University, Aalborg, Denmark

G. McArdle · M. Bertolotto
School of Computer Science and Earth Institute, University College Dublin, Dublin 4, Ireland

© Springer Nature Singapore Pte Ltd. 2017
C. Zhou et al. (eds.), Spatial Data Handling in Big Data Era, Advances in Geographic Information Science, DOI 10.1007/978-981-10-4424-3_1

of movement data means that its size can become very large, so processing and querying it becomes slow and inefficient. To handle such cases, some moving object database platforms support parallel query processing. Handling big data requires high-performance computing or distributed data processing. The state-of-the-art industry standard is the MapReduce model (Dean and Ghemawat 2008), and the Apache Hadoop framework (Murthy et al. 2011) is its open-source implementation. The original aim of the MapReduce paradigm was to process simple text documents; implementing complex algorithms and managing heterogeneous data structures on it was a challenging task. To address this, several extensions and toolkits have been introduced that operate over the Hadoop platform, enabling a wide range of data management, mining and analysis possibilities. Parallel Secondo, a Hadoop-based platform, is a promising tool for handling big mobility data. It combines the distributed processing ability of Hadoop with the analytical capabilities of Secondo to store and process trajectories. Parallel Secondo provides hybrid processing, in which analysis can be run in either sequential or parallel mode depending on the available distributed architecture. Data size also determines whether queries need to be run in sequential or parallel mode. Processing queries on more than one node may reduce execution time; however, the extent of the improvement depends on many factors, such as the volume of data, the nature of the query and the number of nodes. This paper proposes an appropriate environme