IoT streaming data integration from multiple sources
Doan Quang Tu · A. S. M. Kayes · Wenny Rahayu · Kinh Nguyen
Received: 31 March 2019 / Accepted: 27 June 2020 © Springer-Verlag GmbH Austria, part of Springer Nature 2020
Abstract The Internet of Things (IoT) has recently received considerable interest due to the development of smart technologies in today’s interconnected world. With the rapid advancement in Internet technologies and the proliferation of IoT sensors, myriad systems and applications generate data of massive volume, variety and velocity, which traditional databases and systems are unable to manage effectively. Many organizations need to deal with these massive datasets, which comprise different types of data (e.g., IoT streaming data, static data) in different formats (e.g., structured, semistructured) coming from multiple sources. Several data integration mechanisms have been designed to process mostly static data. Unfortunately, these techniques are not able to deal with and integrate IoT streaming datasets from multiple sources. In this paper, we identify the challenges of IoT Streaming Data Integration (ISDI) and present a formal approach for the real-time integration of such IoT streaming datasets. We address one of the important issues: timing conflict/alignment among streaming data coming from multiple sources. A generic window-based ISDI approach is proposed to deal with IoT data in different formats, and algorithms are developed to integrate IoT streaming data from multiple sources. In particular, we extend the basic windowing algorithm for real-time data integration and to deal with the timing alignment issue. We also introduce a de-duplication algorithm to deal with data redundancy and to demonstrate the useful fragments of the integrated data. We conduct several sets of experiments and quantify the performance of our proposed window-based approach. In particular, we compare our local experimental results with a real setup for streaming data, using Apache Spark. The results of the experiments, which are performed on several IoT datasets, show the efficiency of our proposed solution in terms of processing time.
The results are also used to provide an integrated data view to the users.
Keywords IoT streaming data integration · Timing alignment · De-duplication · Window-based integration
Mathematics Subject Classification 68U35 · 68-04 · 94Axx
An earlier version of this paper was presented at the 33rd International Conference on Advanced Information Networking and Applications (AINA-2019), where it received a best paper award [1].
Extended author information available on the last page of the article
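To make the window-based idea in the abstract concrete, the following is a minimal illustrative sketch in Python. It is not the authors' algorithm: it assumes fixed-size (tumbling) windows, integer timestamps, and records shaped as (timestamp, key, value), and the function name `integrate_windows` is hypothetical. Timing alignment is approximated by snapping each timestamp to the start of its window, and de-duplication keeps one copy of each (key, value) pair per window.

```python
from collections import defaultdict


def integrate_windows(streams, window_size):
    """Group records from several streams into tumbling windows and de-duplicate.

    streams: dict mapping a source name to an iterable of (timestamp, key, value).
    window_size: window length, in the same unit as the timestamps (assumed integer).
    """
    windows = defaultdict(dict)
    for source, records in streams.items():
        for timestamp, key, value in records:
            # Timing alignment (assumed scheme): snap to the start of the window.
            window_start = (timestamp // window_size) * window_size
            # De-duplication: the first source to report (key, value) wins.
            windows[window_start].setdefault((key, value), source)
    # Return windows in time order, each as a sorted list of (key, value, source).
    return {
        start: sorted((key, value, source) for (key, value), source in seen.items())
        for start, seen in sorted(windows.items())
    }
```

With two hypothetical sources that both report the same temperature reading within one window, only a single copy of that reading would survive in the integrated view for that window.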
1 Introduction
Due to the rapid advancement of big data platforms, the need to improve data access from multiple sources through data analysis and decision support systems has grown significantly over the last few years. However, with the unprecedented expansion of data in business, decision makers and researchers find it difficult to access the necessary data for