Big data aggregation in the case of heterogeneity: a feasibility study for digital health
ORIGINAL ARTICLE
Alex Adim Obinikpo¹ · Burak Kantarci¹
Received: 5 March 2018 / Accepted: 14 December 2018
© Springer-Verlag GmbH Germany, part of Springer Nature 2019
Abstract
In big data applications, an important factor that may affect the value of the acquired data is missing data, which arises when data is lost either during acquisition or during storage. The former can result from faulty acquisition devices or non-responsive sensors, whereas the latter can occur as a result of hardware failures at the storage units. In this paper, we consider human activity recognition as a case study of a typical machine learning application on big datasets. We conduct a comprehensive feasibility study on the fusion of sensory data acquired from heterogeneous sources, and we present insights on aggregating heterogeneous datasets with minimal missing data values for future use. Our experiments on the accuracy, F-1 score, and positive predictive value (PPV) of several key machine learning algorithms show that sensory data acquired by wearables are less vulnerable to missing data and to smaller training sets, whereas smart portable devices require larger training sets to reduce the impact of possibly missing data.
Keywords Dedicated sensors · Non-dedicated sensors · Aggregation
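The three evaluation metrics named above can all be derived from the counts of a multi-class confusion matrix. The sketch below is illustrative only: it assumes that per-class PPV (precision) and F-1 are macro-averaged over activity classes, which the abstract does not specify, and the helper name `activity_metrics` and the toy labels are hypothetical, not the study's code or data.

```python
from collections import Counter


def activity_metrics(y_true, y_pred):
    """Return (accuracy, macro PPV, macro F-1) for multi-class labels.

    Hypothetical helper: macro averaging over classes is an assumption,
    not necessarily the averaging used in the study.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    classes = sorted(set(y_true) | set(y_pred))
    # Count each (true, predicted) pair, i.e. the confusion-matrix cells.
    pairs = Counter(zip(y_true, y_pred))
    accuracy = sum(pairs[(c, c)] for c in classes) / len(y_true)

    ppvs, f1s = [], []
    for c in classes:
        tp = pairs[(c, c)]
        fp = sum(n for (t, p), n in pairs.items() if p == c and t != c)
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        ppv = tp / (tp + fp) if tp + fp else 0.0  # positive predictive value
        tpr = tp / (tp + fn) if tp + fn else 0.0  # recall (sensitivity)
        f1 = 2 * ppv * tpr / (ppv + tpr) if ppv + tpr else 0.0
        ppvs.append(ppv)
        f1s.append(f1)
    return accuracy, sum(ppvs) / len(classes), sum(f1s) / len(classes)


# Toy activity labels (hypothetical, not data from the study).
y_true = ["walk", "walk", "sit", "run", "sit", "run"]
y_pred = ["walk", "sit", "sit", "run", "sit", "walk"]
acc, ppv, f1 = activity_metrics(y_true, y_pred)
print(f"accuracy={acc:.3f} PPV={ppv:.3f} F1={f1:.3f}")
# prints: accuracy=0.667 PPV=0.722 F1=0.656
```

Macro averaging weights every activity equally, so a rare activity with poor PPV lowers the score as much as a frequent one; a weighted average would instead follow class frequencies.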
* Burak Kantarci (corresponding author)
¹ University of Ottawa, Ottawa, Canada

1 Introduction
With the advent of the Big Data phenomenon in smart environments, the ultimate goal of integrating big data analytics methodologies with smart services is to ensure quality of service for end users. Service quality can be ensured by developing and integrating effective and efficient data acquisition techniques, and by applying improved methodologies to the acquired data [1, 2]. Among the smart environments that have been thriving in the Big Data era, digital health (D-Health) is becoming more robust and practical, driven by the growth of the big data space through data acquisition devices such as smartphones and wearables [3, 4]. The design and implementation of D-Health systems benefit from big data analytics [5–7], and the services offered through these systems are diverse; digital patient assistants and automated feedback systems are just two examples [8]. Figure 1 gives a broad overview of big data and its applications in D-Health as a layered model. The first layer contains data sources such as wearables or smart devices, together with acquisition methods for collecting sensory data in various formats. Besides being large in volume, the acquired data can come in various formats and can even be unstructured. The second layer contains processing modules for the acquired data. These include selecting appropriate features, trimming the dataset, aggregating multi-sensory data, and transforming the aggregated data into a ready-to-analyze format. The third layer, namely the data analytics layer, takes the processed data