A Proposed Approach for Provenance Data Gathering


Márcio José Sembay 1 & Douglas Dyllon Jeronimo de Macedo 1 & Moisés Lima Dutra 1

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Data provenance concerns the origin of data: it identifies data sources and the transformations the data undergo over time. This paper proposes a generic method for collecting provenance data, as a follow-up to a study carried out by the same authors in a Brazilian hemotherapy center. The method is based on the W3C's Provenance Data Model (PROV-DM) and proposes a way to capture, store and analyze anemia-index provenance data by applying a scientific workflow together with the management of provenance of knowledge. This is an exploratory, practical and deductive study carried out with real data from 197,551 blood donor candidates, extracted from reports covering 2000 to 2018 provided by a Brazilian hemotherapy center. Candidates identified with high anemia rates were quantified and tagged as not suitable for blood donation. The unsuitable candidates with the highest anemia rates were quantified: among 1011 male and 4039 female candidates, women had the highest levels of unsuitable donations. The study concludes that the generic method for collecting provenance data proposed here can be applied in several areas of knowledge.

Keywords Data provenance · Provenance of knowledge · Scientific workflows · Anemia · Hemotherapy center

1 Introduction

Science increasingly relies on data, and new technologies have increased both the efficiency of data collection and, consequently, the amount of data generated. Information technologies (IT) allow these data to be processed, analyzed and stored in computational infrastructures. This scenario has changed the nature of scientific challenges: the old problem of data scarcity has been replaced by the difficulty of managing its excess, variety, and distribution [1]. Data provenance is relevant to many application scenarios. One of them is healthcare, the focus of this work. Research on data provenance in healthcare is growing, drawing on a wide variety of scientific experiments, and the technologies applied in this area are obtaining significant results.

* Márcio José Sembay
Douglas Dyllon Jeronimo de Macedo
Moisés Lima Dutra

1 Department of Information Science, Federal University of Santa Catarina, Florianópolis, Brazil

In this sense, data provenance is important with regard to auditing, triage, lineage, and the data source. It can also be regarded as metadata that describes the origin of the data and the entire path taken to achieve the results of an experiment [2, 3]. Data source applications also affect data provenance and the way it is measured. Whatever the data derivation process is, it has significant implications for data quality and errors int