Isabl Platform, a digital biobank for processing multimodal patient data



Open Access

SOFTWARE

Juan S. Medina‑Martínez¹,², Juan E. Arango‑Ossa¹, Max F. Levine¹, Yangyu Zhou¹, Gunes Gundem¹, Andrew L. Kung¹ and Elli Papaemmanuil¹*

*Correspondence: [email protected]
¹ Memorial Sloan Kettering Cancer Center, New York, NY, USA
Full list of author information is available at the end of the article

Abstract

Background: The widespread adoption of high-throughput technologies has democratized data generation. However, data processing in accordance with best practices remains challenging, and the data capital often becomes siloed. This presents an opportunity to consolidate data assets into digital biobanks: ecosystems of readily accessible, structured, and annotated datasets that can be dynamically queried and analysed.

Results: We present Isabl, a customizable plug-and-play platform for the processing of multimodal patient-centric data. Isabl’s architecture consists of a relational database (Isabl DB), a command line client (Isabl CLI), a RESTful API (Isabl API) and a frontend web application (Isabl Web). Isabl supports automated deployment of user-validated pipelines across the entire data capital. A full audit trail is maintained to secure data provenance and governance and to ensure reproducibility of findings.

Conclusions: As a digital biobank, Isabl supports continuous data utilization and automated meta-analyses at scale, and serves as a catalyst for research innovation, new discoveries, and clinical translation.

Keywords: Data processing, Analysis information management system, Next generation sequencing, Genomics, Image processing, Software engineering, Multimodal data

Background

Genome profiling represents a critical pillar for clinical, translational, and basic research. With an ever-expanding suite of high-throughput technologies [1], the pace at which the scientific community generates data at scale has accelerated rapidly. This creates demand for specialized expertise to support data processing and analysis [2]. Importantly, the derivation of novel biological and clinical insights increasingly relies on large, statistically powered datasets, rich metadata annotation (clinical, demographic, treatment, outcome), and the integration of diverse data modalities generated across samples and patients (e.g. genomic, imaging) [3]. Such high-dimensional data science is now embedded across disciplines, raising significant hopes for the development of artificial intelligence (AI) driven innovation in healthcare and research [3, 4].

© The Author(s) 2020. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Co