Privacy-Preserving Processing of Raw Genomic Data

1 École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
2 University of Waterloo, Waterloo, Canada
3 Sophia Genetics, Lausanne, Switzerland

Abstract. Geneticists prefer to store patients’ aligned, raw genomic data in addition to their variant calls (a compact, summarized form of the raw data), mainly because bioinformatic algorithms and sequencing platforms are still immature. Thus, we propose a privacy-preserving system to protect the privacy of aligned, raw genomic data. The raw genomic data of a patient consists of millions of short reads, each comprising between 100 and 400 nucleotides (genomic letters). We propose storing these short reads at a biobank in encrypted form. The proposed scheme enables a medical unit (e.g., a pharmaceutical company or a hospital) to privately retrieve the subset of the patients’ short reads that covers a definite range of nucleotides (which depends on the type of the genetic test) without revealing the nature of the genetic test to the biobank. Furthermore, the proposed scheme lets the biobank mask particular parts of the retrieved short reads if (i) some parts of the provided short reads fall outside the requested range, or (ii) the patient has not consented to the disclosure of some parts of the provided short reads (e.g., parts revealing sensitive diseases). We evaluate the proposed scheme to quantify the amount of unauthorized genomic data leakage it prevents. Finally, we implement the proposed scheme and assess its practicality.
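The retrieval-and-masking rule described in the abstract can be illustrated, in plaintext only, by the minimal Python sketch below. It deliberately ignores the encryption and the private retrieval that hide the requested range from the biobank, so it shows only which nucleotides the medical unit would end up seeing, not how the proposed scheme enforces this. All names (`ShortRead`, `retrieve`, `withheld_count`, the `"N"` mask symbol) are hypothetical, and the sketch assumes 0-based, half-open reference coordinates and reads without insertions, deletions, or clipping.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical symbol for a masked nucleotide ("N" conventionally means "unknown base").
MASK = "N"

# Half-open interval [start, end) in reference coordinates (assumed convention).
Range = Tuple[int, int]


@dataclass(frozen=True)
class ShortRead:
    """An aligned short read (simplified: no insertions, deletions, or clipping)."""
    start: int     # 0-based reference position of the read's first nucleotide
    sequence: str  # aligned nucleotides, e.g. "ACGT..."

    @property
    def end(self) -> int:
        return self.start + len(self.sequence)  # exclusive end position


def overlaps(read: ShortRead, lo: int, hi: int) -> bool:
    """True if the read covers at least one position in [lo, hi)."""
    return read.start < hi and lo < read.end


def authorized(pos: int, lo: int, hi: int, no_consent: Sequence[Range]) -> bool:
    """A position may be revealed only if it was requested and the patient consented."""
    in_range = lo <= pos < hi
    consented = not any(a <= pos < b for a, b in no_consent)
    return in_range and consented


def mask_read(read: ShortRead, lo: int, hi: int, no_consent: Sequence[Range]) -> str:
    """Replace every unauthorized nucleotide of the read with MASK."""
    return "".join(
        base if authorized(read.start + i, lo, hi, no_consent) else MASK
        for i, base in enumerate(read.sequence)
    )


def retrieve(reads: Sequence[ShortRead], lo: int, hi: int,
             no_consent: Sequence[Range] = ()) -> List[str]:
    """Return the masked sequences of every read overlapping [lo, hi)."""
    return [mask_read(r, lo, hi, no_consent) for r in reads if overlaps(r, lo, hi)]


def withheld_count(reads: Sequence[ShortRead], lo: int, hi: int,
                   no_consent: Sequence[Range] = ()) -> int:
    """Number of nucleotides in the retrieved reads that masking keeps hidden."""
    return sum(
        not authorized(r.start + i, lo, hi, no_consent)
        for r in reads if overlaps(r, lo, hi)
        for i in range(len(r.sequence))
    )


if __name__ == "__main__":
    reads = [ShortRead(95, "ACGTACGTAC"), ShortRead(100, "GGGGGCCCCC"), ShortRead(200, "TTTT")]
    # Request positions [100, 108); the patient withholds consent for [104, 106).
    print(retrieve(reads, 100, 108, no_consent=[(104, 106)]))        # ['NNNNNCGTAN', 'GGGGNNCCNN']
    print(withheld_count(reads, 100, 108, no_consent=[(104, 106)]))  # 10
```

The third read lies entirely outside the requested range and is never returned, while the two overlapping reads are returned with their out-of-range and non-consented positions masked. Counting the masked positions gives a simple, illustrative measure of the leakage that masking prevents.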

Keywords: Genomics · Privacy · Bioinformatics · Raw genomic data

1 Introduction

Genomics holds great promise for better predictive medicine and improved diagnoses. However, genomics also poses a risk to privacy [4] (e.g., the revelation of an individual’s genetic properties through the leakage of his genomic data). An increasing number of medical units (pharmaceutical companies or hospitals) are willing to outsource the storage of genomes generated in clinical trials. Acting as a third party, a biobank could store patients’ genomic data for use by the medical units in clinical trials. Meanwhile, the patient can also benefit from the stored genomic information by interrogating his own genomic data, together with his family doctor, for specific genetic predispositions, susceptibilities, and metabolic capacities. The major challenge here is to preserve the privacy of patients’ genomic data while allowing the medical units to operate on the specific parts of the genome for which they are authorized. Research on genomic privacy falls into three main categories: (i) re-identification of anonymized genomic data [12,13,17,18], (ii) cryptographic algorithms to protect genomic data [6–9,14,16], and (iii) private clinical genomics [11]. To the best of our knowledge, none of the existing works on genomic privacy addresses the issue of pri

E. Ayday et al. In: J. Garcia-Alfaro et al. (Eds.): DPM 2013 and SETOP 2013, LNCS 8247, pp. 133–147, 2014. © Springer-Verlag Berlin Heidelberg 2014. DOI: 10.1007/978-3-642-54568-9_9
