Privacy-Preserving Processing of Raw Genomic Data

1 École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
2 University of Waterloo, Waterloo, Canada
3 Sophia Genetics, Lausanne, Switzerland

Abstract. Geneticists prefer to store patients’ aligned, raw genomic data in addition to their variant calls (a compact, summarized form of the raw data), mainly because bioinformatic algorithms and sequencing platforms are still immature. Thus, we propose a privacy-preserving system to protect the privacy of aligned, raw genomic data. The raw genomic data of a patient includes millions of short reads, each consisting of 100 to 400 nucleotides (genomic letters). We propose storing these short reads at a biobank in encrypted form. The proposed scheme enables a medical unit (e.g., a pharmaceutical company or a hospital) to privately retrieve a subset of the patients’ short reads (those covering a specific range of nucleotides, which depends on the type of genetic test) without revealing the nature of the genetic test to the biobank. Furthermore, the proposed scheme lets the biobank mask particular parts of the retrieved short reads if (i) some parts of the provided short reads fall outside the requested range, or (ii) the patient does not consent to the release of some parts of the provided short reads (e.g., parts revealing sensitive diseases). We evaluate how much unauthorized genomic data leakage the proposed scheme prevents. Finally, we implement the proposed scheme and assess its practicality.
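To make the retrieval-and-masking functionality described in the abstract more concrete, the following is a minimal plaintext sketch of what the medical unit should end up receiving. It deliberately leaves out all cryptography: neither the encryption of the reads nor the mechanism that hides the requested range from the biobank is modeled. The names `ShortRead`, `retrieve_and_mask`, and `MASK`, the 1-based inclusive coordinates, the `'*'` mask character, and the assumption that every read aligns without insertions or deletions (so nucleotide i sits at position `start + i`) are all hypothetical choices for illustration, not the scheme’s actual data format.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

MASK = "*"  # hypothetical placeholder for a hidden nucleotide


@dataclass(frozen=True)
class ShortRead:
    """An aligned short read (assumed ungapped: no insertions or deletions)."""
    start: int      # 1-based genome position of the first nucleotide (assumed convention)
    sequence: str   # nucleotides, e.g. "ACGT..."

    @property
    def end(self) -> int:
        """1-based, inclusive genome position of the last nucleotide."""
        return self.start + len(self.sequence) - 1


def retrieve_and_mask(reads: Iterable[ShortRead], lo: int, hi: int,
                      no_consent: Set[int]) -> List[str]:
    """Return every read overlapping [lo, hi], masking nucleotides that lie
    outside [lo, hi] (case i) or at positions the patient has not consented
    to release (case ii). Masked bases are kept as placeholders so each read
    keeps its length and alignment offsets."""
    result = []
    for read in reads:
        if read.end < lo or read.start > hi:
            continue  # the read does not overlap the requested range at all
        masked = []
        for offset, base in enumerate(read.sequence):
            pos = read.start + offset
            if pos < lo or pos > hi or pos in no_consent:
                masked.append(MASK)
            else:
                masked.append(base)
        result.append("".join(masked))
    return result


if __name__ == "__main__":
    reads = [ShortRead(95, "ACGTACGT"),    # covers positions 95..102
             ShortRead(100, "TTGCAAGC"),   # covers positions 100..107
             ShortRead(120, "GGGGCCCC")]   # covers 120..127, outside the range
    print(retrieve_and_mask(reads, lo=100, hi=105, no_consent={103}))
```

With the requested range 100..105 and no consent for position 103, the script prints `['*****CGT', 'TTG*AA**']`: the first read is masked before position 100, the second is masked at position 103 and beyond position 105, and the third read is not returned. Real short reads are 100 to 400 nucleotides long; the toy reads are kept short only for readability.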

Keywords: Genomics · Privacy · Bioinformatics · Raw genomic data

1 Introduction

Genomics holds great promise for better predictive medicine and improved diagnoses. However, genomics also comes with a risk to privacy [4] (e.g., revelation of an individual’s genetic properties due to the leakage of his genomic data). An increasing number of medical units (pharmaceutical companies or hospitals) are willing to outsource the storage of genomes generated in clinical trials. Acting as a third party, a biobank could store patients’ genomic data that would be used by the medical units for clinical trials. In the meantime, the patient can also benefit from the stored genomic information by interrogating his own genomic data, together with his family doctor, for specific genetic predispositions, susceptibilities, and metabolic capacities.

J. Garcia-Alfaro et al. (Eds.): DPM 2013 and SETOP 2013, LNCS 8247, pp. 133–147, 2014. © Springer-Verlag Berlin Heidelberg 2014. DOI: 10.1007/978-3-642-54568-9_9

E. Ayday et al.

The major challenge here is to preserve the privacy of patients’ genomic data while allowing the medical units to operate only on the specific parts of the genome for which they are authorized. Research on genomic privacy falls into three main categories: (i) re-identification of anonymized genomic data [12,13,17,18], (ii) cryptographic algorithms to protect genomic data [6–9,14,16], and (iii) private clinical genomics [11]. To the best of our knowledge, none of the existing works on genomic privacy addresses the issue of pri