Data Analysis in Rare Disease Diagnostics

REVIEW ARTICLE

© Indian Institute of Science 2020.

Vamsi Veeramachaneni*

Abstract | There are more than 8000 documented rare diseases in the world. While each disease is rare in itself, it is estimated that 1 in every 15 or 20 persons is affected by some rare disease. Most rare diseases are caused by just one or two small changes in the genome. Identifying the causative variant from the millions of variants that differentiate one person’s genome from another is a challenging task. In this article, we provide an overview of the data processing that takes place during the multi-stage rare disease diagnosis process. At each stage, we describe algorithms and methods that are in use in diagnostic laboratories and also describe how machine learning in general and deep learning in particular are improving the process.

1 Introduction

A draft sequence covering ~95% of the human genome was first released in 2000¹. The sequence, commonly referred to as the human reference genome sequence, is a composite sequence created by sequencing and painstakingly assembling DNA obtained from anonymous volunteers of diverse backgrounds. This ~3 billion nucleotide-long genome sequence has undergone several revisions over the years, and there are still small regions that have remained intractable. It is not an exaggeration to state that all clinical genomics applications today use the reference sequence as the basis for analysis.

In this article, we focus on the topic of rare disease diagnosis through sequencing. There are over 8600 rare disease phenotypes documented in OMIM today². The molecular basis for 6200 of these diseases has been traced to 3900 genes in the reference genome. Most rare diseases are caused by just one or two variants present in the patient genome. However, identifying the exact variants from among the more than 5 million small variants that distinguish any individual from the reference genome is an extremely challenging task³.

There are four major steps in the rare disease diagnosis process: sequencing, variant detection, variant assessment, and variant prioritization. In this article, we take you through these steps, explaining the data analysis that happens at each step. We also describe emerging machine learning solutions that could have a bearing on how each step is carried out in a diagnostic setting in the future.

J. Indian Inst. Sci. | VOL xxx:x | xxx–xxx 2020 | journal.iisc.ernet.in

2 Sequencing

Sequencing is the process of analyzing the DNA extracted from a sample and generating the nucleotide sequence that corresponds to it. In clinical genomics, the main goal of sequencing is to use the sequences to identify how the sample differs from the reference genome. We use the term variants to describe these differences. The implicit assumption is that some of the variants may help explain the cause of a disease or provide clues to the right treatment for the patient.

Variants can broadly be classified into four categories.

• Substitution where a single base i
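
To make the notion of a variant concrete, here is a minimal Python sketch of comparing a sample sequence against a reference and reporting single-base substitutions. It is an illustration only and rests on loud assumptions: the function name `find_substitutions` and the toy sequences are hypothetical, and the two sequences are assumed to be already aligned and of equal length. Diagnostic pipelines work from millions of short reads and must also handle the other variant categories, which this sketch does not attempt.

```python
from typing import List, Tuple


def find_substitutions(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Report single-base differences between two pre-aligned sequences.

    Returns a list of (1-based position, reference base, sample base).
    Assumption: both sequences have the same length and are already aligned,
    so only substitutions can be detected here.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


# Toy sequences invented for illustration (not real genomic data).
reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"

for pos, ref_base, alt_base in find_substitutions(reference, sample):
    print(f"position {pos}: {ref_base} -> {alt_base}")
```

Positions are reported 1-based here, which is the convention commonly used when describing variants; that choice is part of the sketch rather than something stated in the article.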
