Haplotyping a Diploid Single Individual with a Fast and Accurate Enumeration Algorithm

The minimum error correction (MEC) model is one of the important computational models for determining haplotype information from sequencing data, i.e., single individual single nucleotide polymorphism (SNP) haplotyping, haplotype reconstruction or haploty

PDF / 476,862 Bytes
13 Pages / 439.37 x 666.142 pts Page_size
13 Downloads / 163 Views

DOWNLOAD

REPORT

Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin 541004, China [email protected] 2 College of Computer Science and Information Technology, Guangxi Normal University, Guilin 541004, China

Abstract. The minimum error correction (MEC) model is one of the important computational models for determining haplotype information from sequencing data, i.e., single individual single nucleotide polymorphism (SNP) haplotyping, haplotype reconstruction or haplotype assembly. Due to the NP-hardness of the model, a fast and accurate enumeration algorithm is proposed for solving it. The presented algorithm reconstructs the SNP sites of a pair of haplotypes one after another. It enumerates two kinds of SNP values, i.e., (0 1)T and (1 0)T, for the SNP site being reconstructed, and chooses the one with more support coming from the SNP fragments that are covering the corresponding SNP site. The experimental comparisons were conducted among the presented algorithm, the FAHR, the Fast Hare and the DGS algorithms. The results prove that our algorithm can get higher reconstruction rate than the other three algorithms. Keywords: Sequence analysis Haplotype (MEC) Algorithm Enumeration

Minimum error correction

1 Introduction Genetic difference has received increasingly extensive attention with the development of high-throughput DNA sequencing technologies. It is well known that all human are almost identical at DNA level (*99.5 % identical). Genomic differences located in *0.5 % nucleotide bases answer for the observed diversities in our phenotypes. Among various genetic variations, single nucleotide polymorphisms (SNPs) are believed to be the most widespread form of variation, and their signiﬁcance cannot be overvalued for medical, drug-design, diagnostic and forensic applications [1]. Due to linkage disequilibrium (LD) and the absence of recombination event, adjacent SNPs tend to inherit together from ancestors to descendants. A sequence of related SNPs on each copy of a pair of chromosomes is referred to as a haplotype. Studies indicate that haplotypes generally carry more genetic information than individual SNP, and they play a crucial role in gene expression and disease association

© Springer International Publishing Switzerland 2016 D.-S. Huang et al. (Eds.): ICIC 2016, Part I, LNCS 9771, pp. 399–411, 2016. DOI: 10.1007/978-3-319-42291-6_40

400

X. Chen et al.

studies [2, 3]. Since it is both time consuming and expensive to obtain haplotypes through biological experiments directly, it is common to determine haplotypes through computational methods. Three categories of computational methods exist for computing haplotype [4]: haplotyping a population based on population genotype data [5, 6]; haplotyping a population based on sequenced fragment data [7]; haplotyping a single individual based on sequenced fragment data [8]. The third category is studied in this paper. The problem of haplotyping a single individual is also called haplotype reconstruction or haplotype

Data Loading...

Haplotyping a Diploid Single Individual with a Fast and Accurate Enumeration Algorithm

Recommend Documents

A Practical Parameterized Algorithm for the Individual Haplotyping Problem MLF

Fast and Accurate Community Search Algorithm for Attributed Graphs

Defense-VAE: A Fast and Accurate Defense Against Adversarial Attacks

Fast and Accurate Vector Operations

Fast and Accurate Multiple Sequence Alignment with MSAProbs-MPI

A New Fast Recursive Matrix Multiplication Algorithm

FA3D: Fast and Accurate 3D Object Detection

Diploid

A Fast and Efficient Topological Coding Algorithm for Compound Images

Accurate and Efficient Traffic Sign Detection with a Guided Region Enlarging Algorithm

A Polynomial-Time-Delay and Polynomial-Space Algorithm for Enumeration Problems in Multi-criteria Optimization

A fast and accurate computation of 2D and 3D generalized Laguerre moments for images analysis