A hybrid correcting method considering heterozygous variations by a comprehensive probabilistic model
METHODOLOGY
Open Access
Jiaqi Liu1,2†, Jiayin Wang1,2*†, Xiao Xiao1,3†, Xin Lai1,2, Daocheng Dai1,2, Xuanping Zhang1,2, Xiaoyan Zhu1,2, Zhongmeng Zhao1,2, Juan Wang1,4 and Zhimin Li1,4*
From The 18th Asia Pacific Bioinformatics Conference, Seoul, Korea. 18-20 August 2020
Abstract

Background: The emergence of third-generation sequencing technology, which features longer read lengths, marks a major advance over next-generation sequencing technology and has greatly promoted biological research. However, third-generation sequencing data have high sequencing error rates, which inevitably affect downstream analysis. Although sequencing error rates have improved in recent years, large amounts of data have already been produced with high error rates, and discarding them would be a huge waste. Error correction for third-generation sequencing data is therefore especially important. Existing error correction methods perform poorly at heterozygous sites, which are ubiquitous in diploid and polyploid organisms, so there is a lack of error correction algorithms that handle heterozygous loci, especially at low coverage.

Results: In this article, we propose an error correction method named QIHC. QIHC is a hybrid correction method that requires both next-generation and third-generation sequencing data. QIHC greatly enhances the sensitivity of distinguishing heterozygous sites from sequencing errors, which leads to high error-correction accuracy. To achieve this, QIHC establishes a set of probabilistic models based on a Bayesian classifier to estimate the heterozygosity of a site, and makes a judgment by calculating the posterior probabilities. The proposed method consists of three modules, which respectively generate a pseudo reference sequence, obtain the read alignments, and estimate the heterozygosity of the sites and correct the reads harboring them. The last module is the core of QIHC and is designed to handle the calculations for the multiple cases that can arise at a heterozygous site. The other two modules map the reads to the pseudo reference sequence, which to some extent overcomes the inefficiency of the multiple mappings adopted by existing error correction methods.
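The Bayesian judgment described in the abstract can be illustrated with a minimal sketch. This is not QIHC's actual model, whose details are not given in this excerpt; it assumes a simple two-class genotype likelihood over next-generation read counts at a single site (a heterozygous site versus a homozygous site whose minor-base observations are sequencing errors), a uniform per-base error rate, and a fixed heterozygosity prior. The function names `heterozygosity_posterior` and `classify_site`, and the default values `error_rate=0.01` and `het_prior=0.001`, are hypothetical.

```python
import math


def heterozygosity_posterior(n_major, n_minor, error_rate=0.01, het_prior=0.001):
    """Posterior probability that a site is heterozygous rather than a
    homozygous site whose minor-base observations are sequencing errors.

    Hypothetical sketch (not QIHC's published model). Assumptions:
    - only the two most frequent bases at the site are counted;
    - an error turns a base into each of the 3 other bases equally often;
    - reads are independent; a heterozygous site yields each allele with p = 0.5.
    """
    e = error_rate
    p_correct = 1.0 - e   # read shows the true base
    p_to_minor = e / 3.0  # read shows one specific wrong base
    # Heterozygous: draw an allele 50/50, then read it correctly
    # or misread the other allele as this one.
    p_het = 0.5 * p_correct + 0.5 * p_to_minor

    # Log-likelihoods; the binomial coefficient is identical for both
    # classes, so it cancels in the posterior and is omitted.
    log_hom = n_major * math.log(p_correct) + n_minor * math.log(p_to_minor)
    log_het = (n_major + n_minor) * math.log(p_het)

    # Add log priors, then normalise with the log-sum-exp trick.
    a = log_hom + math.log(1.0 - het_prior)
    b = log_het + math.log(het_prior)
    m = max(a, b)
    return math.exp(b - m) / (math.exp(a - m) + math.exp(b - m))


def classify_site(n_major, n_minor, threshold=0.5, **kwargs):
    """Label a site 'heterozygous' or 'error' by its posterior (hypothetical helper)."""
    post = heterozygosity_posterior(n_major, n_minor, **kwargs)
    return ("heterozygous" if post > threshold else "error"), post


if __name__ == "__main__":
    for counts in [(12, 10), (20, 1), (3, 2)]:
        label, post = classify_site(*counts)
        print(counts, label, round(post, 4))
```

Under these assumptions, a site with 12 major and 10 minor reads is labelled heterozygous and a site with 20 and 1 is labelled an error, while the low-coverage (3, 2) site receives a markedly less certain posterior, which is consistent with the abstract's emphasis on low coverage as the difficult case.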
*Correspondence: [email protected]; [email protected]
† Jiaqi Liu, Jiayin Wang and Xiao Xiao contributed equally to this work.
1 School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710048, China
4 Annoroad Gene Institute, Beijing 100176, China
Full list of author information is available at the end of the article
© The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.