Removing reference bias and improving indel calling in ancient DNA data analysis by mapping to a sequence variation graph


RESEARCH

Open Access

Removing reference bias and improving indel calling in ancient DNA data analysis by mapping to a sequence variation graph

Rui Martiniano1†, Erik Garrison2,3†, Eppie R. Jones4, Andrea Manica4 and Richard Durbin1,2*

*Correspondence: [email protected]
†Rui Martiniano and Erik Garrison contributed equally to this work.
1 Department of Genetics, University of Cambridge, Cambridge, CB3 0DH UK
2 Wellcome Sanger Institute, Cambridge, CB10 1SA UK
Full list of author information is available at the end of the article

Abstract

Background: During the last decade, the analysis of ancient DNA (aDNA) sequence has become a powerful tool for the study of past human populations. However, the degraded nature of aDNA means that aDNA molecules are short and frequently mutated by post-mortem chemical modifications. These features decrease read mapping accuracy and increase reference bias, in which reads containing non-reference alleles are less likely to be mapped than those containing reference alleles. Alternative approaches have been developed to replace the linear reference with a variation graph which includes known alternative variants at each genetic locus. Here, we evaluate the use of the variation graph software vg to avoid reference bias for aDNA and compare it with existing methods.

Results: We use vg to align simulated and real aDNA samples to a variation graph containing 1000 Genomes Project variants and compare with the same data aligned with bwa to the human linear reference genome. Using vg leads to a balanced allelic representation at polymorphic sites, effectively removing reference bias, and to more sensitive variant detection in comparison with bwa, especially for insertions and deletions (indels). Alternative approaches that use relaxed bwa parameter settings or filter bwa alignments can also reduce bias but can have lower sensitivity than vg, particularly for indels.

Conclusions: Our findings demonstrate that aligning aDNA sequences to variation graphs effectively mitigates the impact of reference bias when analyzing aDNA, while retaining mapping sensitivity and allowing detection of variation, in particular indel variation, that was previously missed.

Keywords: Ancient DNA, Variation graph, Sequence alignment, Reference bias
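As an illustration of what "balanced allelic representation" means in the abstract above, the following minimal Python sketch computes the fraction of reads carrying the alternate allele across a set of heterozygous sites. Under unbiased mapping this fraction is expected to be close to 0.5, and values below 0.5 indicate that reads with the reference allele are over-represented. The `SiteCounts` type, the function name, and all read counts are hypothetical and made up for illustration; they are not taken from the article or from the output of vg or bwa.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SiteCounts:
    """Read counts at one heterozygous site (hypothetical container)."""
    ref_reads: int  # mapped reads supporting the reference allele
    alt_reads: int  # mapped reads supporting the alternate allele


def alt_allele_fraction(sites: Iterable[SiteCounts]) -> float:
    """Return the pooled fraction of reads that carry the alternate allele.

    About 0.5 is expected at heterozygous sites when mapping is unbiased;
    a lower value suggests reference bias.
    """
    sites = list(sites)
    ref_total = sum(s.ref_reads for s in sites)
    alt_total = sum(s.alt_reads for s in sites)
    total = ref_total + alt_total
    if total == 0:
        raise ValueError("no reads cover the supplied sites")
    return alt_total / total


# Made-up counts: the first set loses some alternate-allele reads,
# the second set has equal support for both alleles.
biased_sites = [SiteCounts(12, 8), SiteCounts(7, 5), SiteCounts(11, 7)]
balanced_sites = [SiteCounts(10, 10), SiteCounts(6, 6), SiteCounts(9, 9)]

print(f"biased:   {alt_allele_fraction(biased_sites):.2f}")
print(f"balanced: {alt_allele_fraction(balanced_sites):.2f}")
```

Pooling counts over many sites, rather than averaging per-site ratios, keeps low-coverage sites from dominating the estimate, which matters for aDNA data where coverage is often low.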

© The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the cop