AuthentiCT: a model of ancient DNA damage to estimate the proportion of present-day DNA contamination

SOFTWARE

Open Access

Stéphane Peyrégne* and Benjamin M. Peter

* Correspondence: stephane. [email protected]

Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany

Abstract

Contamination from present-day DNA is a fundamental issue when studying ancient DNA from historical or archaeological material, and quantifying the amount of contamination is essential for downstream analyses. We present AuthentiCT, a command-line tool to estimate the proportion of present-day DNA contamination in ancient DNA datasets generated from single-stranded DNA libraries. The prediction is based solely on the patterns of post-mortem damage observed on ancient DNA sequences. The method has the power to quantify contamination from as few as 10,000 mapped sequences, making it particularly useful for analysing specimens that are poorly preserved or for which little data is available.

Keywords: Contamination, Ancient DNA, Deamination, Damage patterns

Background

After the death of an organism, its DNA decays and is progressively lost through time [1, 2]. Under favourable conditions, DNA can persist for hundreds of thousands of years and provide valuable information about the evolutionary history of organisms [3, 4]. Yet often only minute amounts of ancient DNA (aDNA) remain in historical or archaeological material. In addition, most of the extracted DNA usually comes from microorganisms that spread in decaying tissues [5, 6]. Whereas microbial sequences rarely align to the reference genome used for identifying endogenous sequences if appropriate length cut-offs are used [7–9], contamination with DNA from closely related organisms is a recurrent problem [10–12]. This is particularly true for genomic analyses of ancient humans, as the individuals who handle the specimens during excavation and afterwards often leave their DNA behind [13, 14]. Because this contamination can substantially affect the results of population genetic or phylogenetic analyses, quantifying the level of contamination is crucial for downstream analyses. An estimate of the level of present-day DNA contamination is also useful when screening samples, to identify those that can be sequenced further with reasonable effort and expense.

© The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not