Read Length Dominates Phylogenetic Placement Accuracy of Ancient DNA Reads

This paper is a preprint and has not been certified by peer review.

Authors

Bettisworth, B.; Psonis, N.; Poulakakis, N.; Pavlidis, P.; Stamatakis, A.

Abstract

One of the central problems facing researchers who analyze ancient DNA (aDNA) is identifying the species to which the recovered aDNA corresponds. Analyses of aDNA data normally use sequence matching tools (such as BLAST) to identify reads obtained from aDNA. However, because the source of aDNA is often a previously unsampled taxon that went extinct before the advent of modern sequencing technology, there is likely no exact match in any database. As a consequence, tools such as BLAST are of limited use in placing a read in a phylogenetic context, i.e., identifying the likely source of a read on a phylogenetic tree. Phylogenetic placement is a technique in which a sequence or read is placed onto a specific branch of a phylogenetic tree. Phylogenetic placement tools offer the potential for much finer resolution when identifying reads. However, phylogenetic placement has primarily been used to place reads obtained from extant sources. Its applicability to aDNA data is complicated by the characteristic pattern of degradation that aDNA undergoes. This characteristic damage is generally not accounted for by popular phylogenetic placement tools, and as a consequence some authors have cast doubt on the potential accuracy of such tools. To understand how the characteristic aDNA damage affects phylogenetic placement tools, we implemented a statistical model of aDNA damage as a tool, which we call PyGargammel, that takes sequences and applies damage characteristic of aDNA to them. We deploy PyGargammel, along with the existing phylogenetic placement assessment pipeline PEWO, on seven empirical datasets. With this pipeline, we explore the parameter space of aDNA damage via a grid search in order to identify the factors of aDNA damage that are most impactful. We test four leading phylogenetic placement tools: APPLES, EPA-NG, PPLACER, and RAPPAS.
We find that the frequency of DNA backbone nicks (and consequently read length) is the primary driver of error for aDNA reads. Additionally, we find that other factors, such as the rate of A to G misincorporations, have a negligible effect on the overall accuracy of phylogenetic placement tools.
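
To make the two damage factors named above concrete, the following is a minimal illustrative sketch in Python. It is not PyGargammel's implementation or API: the function names, the assumption that each backbone position is nicked independently at a fixed rate, and the assumption that each eligible base is substituted independently are all hypothetical simplifications chosen for illustration. The sketch fragments a sequence at simulated nicks, applies one type of base misincorporation to each fragment, and repeats this over a small grid of parameter values.

```python
import itertools
import random


def fragment_by_nicks(seq, nick_rate, rng):
    """Cut seq between adjacent bases with probability nick_rate (assumed
    independent per position) and return the resulting fragments in order."""
    fragments, start = [], 0
    for i in range(1, len(seq)):
        if rng.random() < nick_rate:
            fragments.append(seq[start:i])
            start = i
    fragments.append(seq[start:])
    return fragments


def misincorporate(read, rate, rng, source, target):
    """Replace each `source` base with `target` with probability `rate`
    (assumed independent per base); other bases are left unchanged."""
    return "".join(
        target if base == source and rng.random() < rate else base
        for base in read
    )


def damage_grid(seq, nick_rates, sub_rates, source, target, seed=0):
    """Simulate damaged reads for every (nick_rate, sub_rate) pair.

    Returns a dict mapping each parameter pair to its list of reads.
    The same seed is reused per grid cell so that cells are comparable.
    """
    results = {}
    for nick_rate, sub_rate in itertools.product(nick_rates, sub_rates):
        rng = random.Random(seed)
        fragments = fragment_by_nicks(seq, nick_rate, rng)
        results[(nick_rate, sub_rate)] = [
            misincorporate(f, sub_rate, rng, source, target) for f in fragments
        ]
    return results


def mean_read_length(reads):
    """Average length of the simulated reads."""
    return sum(len(r) for r in reads) / len(reads)
```

Under this simplified model, a higher nick rate yields more and shorter fragments, which is the link between nick frequency and read length that the abstract refers to. Calling `damage_grid(seq, [0.01, 0.05], [0.0, 0.1], source="A", target="G")` would produce one set of simulated reads per parameter combination; in a full evaluation, each set would then be passed to a placement tool and scored, a step this sketch does not attempt.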
