New gene annotations for three pea aphid genome assemblies allow comparative analyses of genes and gene family evolution
Deem, K. D.; Brisson, J. A.
Abstract
Reliable genome annotation is crucial for analyses of gene function, conservation, and evolution. Factors such as the sequencing technology used to create the assembly and the amount of duplicated sequence within the genome of interest can strongly affect the quality of gene annotations. In particular, short-read assemblies tend to collapse duplicated genes into single loci, a problem that requires additional long-read sequencing to resolve. Pea aphids exhibit a high level of gene duplication, which has led to mis-assembly and mis-annotation of genes in the short-read reference genome. Here, we re-annotate the pea aphid reference genome, along with two long-read pea aphid genomes, to facilitate future analyses of gene duplication and function in pea aphids. We use an integrated approach, consolidating both ab initio and RNAseq-based annotations into unified gene models. The new annotations contain genes that were missing, mis-annotated, or mis-assembled in the reference, and they are generally consistent across assemblies, with very good agreement between the two long-read assemblies. Our annotation method is sensitive enough to refine existing gene models, uncovering alternatively used promoters and isoforms, and it helps identify gene duplications. These data provide a useful supplement to the existing reference annotations and a new comparative framework for discovering and analyzing gene function and duplication in this important emerging model insect.