The mouse pangenome reveals the structural complexity of the murine protein coding landscape

This paper is a preprint and has not been certified by peer review.

Authors

Helmy, M.; Li, J. U.; Yan, X. F.; Meade, R. K.; Anderson, E.; Chen, P.; Czechanski, A. M.; Di Domenico, T.; Flint, J.; Garrison, E.; Guarracino, A.; Gontijo, M. T.; Haggerty, L.; Heard, E.; Howe, K.; Meena, N.; Martin, F. J.; Miska, E.; Rall, I.; Ramakrishna, N. B.; Sapetschnig, A.; Sinha, S.; Sun, D.; Tricomi, F. F.; Qu, R.; Wood, J. M.; Wu, T.; Zhou, D. J.; Reinholdt, L.; Adams, D. J.; Smith, C. M.; Lilue, J.; Keane, T. M.

Abstract

We present the first mouse pangenome, consisting of 17 high-quality inbred mouse strain genomes with complete annotation. This collection includes 12 widely used classical laboratory strains and 5 wild-derived strains. We have fully resolved previously incomplete genomic regions, including the major histocompatibility complex (MHC), the defensin cluster, and the T-cell receptor and Ly49 complexes. Hundreds of non-reference genes that were identified in previous publications but are absent from GRCm39, such as Defa1, Raet1a, and Klra20 (Ly49T), were localised in the new reference genomes. We conducted the first genome-wide scan of variable number tandem repeats (VNTRs) within the coding regions of mice, identifying over 400 genes with VNTR polymorphisms, with copy numbers exceeding 600 repeats and repeat units of up to 990 nucleotides. Our strain-specific annotations enhance RNA-Seq analyses, as demonstrated in PWK/PhJ: compared with using GRCm39, we observed a 5.1% improvement in read mapping and expression level differences in 2.1% of coding genes.
