Cryptic endogenous retrovirus subfamilies in the primate lineage
Chen, X.; Zhang, Z.; Yan, Y.; Goubert, C.; Bourque, G.; Inoue, F.
Abstract
Many endogenous retroviruses (ERVs) in the human genome are primate-specific and have contributed novel cis-regulatory elements and transcripts. However, current approaches for classifying and annotating ERVs and their long terminal repeats (LTRs) have limited resolution and are inaccurate. Here, we developed a new annotation approach based on phylogenetic analysis and cross-species conservation. Focusing on the evolutionarily young MER11A/B/C subfamilies, we revealed four phyletic groups that better explained the epigenetic heterogeneity observed within these subfamilies, and this grouping suggested a new annotation for 412 (19.8%) of the MER11 instances. Furthermore, we functionally validated the regulatory potential of these four phyletic groups using a massively parallel reporter assay (MPRA), which also identified motifs associated with their differential activities. Combining MPRA results with phyletic groups across primates revealed an ape-specific gain of SOX-related motifs through a single-nucleotide deletion. Lastly, by applying our approach to 53 primate-specific LTR subfamilies, we identified 75 phyletic groups and found that 3,807 (30.0%) instances from 26 LTR subfamilies could be assigned to a novel phyletic group, many of which have a distinct epigenetic profile. Thus, our refined annotation of primate-specific LTRs will make it possible to better understand their evolution in primate genomes and potentially to identify new roles for ERV/LTRs in their hosts.
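To illustrate how a single-nucleotide deletion can create a transcription factor motif, the sketch below scans both strands of a sequence for a SOX core consensus and lists which one-base deletions produce a new match. This is an illustrative example under stated assumptions, not the study's method: the regex form of the commonly cited SOX core consensus 5'-(A/T)(A/T)CAA(A/T)G-3' stands in for a proper position weight matrix, the helper names (`sox_hits`, `delete_base`, `deletions_creating_motif`) are hypothetical, and the "ancestral" sequence is a made-up toy, not real MER11 data.

```python
import re

# Assumption: simple regex version of the commonly cited SOX core consensus
# (A/T)(A/T)CAA(A/T)G, used as a stand-in for a real motif model (PWM).
SOX_CORE = re.compile(r"(?=([AT][AT]CAA[AT]G))")  # lookahead allows overlapping hits
MOTIF_LEN = 7


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def sox_hits(seq: str) -> list[int]:
    """Forward-strand start positions of SOX-core matches on either strand."""
    seq = seq.upper()
    n = len(seq)
    fwd = [m.start() for m in SOX_CORE.finditer(seq)]
    # A hit at position p on the reverse complement covers forward
    # positions n - p - MOTIF_LEN .. n - p - 1.
    rev = [n - m.start() - MOTIF_LEN for m in SOX_CORE.finditer(revcomp(seq))]
    return sorted(set(fwd + rev))


def delete_base(seq: str, pos: int) -> str:
    """Return seq with the single nucleotide at index pos removed."""
    return seq[:pos] + seq[pos + 1:]


def deletions_creating_motif(seq: str) -> list[int]:
    """Indices whose single-base deletion increases the number of SOX-core hits."""
    baseline = len(sox_hits(seq))
    return [i for i in range(len(seq))
            if len(sox_hits(delete_base(seq, i))) > baseline]


if __name__ == "__main__":
    ancestral = "GGCAACTAATGGCC"  # hypothetical toy sequence, no SOX-core match
    for i in deletions_creating_motif(ancestral):
        derived = delete_base(ancestral, i)
        print(f"deleting index {i}: {derived} -> hits at {sox_hits(derived)}")
```

Here, deleting the T at index 6 joins `AAC` and `AATG` into `AACAATG`, which matches the consensus. In practice, motif gains would be scored with a position weight matrix (for example, from a motif database) and compared across aligned orthologous LTR copies rather than by exhaustively testing deletions.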