The Landscape of Stop Codon-Free Regions in Primates: A Reservoir of Proto-Genes

This paper is a preprint and has not been certified by peer review.

Authors

Soman, A. S.; Shreyasree, G.; Dwivedi, A.; Pramod, G. S.; Sakarkar, C.; Bhattacharya, D.; Vijay, N.

Abstract

Gene duplication has long been viewed as the primary source of new genes, yet growing evidence suggests that de novo emergence from non-coding DNA may be more common than previously assumed. Identifying the structural precursors of such genes requires unbiased genome-wide strategies, and the sequence features that enable the transition from non-coding to protein-coding sequence remain unclear. Here, we systematically identify and characterise stop-codon-free regions (SCFRs) across telomere-to-telomere assemblies of human and six other primates. Short SCFRs are abundant and widely distributed, whereas long SCFRs are rare and increasingly associated with coding overlap, moderate GC enrichment, and structured exon-intron contexts. We define exon shadows as in-frame SCFR extensions beyond annotated exon boundaries that lack stop codons, revealing latent coding-compatible sequence adjacent to established exons. We also detect introns fully spanned by single SCFRs, consistent with exitron-like architectures. Repeat composition, codon usage, and Fourier spectral analyses show that length filtering enriches for gene-like features and identifies a subset of long SCFRs with codon-scale periodicity. Together, these findings provide a framework for identifying extended ORF-like regions that may serve as substrates for de novo gene emergence in primates.
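
To make the notion of a stop-codon-free region concrete, the following is a minimal Python sketch of how maximal SCFRs could be located in the three forward reading frames of a sequence. It is an illustration under stated assumptions, not the authors' pipeline: the function name `find_scfrs`, the `min_len` parameter, the use of the standard stop codons (TAA, TAG, TGA), the forward-strand-only scan, and the 0-based half-open coordinates are all choices made here for the example.

```python
# Illustrative sketch only; names and conventions are assumptions, not the paper's method.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code assumed


def find_scfrs(seq, min_len=0):
    """Return (frame, start, end) tuples for maximal stop-codon-free runs.

    Coordinates are 0-based and half-open, on the forward strand only.
    Runs shorter than min_len nucleotides are dropped. Ambiguous bases
    such as N are treated as non-stop, which is a simplification.
    """
    seq = seq.upper()
    regions = []
    for frame in range(3):
        # Position just past the last complete codon in this frame.
        last = frame + ((len(seq) - frame) // 3) * 3
        run_start = frame
        for pos in range(frame, last, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                # A stop codon closes the current run (if it is non-empty).
                if pos > run_start and pos - run_start >= min_len:
                    regions.append((frame, run_start, pos))
                run_start = pos + 3
        # Close the run that reaches the end of the sequence.
        if last > run_start and last - run_start >= min_len:
            regions.append((frame, run_start, last))
    return regions


if __name__ == "__main__":
    for region in find_scfrs("ATGAAATAGCCCGGG", min_len=9):
        print(region)
```

Raising `min_len` mimics the length filtering described in the abstract: on the toy sequence above, only the runs in frames 1 and 2 survive a 9-nucleotide threshold. A genome-scale analysis would also need to scan the reverse complement and handle assembly gaps, which this sketch omits.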
