LOCATE: using Long-read to Characterize All Transposable Elements
Yu, T.; Hu, Z.; Xu, B.; Zhang, X.; Zhang, X.; Weng, Z.
Abstract: Transposons constitute ~45% of the human genome, driving gene evolution and contributing to disease, but their repetitive nature complicates the identification of new insertions. We present LOCATE (Long-read to Characterize All Transposable Elements), an algorithm using long-read sequencing to detect and assemble transposon insertions. LOCATE outperforms existing tools on simulated datasets and achieves the best performance in two previous benchmarks, as well as in a new benchmark we constructed using real biological datasets. Applying LOCATE to public datasets revealed that pre-existing Alu copies create two hotspots for Alu and LINE1 insertions: the A-rich linker and the poly(A) tail. We further observed a preference for self-insertions over non-self-insertions in Alu and LINE1, suggesting a "feedforward" transposition mechanism in which Alu and LINE1 RNA transcripts target the hotspots of their source copies to generate new insertions. LOCATE enhances our ability to study transposons and their role in genome dynamics.