CFC-seq: identification of full-length capped RNAs unveil enhancer-derived transcription

This paper is a preprint and has not been certified by peer review.

Authors

Yip, C. W.; Parr, C.; Takahashi, H.; Yasuzawa, K.; Valentine, M.; Nishiyori-Sueki, H.; Ugolini, C.; Ranzani, V.; Murata, M.; Kato, M.; Kang, W.; Yip, W. H.; Shibayama, Y.; Sim, A. D.; Chen, Y.; Shu, X.; Moody, J. D.; Umarov, R.; Chang, J.-C.; Pandolfini, L.; Kawashima, T.; Tagami, M.; Nobusada, T.; Kouno, T.; Alfonso Gonzalez, C.; Albanese, R.; Dossena, F.; Haberman, N.; Ozaki, K.; Kasukawa, T.; Lenhard, B.; Frith, M.; Bodega, B.; Nicassio, F.; Calviello, L.; Bienko, M.; Legnini, I.; Hilgers, V.; Gustincich, S.; Goeke, J.; Lecellier, C. H.; Shin, J. W.; Hon, C.-C.; Carninci, P.

Abstract

Long-read sequencing has emerged as a powerful tool for uncovering novel transcripts and genes. However, existing protocols often cannot confidently identify transcription start sites (TSSs) and fail to capture non-poly(A) RNA, which limits the discovery of novel genes, particularly long non-coding RNAs (lncRNAs). In this study, we introduce Cap-trap full-length cDNA sequencing (CFC-seq), a comprehensive protocol that combines Cap-trapping and poly(A)-tailing with Oxford Nanopore sequencing. This protocol enables precise identification of TSSs and full-length transcripts. Applying CFC-seq to two in vitro differentiation time courses yielded approximately 236 million mappable reads. We developed the transcript Start-site Aware Long-read Assembler (SALA) to assemble transcript models de novo, leading to the identification of 39,425 confident novel genes. Using this dataset, enhancer-derived ncRNAs were re-defined as longer and more frequently spliced, properties that correlated with enhancer structure. Compared to enhancers with CpG islands, TATA box enhancers were more cell-type specific and had fewer chromatin interactions, but produced longer and more stable polyadenylated RNAs. A significant proportion of these TATA box-derived eRNAs originated from LTR transposable elements. Overall, this study systematically annotated ~24,000 novel eRNA genes and correlated their transcription properties with enhancer structure.
