DARDN: Identifying transcription factor binding motifs from long DNA sequences using multi-CNNs and DeepLIFT

This paper is a preprint and has not been certified by peer review.


Authors

Cho, H. J.; Wang, Z.; Cong, Y.; Bekiranov, S.; Zhang, A.; Zang, C.

Abstract

Motivation: Characterization of regulatory elements in DNA sequence is a key task in functional genomics. CTCF exhibits specific binding patterns in the genomes of cancer cells and has a non-canonical function in facilitating oncogenic transcription programs by cooperating with transcription factors bound at flanking distal regions. Identifying sequence motifs in a broad genomic region surrounding cancer-specific CTCF binding sites can help find the transcription factors active in a cancer type. However, long DNA sequences without localization information make conventional motif enrichment analysis difficult.

Results: We present DNAResDualNet (DARDN), a computational method that uses convolutional neural networks (CNNs) coupled with feature discovery using DeepLIFT to identify DNA sequence features that can differentiate two sets of lengthy DNA sequences. Evaluation on DNA sequences associated with CTCF binding sites in T-cell acute lymphoblastic leukemia (T-ALL) and other cancer types demonstrates DARDN's ability to distinguish DNA sequences surrounding cancer-specific CTCF binding sites from those surrounding control constitutive CTCF binding sites, and to identify sequence motifs for transcription factors potentially active in each specific cancer type. We identified motifs for potential oncogenic transcription factors in T-ALL, acute myeloid leukemia (AML), breast cancer (BRCA), colorectal cancer (CRC), lung adenocarcinoma (LUAD), and prostate cancer (PRAD). Our work demonstrates the power of advanced machine learning and feature discovery approaches in finding biologically meaningful information in complex DNA sequence data.
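As a rough illustration of how a convolutional filter can respond to a motif anywhere in a long sequence, the sketch below one-hot encodes a DNA string and slides a single filter along it. It is not the DARDN implementation: the abstract does not specify the input encoding or architecture, so the one-hot encoding, the function names `one_hot` and `conv1d_scan`, and the toy filter are all assumptions made for illustration.

```python
# Illustrative sketch only; not the DARDN code. The encoding and filter are assumptions.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T).

    Bases outside ACGT (for example N) become an all-zero row.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_scan(encoded, kernel):
    """Slide a (width x 4) kernel along the sequence.

    Returns one score per valid start position (no padding, stride 1).
    """
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(4)
        )
        scores.append(score)
    return scores


# Hypothetical toy filter matching the 4-mer "CCCT".
# A trained CNN would learn its filter weights from data instead.
toy_kernel = one_hot("CCCT")
scores = conv1d_scan(one_hot("AACCCTGN"), toy_kernel)
best_position = max(range(len(scores)), key=scores.__getitem__)
```

Here the highest score falls at index 2, where the toy 4-mer occurs. Attribution methods such as DeepLIFT address the reverse question for a trained network, namely which input positions contributed most to a prediction; that step is not shown in this sketch.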
