deepTFBS: Improving within- and cross-species prediction of transcription factor binding using deep multi-task and transfer learning
Zhai, J.; Zhang, Y.; Zhang, C.; Yi, X.; Song, M.; Tang, C.; Ding, P.; Li, Z.; Ma, C.
Abstract
The precise prediction of transcription factor binding sites (TFBSs) is crucial for understanding gene regulation. In this study, we present deepTFBS, a comprehensive deep learning (DL) framework that builds a robust DNA language model of TF binding grammar for accurately predicting TFBSs within and across plant species. Taking advantage of multi-task DL and transfer learning, deepTFBS leverages the knowledge learned from large-scale TF binding profiles to enhance the prediction of TFBSs in small-sample training and cross-species prediction tasks. When tested using available information on 359 Arabidopsis TFs, deepTFBS outperformed previously described prediction strategies, including position weight matrix, deepSEA, and DanQ, with 244.49%, 49.15%, and 23.32% improvements in the area under the precision-recall curve (PRAUC), respectively. Further cross-species prediction of TFBSs in wheat showed that deepTFBS yielded a significant PRAUC improvement of 30.6% over these three baseline models. deepTFBS can also utilize information from gene conservation and binding motifs, enabling efficient TFBS prediction in species where experimental data are limited. A case study focusing on the WUSCHEL (WUS) transcription factor illustrated the potential use of deepTFBS in cross-species applications, here between Arabidopsis and wheat. deepTFBS is publicly available at https://github.com/cma2015/deepTFBS.
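For readers unfamiliar with the metric, the following is a minimal sketch of how a PRAUC value and a percentage improvement between two models could be computed. It is not taken from the deepTFBS code base: the function names `average_precision` and `relative_improvement` are hypothetical, the step-wise (average precision) estimate is only one common way to approximate the area under the precision-recall curve, and the reported gains are assumed here to be relative improvements, since a metric bounded by 1 cannot rise by more than 100 percentage points in absolute terms.

```python
def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    labels: iterable of 0/1 ground-truth values (1 = bound site).
    scores: predicted binding scores, one per label.
    Ties in score are kept in input order (an arbitrary choice in this sketch).
    """
    labels = list(labels)
    scores = list(scores)
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank examples from the highest to the lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    true_pos = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Precision at this rank, weighted by the recall step 1/total_pos.
            area += (true_pos / rank) / total_pos
    return area


def relative_improvement(new, baseline):
    """Percentage gain of `new` over `baseline` (assumed definition)."""
    return 100.0 * (new - baseline) / baseline
```

Under these assumptions, a baseline PRAUC of 0.4 and a new PRAUC of 0.6 would correspond to a relative improvement of about 50%; these numbers are illustrative only and are not results from the study.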