RNA-ligand interaction scoring via data perturbation and augmentation modeling

This paper is a preprint and has not been certified by peer review.

Authors

Ma, H.; Gao, L.; Jin, Y.; Bai, Y.; Liu, X.; Bao, P.; Liu, K.; Xu, Z. Z.; Lu, Z. J.

Abstract

RNA-targeting drug discovery is undergoing an unprecedented revolution. Despite recent advances in this field, developing data-driven deep learning models remains challenging because validated RNA-small molecule interactions are limited and known RNA structures are scarce. In this context, we introduce RNAsmol, a novel sequence-based deep learning framework that combines data perturbation with augmentation, graph-based molecular feature representation, and attention-based feature fusion modules to predict RNA-small molecule interactions. RNAsmol employs perturbation strategies to balance the bias between the true-negative and unknown interaction spaces, thereby elucidating the intrinsic binding patterns between RNA and small molecules. The resulting model accurately predicts binding between RNA and small molecules, outperforming other methods with average improvements of ~8% (AUROC) in 10-fold cross-validation, ~16% (AUROC) in cold evaluation (on unseen datasets), and ~30% (ranking score) in decoy evaluation. Moreover, we use case studies to validate molecular binding hotspots in the predictions of RNAsmol, demonstrating the model's interpretability. In particular, we demonstrate that RNAsmol, without requiring structural input, can generate reliable predictions and be adapted to many RNA-targeting drug design scenarios.
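
As a rough illustration of two ideas mentioned in the abstract, the sketch below builds presumed-negative pairs by re-pairing known RNAs and ligands, then computes AUROC from labels and scores. It is not RNAsmol's implementation: the re-pairing scheme is only one simple stand-in for "perturbation", and the function names `perturb_negatives` and `auroc`, along with the identifiers such as `rna_A` and `lig_1`, are hypothetical.

```python
import random
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (rna_id, ligand_id); identifiers are hypothetical


def perturb_negatives(positives: List[Pair], n: int, seed: int = 0) -> List[Pair]:
    """Sample up to n unobserved (RNA, ligand) pairs by re-pairing known partners.

    Assumption: unobserved pairs are treated as presumed negatives, although
    some of them may be true but unrecorded interactions.
    """
    rng = random.Random(seed)
    known: Set[Pair] = set(positives)
    rnas = sorted({r for r, _ in positives})
    ligands = sorted({l for _, l in positives})
    # Every cross pairing that is not a recorded positive is a candidate.
    candidates = [(r, l) for r in rnas for l in ligands if (r, l) not in known]
    rng.shuffle(candidates)
    return candidates[:n]


def auroc(labels: List[int], scores: List[float]) -> float:
    """AUROC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
positives = [("rna_A", "lig_1"), ("rna_B", "lig_2"), ("rna_C", "lig_3")]
negatives = perturb_negatives(positives, n=3)
```

The pairwise AUROC here is quadratic in the number of examples, which is fine for a toy set; a rank-based implementation would be the usual choice at scale.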
