DisGeneFormer: Precise Disease Gene Prioritization by Integrating Local and Global Graph Attention
Koeksal, R.; Fritz, A.; Kumar, A.; Schmidts, M.; Tran, V. D.; Backofen, R.
Abstract
Identifying genes associated with human diseases is essential for effective diagnosis and treatment, but identifying disease-causing genes experimentally is time-consuming and expensive. Computational prioritization methods aim to streamline this process by ranking genes by their likelihood of association with a given disease. However, existing methods often report ranked lists of thousands of potential disease genes that contain many false positives. Such lists fail to meet the practical needs of clinicians, who require shorter, more precise candidate lists. To address this problem, we introduce DisGeneFormer (DGF), an end-to-end disease-gene prioritization pipeline. Our approach is based on two distinct graph representations, one modeling gene relationships and the other disease relationships. Each graph is first processed separately by graph attention, and the two are then processed jointly by a transformer module that combines within-graph and cross-graph knowledge through local and global attention. We propose an evaluation pipeline based on the precision of the top-K ranked gene list, with K set to clinically feasible values between 5 and 50, relying solely on experimentally verified associations as ground truth. Our evaluation demonstrates that DGF substantially outperforms existing methods. We additionally assessed the influence of the negative data sampling strategy and analyzed the effect of graph topology and features on the performance of our model.
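The top-K precision metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `precision_at_k`, the placeholder gene identifiers, and the choice to divide by K even when fewer than K genes are ranked are all assumptions made for illustration.

```python
from typing import Sequence, Set


def precision_at_k(ranked_genes: Sequence[str], verified_genes: Set[str], k: int) -> float:
    """Fraction of the top-k ranked genes that are in the verified set.

    Assumption: the denominator is always k, so a ranking shorter
    than k is penalized for the missing positions.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked_genes[:k]
    hits = sum(1 for gene in top_k if gene in verified_genes)
    return hits / k


# Hypothetical toy data: placeholder gene IDs, not real results.
ranked = ["G1", "G2", "G3", "G4", "G5", "G6"]  # model output, best first
verified = {"G1", "G4", "G5"}                  # experimentally verified associations

p_at_2 = precision_at_k(ranked, verified, 2)   # 1 hit among the top 2
p_at_5 = precision_at_k(ranked, verified, 5)   # 3 hits among the top 5
```

In an evaluation of the kind described, such a function would presumably be applied per disease for several values of K between 5 and 50 and the results averaged across diseases; that aggregation step is likewise an assumption here.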