EBEx: an Ensemble-Based Explainable Framework for Gene Calling in Heterogeneous Diseases

This paper is a preprint and has not been certified by peer review.

Authors

Pose-Lagoa, I.; Urda-Garcia, B.; Olvera, N.; Sanchez-Valle, J.; Faner, R.; Valencia, A.; Carbonell-Caballero, J.

Abstract

Complex and clinically heterogeneous diseases pose significant challenges for gene prioritisation and patient stratification, as relevant genes often show weak or context-specific signals and transcriptomic datasets are limited in size. These limitations hinder the discovery of robust molecular signatures using traditional case-control approaches and motivate computational pipelines capable of capturing molecular diversity. Here, we present an explainable ensemble-based AI pipeline to prioritise disease-relevant genes from transcriptomic data, using Chronic Obstructive Pulmonary Disease (COPD) as a use case. To retain biologically relevant interactors obscured by molecular heterogeneity, the framework integrates data-driven signals with curated COPD-related gene sets, further expanded through network-based prioritisation and supported by molecular interactions. Gene relevance is evaluated via aggregated explainability scores across multiple classifier configurations to ensure robust candidate selection. The final set comprised <8% of evaluated genes, ~62% arising from network-based expansion, substantially reducing dimensionality while preserving biological heterogeneity. Beyond case-control classification, the approach identified candidate genes and molecular subgroups associated with specific clinical features, capturing patient-level heterogeneity. The prioritised genes recapitulated key disease-related processes, including immune responses and extracellular matrix degradation, and highlighted additional associations like the enrichment of the IL-4 and IL-13 signalling pathway, which is of clinical interest given ongoing biologic developments targeting these axes. Our pipeline outperformed existing methods in discriminating COPD from controls, and the final gene list was validated in independent cohorts. 
Implemented as a scalable and reusable R package, this framework facilitates the study of molecular heterogeneity in complex diseases like COPD, supporting advances in diagnosis and precision medicine.
