GeneBag: training a cell foundation model for broad-spectrum cancer diagnosis and prognosis with bulk RNA-seq data

This paper is a preprint and has not been certified by peer review.

Authors

Liang, Y.; Li, D.; Xu, A. G.; Shao, Y.; Tang, K.

Abstract

Numerous pre-trained cell foundation models (CFMs) have been developed to capture the gene-gene interaction network within cells by leveraging extensive single-cell sequencing data. These models have shown promise in various cell biology applications, including cell type annotation, perturbation inference, and cell state embedding. However, their clinical utility, particularly in cancer diagnosis and prognosis, remains an open question. We introduce GeneBag, a novel CFM that represents a cell as "a bag of unordered genes" with continuous expression values and a full-length gene list. Pre-trained on single-cell data and fine-tuned on bulk RNA-seq datasets, GeneBag achieves superior performance across cancer diagnosis and prognosis scenarios. In a zero-shot learning setting, GeneBag classifies cancer and non-cancer tissues with approximately 96.2% accuracy. With fine-tuning, it annotates 40 different types of cancers and corresponding normal biopsies with an overall accuracy of approximately 97.2%. It notably excels at classifying challenging cancers such as bladder (93%) and stomach (90%). Furthermore, GeneBag performs cancer staging with 68.5% accuracy and 5-year survival prediction with an AUC of approximately 80.4%. This study is the first to demonstrate the potential of CFMs in RNA-based cancer diagnostics and prognostics, indicating a promising avenue for AI-assisted molecular diagnosis.
