CancerSubminer: an integrated framework for cancer subtyping using supervised and unsupervised learning on DNA methylation profiles

This paper is a preprint and has not been certified by peer review.

Authors

Choi, J. M.; Zhang, L.

Abstract

Human cancer is highly heterogeneous, which leads to variable drug resistance and clinical outcomes. This complexity hinders accurate prognosis prediction and the development of targeted therapies. Molecular subtyping addresses these challenges by grouping cancers into more homogeneous subsets based on molecular characteristics, enabling subtype-specific treatment strategies. Because it captures differential therapeutic responses, subtyping is crucial for early diagnosis, personalized therapy, and improved survival. Existing approaches to cancer subtyping are either supervised or unsupervised. Supervised methods, often trained on The Cancer Genome Atlas (TCGA), rely on predefined subtype annotations and are therefore limited in generalizability and in their ability to discover novel subtypes. Unsupervised methods can identify new subtypes but may overlook widely recognized ones, which hinders consistency with established classifications. Multi-omics approaches improve accuracy but are constrained by cost and by the burden of data collection. We propose CancerSubminer, a hybrid subtyping framework that integrates supervised and unsupervised learning. A subtype classifier is first trained on labeled data, after which clustering is applied to the extracted features and low-confidence samples are reassigned to refine subtype boundaries. The model is then retrained with the refined subtypes, and adversarial training corrects batch effects and learns domain-invariant features across labeled TCGA and unlabeled external datasets. A subsequent semi-supervised fine-tuning phase aligns subtypes between datasets and designates low-confidence samples as potential novel subtype candidates. CancerSubminer was evaluated on five cancer types (breast, bladder, brain, kidney, and thyroid) using TCGA methylation data with annotated subtypes and unlabeled datasets from the Gene Expression Omnibus. The framework outperformed state-of-the-art subtyping models (iClusterPlus, iClusterBayes, NEMO) and clustering methods (spectral clustering, K-means). Kaplan-Meier survival analysis demonstrated significant prognostic separation (p < 0.05) for all cancers, including thyroid cancer, where the predefined subtypes showed no significant separation but the CancerSubminer-derived subtypes did. These findings highlight CancerSubminer's ability to identify distinct prognostic subtypes, mitigate batch effects, and improve prognostic stratification across heterogeneous datasets. CancerSubminer is publicly available at https://github.com/joungmin-choi/CancerSubminer.
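
The reassignment of low-confidence samples described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `reassign_low_confidence`, the 0.8 confidence threshold, and the use of a nearest-centroid rule in place of the paper's clustering step are all assumptions made for illustration. It takes per-sample feature vectors and classifier probabilities, keeps confident predictions, and moves the remaining samples to the subtype whose centroid (computed from confident samples only) is closest.

```python
import math

def reassign_low_confidence(features, probs, threshold=0.8):
    """Hypothetical sketch: keep confident predictions and move the rest
    to the nearest subtype centroid (a stand-in for a clustering step)."""
    n_classes = len(probs[0])
    # Predicted subtype = index of the largest class probability.
    labels = [max(range(n_classes), key=p.__getitem__) for p in probs]
    # A sample is "confident" if its top probability reaches the threshold.
    confident = [max(p) >= threshold for p in probs]

    # Centroid of each subtype, computed from confident samples only.
    centroids = {}
    for k in range(n_classes):
        members = [f for f, l, c in zip(features, labels, confident)
                   if c and l == k]
        if members:
            dim = len(members[0])
            centroids[k] = [sum(m[d] for m in members) / len(members)
                            for d in range(dim)]

    refined = []
    for f, l, c in zip(features, labels, confident):
        if c or not centroids:
            # Confident sample (or no centroid available): keep the label.
            refined.append(l)
        else:
            # Low-confidence sample: assign to the closest centroid.
            refined.append(min(centroids,
                               key=lambda k: math.dist(f, centroids[k])))
    return refined

# Toy data (made up): five samples, two features, two subtypes.
features = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0], [0.95, 0.9]]
probs = [[0.95, 0.05], [0.9, 0.1], [0.1, 0.9], [0.05, 0.95], [0.55, 0.45]]
refined = reassign_low_confidence(features, probs)
print(refined)
```

In this toy input the last sample is predicted as subtype 0 with probability 0.55, below the assumed threshold, and its features lie near the subtype 1 centroid, so it is moved to subtype 1. In the framework itself the refined labels would then feed the retraining step; how confidence is actually measured and how clusters are formed is specified in the paper and repository, not here.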
