ICON: An isoform-aware hierarchical random forest model for cell type classification
Wijewardena, H.; Wu, S.; Schmitz, U.
Abstract: Single-cell RNA sequencing (scRNA-seq) has transformed our ability to resolve cellular heterogeneity across complex biological systems. However, conventional short-read scRNA-seq is inherently limited by its inability to capture full-length transcripts. Isoform profiles, arising from alternative splicing, provide a deeper layer of resolution, enabling finer discrimination of cellular subtypes and dynamic states, particularly in heterogeneous tissues. Long-read RNA sequencing technologies enable accurate transcript-level profiling and more comprehensive characterisation of isoform diversity. Despite these advances, existing cell type annotation methods remain largely tailored to gene-level data, which limits their fidelity and leaves isoform-level information an untapped reservoir of biological insight. Here, we present ICON, a hierarchical random forest (HRF) framework for isoform-aware cell classification in scRNA-seq data. By jointly modelling gene- and isoform-level expression, the framework captures both abundance and usage patterns, enabling classification beyond gene-level resolution. A two-stage strategy first assigns cell identities using highly variable gene and isoform features, then reclassifies ambiguous cells based on relative isoform and gene usage, thereby resolving conflicts that arise from transcriptional heterogeneity. Importantly, ICON provides interpretable outputs by identifying the key genes and isoforms that drive cell type discrimination, linking classification to underlying regulatory mechanisms. Benchmarking on long-read scRNA-seq datasets demonstrates consistent improvements over conventional gene-based approaches. With the increasing adoption of long-read sequencing, our framework provides a robust, interpretable foundation for isoform-aware cell type annotation, improving resolution and insight.
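The distinction between abundance and usage drawn in the abstract can be illustrated with a minimal Python sketch. It is not ICON's implementation: the abstract does not give formulas, so the sketch assumes that relative isoform usage is an isoform's share of its parent gene's total expression within a cell, and that a cell counts as "ambiguous" when its two highest first-stage class probabilities differ by less than a margin. The function names, the `min_margin` default, and the toy identifiers (`G1-iso1`, `G1-iso2`) are all hypothetical.

```python
from collections import defaultdict


def relative_isoform_usage(isoform_counts, isoform_to_gene):
    """Return each isoform's share of its parent gene's expression in one cell.

    isoform_counts: dict mapping isoform id -> expression in a single cell.
    isoform_to_gene: dict mapping isoform id -> parent gene id.
    Assumption (not from the paper): usage = isoform expression divided by
    the summed expression of all isoforms of the same gene; isoforms of a
    gene with zero total expression get usage 0.0.
    """
    gene_totals = defaultdict(float)
    for isoform, count in isoform_counts.items():
        gene_totals[isoform_to_gene[isoform]] += count

    usage = {}
    for isoform, count in isoform_counts.items():
        total = gene_totals[isoform_to_gene[isoform]]
        usage[isoform] = count / total if total > 0 else 0.0
    return usage


def is_ambiguous(class_probs, min_margin=0.2):
    """Flag a cell whose top two class probabilities are closer than min_margin.

    class_probs: dict mapping cell type label -> first-stage probability.
    The margin rule and its default are illustrative assumptions.
    """
    ranked = sorted(class_probs.values(), reverse=True)
    if len(ranked) < 2:
        return False
    return (ranked[0] - ranked[1]) < min_margin


# Two toy cells with identical gene-level expression (100 units of G1)
# but opposite isoform preferences.
mapping = {"G1-iso1": "G1", "G1-iso2": "G1"}
cell_a = {"G1-iso1": 90.0, "G1-iso2": 10.0}
cell_b = {"G1-iso1": 10.0, "G1-iso2": 90.0}

usage_a = relative_isoform_usage(cell_a, mapping)
usage_b = relative_isoform_usage(cell_b, mapping)
```

In this toy case a gene-level classifier sees the same value for both cells, whereas the usage features separate them, which is the kind of signal a second-stage reclassification of ambiguous cells could draw on.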